How to Automate Classification Without Losing Control

Content professionals often face large volumes of content that require classification. Rules-based processing and automated machine learning classification make the classification workload more manageable, but with trade-offs in accuracy, speed, or level of control. This post looks at ways to balance speed, accuracy, and control to achieve the best outcome.

There are two ways you can classify content: rules-based classification and machine learning classification. Many people select rules-based classification because they feel it gives them control over how content is classified. Machine learning classification relies on a corpus of previously classified documents, and the results may not feel as straightforward. But there is a way to take advantage of the benefits of machine learning classification without losing control, and I’ll show you how.

First, Understanding the Difference Between Classification Types

With rules-based classification, content is classified using pre-defined rules. For example, you might want all contracts to be classified as type: Contracts.

To do this, you would create a new rule in the Rules Management section of the Add a New Classifier page. As part of the rule definition process, you either select existing criteria or create a new query that finds all the documents you want the rule to apply to. Once the rule is set up and you’ve tested it, you can turn it on so that the rule is applied automatically whenever a new document matches the query.
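The rule described above can be sketched in a few lines of Python. This is only an illustration of the idea (a query that selects documents, a label to apply, and an on/off switch), not the product's actual rule engine. The `Document`, `Rule`, and `apply_rules` names and the keyword query are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Document:
    name: str
    text: str
    doc_type: Optional[str] = None  # stays None until a rule classifies it


@dataclass
class Rule:
    label: str                         # classification to apply, e.g. "Contracts"
    query: Callable[[Document], bool]  # decides which documents the rule matches
    enabled: bool = False              # rules start switched off so they can be tested


def apply_rules(doc: Document, rules: list[Rule]) -> Document:
    """Apply the first enabled rule whose query matches the document."""
    for rule in rules:
        if rule.enabled and rule.query(doc):
            doc.doc_type = rule.label
            break
    return doc


# Hypothetical query: anything mentioning "agreement" or "contract" is a contract.
contracts_rule = Rule(
    label="Contracts",
    query=lambda d: any(w in d.text.lower() for w in ("agreement", "contract")),
)

doc = Document("msa.docx", "Master Services Agreement between the parties")

apply_rules(doc, [contracts_rule])  # rule is still off, so nothing is applied
before_enabling = doc.doc_type

contracts_rule.enabled = True       # turn the rule on after testing it
apply_rules(doc, [contracts_rule])
after_enabling = doc.doc_type
```

The appeal of this approach is visible in the code: the outcome depends only on the query you wrote, which is why rules-based classification feels so controllable.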

Figure: Rules-based classifier

Automated data classification is different. It relies on machine learning to help you determine how to classify your information.

“Machine learning is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention.” (from SAS)

Machine learning-based classification works best when you have a large volume of previously classified documents and a defined set of classifications to apply to them. You then use this document set and its classifications to train the machine.
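To make "training the machine" concrete, here is a minimal sketch of a text classifier that learns from previously classified documents. It uses a small multinomial Naive Bayes model with add-one smoothing, chosen here only because it fits in a few lines. The source does not say which algorithm the product uses. The class name, the four training documents, and the labels are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    return text.lower().split()


class NaiveBayesClassifier:
    """Tiny multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.class_doc_counts: Counter = Counter()    # documents seen per class
        self.word_counts: dict = defaultdict(Counter)  # word frequencies per class
        self.vocabulary: set = set()

    def train(self, labeled_docs: list[tuple[str, str]]) -> None:
        for text, label in labeled_docs:
            self.class_doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)

    def predict(self, text: str) -> str:
        total_docs = sum(self.class_doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_doc_counts.items():
            # Log prior plus the smoothed log likelihood of each word.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training corpus: (document text, existing classification).
training_set = [
    ("services agreement between the parties term and termination", "Contracts"),
    ("this contract is governed by the laws of the state", "Contracts"),
    ("invoice number amount due payment terms net thirty", "Invoices"),
    ("invoice total amount due upon receipt", "Invoices"),
]

classifier = NaiveBayesClassifier()
classifier.train(training_set)
prediction = classifier.predict("amount due on this invoice")
```

With only four training documents this is a toy, which is exactly why the volume of previously classified documents matters: the model knows nothing beyond the word patterns it has seen.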

Figure: Machine learning classifier

In a perfect world, the machine learns quickly and the classifier can be turned on to run automatically. That’s a perfect world. But what if you aren’t prepared to give full control to an automatic classifier?

Automating Classification – Yet Keeping Control

You can choose to set up an automatic (machine learning) classifier without auto-validating the classification. Instead, you can set the classifier to make only “suggestions,” which you review and then accept or reject. You can perform this type of automated machine learning classification on a one-to-one basis or for a segment of files. Essentially, you get the benefits of machine learning but still maintain control.

Figure: Machine learning classification with suggestions

Controlled machine learning classification gives you the ability to validate the quality of your matches. By engaging in this two-step process, you inject your knowledge into the classification process until you are comfortable that the algorithm is providing high-quality results.
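The two-step process can be sketched as a suggestion record that carries the machine's label but leaves the final classification empty until a reviewer decides. This is an assumed data model for illustration, not the product's. `Suggestion`, `suggest`, `review`, the stand-in keyword classifier, and the "Notices" label are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Suggestion:
    document: str
    suggested_label: str
    final_label: Optional[str] = None  # stays None until a reviewer decides


def suggest(documents: list[str], classify: Callable[[str], str]) -> list[Suggestion]:
    """Step 1: the machine proposes a label but applies nothing."""
    return [Suggestion(doc, classify(doc)) for doc in documents]


def review(suggestion: Suggestion, accept: bool,
           corrected_label: Optional[str] = None) -> Suggestion:
    """Step 2: the reviewer accepts the suggestion or supplies a correction."""
    suggestion.final_label = suggestion.suggested_label if accept else corrected_label
    return suggestion


# Stand-in for a trained model; any function from text to label would do here.
def stand_in_classifier(text: str) -> str:
    return "Contracts" if "agreement" in text.lower() else "Invoices"


pending = suggest(
    ["Supplier Agreement 2021", "Lease agreement renewal notice", "Invoice 1042"],
    stand_in_classifier,
)

review(pending[0], accept=True)                              # suggestion was right
review(pending[1], accept=False, corrected_label="Notices")  # reviewer overrides
# pending[2] has not been reviewed yet, so it remains unclassified.
```

Reviewing a whole segment of files at once is the same call in a loop, for example `[review(s, accept=True) for s in pending]`.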

The Benefits of Automated Classification

Machine learning is most useful when it is hard to identify the precise elements that make a document fit into a class, and the decision is more like “I know it when I see it.” In that case, the classifier can analyze the content and then automatically apply a classification based on the training documents it learned from.

In the case of controlled automated classification, the machine makes suggestions and a reviewer either approves or adjusts them based on their knowledge of the information. This two-step process works well for large volumes of content spread across a company’s repositories: it decreases the time it takes to classify the content and improves classification quality more quickly than machine learning classification alone.

It would be nearly impossible for a person to know what’s in every repository and how that information should be classified, even with certain known rules. Allowing a machine to perform the first pass of analysis on a dataset gives the reviewer a first look that they can then adjust as necessary after further investigation. This suggest-and-review approach can continue until the reviewer is confident that the machine is classifying all content accurately.
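One way to put a number on "confident" is to track how often the reviewer accepts the machine's suggestions unchanged. The sketch below is an assumption about how that could be measured, not a feature described in this post. The `ReviewTracker` name, the window size, and the threshold are hypothetical values you would choose for your own content.

```python
from collections import deque


class ReviewTracker:
    """Tracks how often recent suggestions were accepted without changes."""

    def __init__(self, window: int = 50, threshold: float = 0.95) -> None:
        self.outcomes: deque = deque(maxlen=window)  # keeps only the latest reviews
        self.threshold = threshold

    def record(self, suggested: str, final: str) -> None:
        # True when the reviewer kept the machine's label as-is.
        self.outcomes.append(suggested == final)

    def acceptance_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def ready_for_auto_validation(self) -> bool:
        # Require a full window so a few lucky reviews do not trigger the switch.
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.acceptance_rate() >= self.threshold


# Small window and threshold purely to keep the example short.
tracker = ReviewTracker(window=4, threshold=0.75)
tracker.record("Contracts", "Contracts")
tracker.record("Invoices", "Contracts")  # reviewer corrected this suggestion
tracker.record("Invoices", "Invoices")
ready_early = tracker.ready_for_auto_validation()  # window not yet full

tracker.record("Contracts", "Contracts")
rate = tracker.acceptance_rate()
ready_now = tracker.ready_for_auto_validation()
```

Whatever threshold you pick, the decision to switch from suggestions to automatic classification stays with the reviewer. The metric only informs it.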

Any file and content analytics tool worth looking at will provide both fully automated and semi-automatic classification, and we’re happy to say that ours is built to support these as well as rules-based classification. Interested in seeing how it works? Request a demo and we’ll show you.