How to Automate Classification Without Losing Control

Web Master

5 June 2019

Content professionals often face large volumes of content that need to be classified. Rules-based processing and automated machine learning classification make this workload more manageable, but each comes with trade-offs in accuracy, speed, or level of control. This post looks at how to balance velocity, accuracy, and control to get the best outcome.

In everteam.discover there are two ways you can classify content: rules-based classification and machine learning classification. Many people choose rules-based classification because they feel it gives them control over how content is classified. Machine learning classification relies on a corpus of previously classified documents, and its results can feel less transparent. But there is a way to take advantage of machine learning classification without losing control, and I’ll show you how.

First, Understand the Difference Between Classification Types

With rules-based classification, content is classified using predefined rules. For example, you might want all contracts to be classified as type Contracts.

To do this in everteam.discover, you would create a new rule in the Rules Management section of the Add a New Classifier page. As part of the rule definition process, you either select existing criteria or create a new query that will find all the documents you want to apply the rule to. Once the rule is set up and you’ve tested it, you can turn it on so that as new documents match the query, the rule is automatically applied.
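The rule workflow described above (a query that selects documents, a target class, a test run, then switching the rule on) can be sketched in a few lines of Python. This is a generic illustration, not everteam.discover's API: the `Rule` type, the `is_contract` query, the dict-based document model, and the first-matching-rule precedence are all hypothetical choices made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical document model: a dict with "filename" and "text" keys.
Document = dict


@dataclass
class Rule:
    """A hypothetical classification rule: a query (predicate) plus the class it assigns."""
    name: str
    query: Callable[[Document], bool]
    target_class: str
    enabled: bool = False  # rules stay off until they have been tested


def classify_with_rules(doc: Document, rules: list[Rule]) -> Optional[str]:
    """Return the target class of the first enabled rule whose query matches, else None."""
    for rule in rules:
        if rule.enabled and rule.query(doc):
            return rule.target_class
    return None


def is_contract(doc: Document) -> bool:
    """Hypothetical query: the filename or body text mentions "contract"."""
    haystack = f"{doc.get('filename', '')} {doc.get('text', '')}".lower()
    return "contract" in haystack


contracts_rule = Rule(name="All contracts", query=is_contract, target_class="Contracts")

# Test the query on sample documents before turning the rule on.
samples = [
    {"filename": "supplier_contract.pdf", "text": ""},
    {"filename": "memo.docx", "text": "Lunch menu for Friday"},
]
print([is_contract(d) for d in samples])  # [True, False]

# Once enabled, the rule applies automatically to new documents that match its query.
contracts_rule.enabled = True
new_doc = {"filename": "nda.pdf", "text": "This contract is binding on both parties."}
print(classify_with_rules(new_doc, [contracts_rule]))  # Contracts
```

Keeping the query separate from the target class mirrors the idea of selecting existing criteria or writing a new query and then attaching a classification to whatever it finds.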

Rules-based classifier in everteam.discover

Automated data classification is different. It relies on machine learning to help you determine how to classify your information.

“Machine learning is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention.” (from SAS)

Machine learning-based classification works best when you have a large volume of previously classified documents and a defined set of classifications to apply to them. You use this document set and its classifications to train the machine.
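To make “training the machine” concrete, here is a minimal sketch of one classic text classifier, multinomial Naive Bayes, trained on a small set of previously classified documents. The post does not say which algorithm everteam.discover uses, so treat this as a stand-in; the training documents and class names are invented for illustration.

```python
from __future__ import annotations

import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: list[str], labels: list[str]) -> NaiveBayesClassifier:
        self.class_counts = Counter(labels)      # documents per class (for priors)
        self.word_counts = defaultdict(Counter)  # class -> word frequencies
        for text, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_scores(self, text: str) -> dict[str, float]:
        """Log prior plus summed log likelihoods of known words, per class."""
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / n_docs)
            for w in tokenize(text):
                if w in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[c][w] + 1) / (self.total_words[c] + v))
            scores[c] = score
        return scores

    def predict(self, text: str) -> str:
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Invented training corpus: previously classified documents and their classes.
train_docs = [
    "This agreement is entered into by the parties; the supplier shall deliver",
    "Contract terms: the parties agree to the payment schedule",
    "Invoice number 1042, amount due within 30 days",
    "Please remit payment for invoice 2231, total amount due",
]
train_labels = ["Contracts", "Contracts", "Invoices", "Invoices"]

model = NaiveBayesClassifier().fit(train_docs, train_labels)
print(model.predict("The parties agree to the terms of this agreement"))  # Contracts
print(model.predict("Amount due on invoice 7781"))  # Invoices
```

Nothing in the model says what a contract *is*; it only learns which words tend to appear in documents people already labeled as contracts, which is why the size and quality of the training set matter so much.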

Machine learning classifier in everteam.discover

In a perfect world, the machine learns quickly and the classifier can be turned on to run automatically. But what if you aren’t prepared to give full control to the automatic classifier?

Automating Classification – Yet Keeping Control

With everteam.discover you can set up an automatic (machine learning) classifier without having it auto-validate its classifications. Instead, you can set the classifier to make only “suggestions”, which you review and then accept or reject. You can run this kind of automated machine learning classification one document at a time or for a segment of files. Essentially, you get the benefits of machine learning while still maintaining control.
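The difference between auto-validating and suggesting can be sketched as a small wrapper around any classifier: in suggestion mode, predictions go to a review queue and nothing is applied until a reviewer accepts or corrects them. The names below (`SuggestingClassifier`, `review`, and the stand-in `toy_predict` model) are hypothetical and do not reflect everteam.discover's interface.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Suggestion:
    doc_id: str
    suggested_class: str
    confidence: float
    final_class: Optional[str] = None  # set only once a reviewer decides
    status: str = "pending"            # "pending", "accepted", "corrected" or "rejected"


@dataclass
class SuggestingClassifier:
    """Hypothetical wrapper: auto-validate predictions, or queue them as suggestions."""
    predict: Callable[[str], tuple[str, float]]  # returns (class, confidence)
    auto_validate: bool = False                  # False = suggestions only
    review_queue: list[Suggestion] = field(default_factory=list)
    applied: dict[str, str] = field(default_factory=dict)  # doc_id -> class actually applied

    def process(self, docs: dict[str, str]) -> None:
        """Classify one document or a whole segment of files (doc_id -> text)."""
        for doc_id, text in docs.items():
            label, confidence = self.predict(text)
            if self.auto_validate:
                self.applied[doc_id] = label
            else:
                self.review_queue.append(Suggestion(doc_id, label, confidence))

    def review(self, s: Suggestion, accept: bool, corrected_class: Optional[str] = None) -> None:
        """Record a reviewer's decision; only now is a class applied to the document."""
        if accept:
            s.final_class, s.status = s.suggested_class, "accepted"
        elif corrected_class is not None:
            s.final_class, s.status = corrected_class, "corrected"
        else:
            s.status = "rejected"  # left unclassified for further investigation
            return
        self.applied[s.doc_id] = s.final_class


def toy_predict(text: str) -> tuple[str, float]:
    """Stand-in for a trained model (hypothetical)."""
    return ("Contracts", 0.92) if "agreement" in text.lower() else ("Invoices", 0.55)


clf = SuggestingClassifier(predict=toy_predict)
clf.process({"doc-1": "Master services agreement", "doc-2": "Quarterly board minutes"})
print(clf.applied)  # {} (nothing is applied until review)

clf.review(clf.review_queue[0], accept=True)
clf.review(clf.review_queue[1], accept=False, corrected_class="Minutes")
print(clf.applied)  # {'doc-1': 'Contracts', 'doc-2': 'Minutes'}
```

Flipping `auto_validate` to `True` turns the same wrapper into a fully automatic classifier, so moving from suggestions to auto-validation is a configuration change rather than a new pipeline.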

Machine learning classification – suggestions – in everteam.discover

Controlled machine learning classification lets you validate the quality of your matches. With this two-step process, you inject your knowledge into the classification process until you are comfortable that the algorithm is providing high-quality results.

The Benefits of Automated Classification

Machine learning is most useful when it is hard to identify the precise elements that make a document fit a class, and classification is more a matter of “I know it when I see it”. The classifier analyzes the content and then applies a classification based on the training documents it learned from.

With controlled automated classification, the machine makes suggestions and a reviewer approves or adjusts them based on their knowledge of the information. This two-step process works well for large volumes of content spread across a company’s repositories: it decreases the time it takes to classify the content and improves the quality of the classification more quickly than machine learning classification alone.

It would be nearly impossible for one person to know what’s in every repository and how that information should be classified, even with certain known rules. Letting the machine perform the first pass of analysis on a dataset gives the reviewer a starting point they can adjust as necessary after further investigation. This suggest-and-review approach can continue until the reviewer is confident that the machine is classifying all content accurately.
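The post leaves the decision of when the reviewer is “confident” to human judgment. One way to make it measurable, assuming you log whether each suggestion was accepted as-is, is to track the acceptance rate over a rolling window of recent reviews and only consider auto-validation once it reaches a threshold. The `ReviewTracker` helper, the window size, and the threshold are hypothetical placeholders, not recommendations from the source.

```python
from collections import deque


class ReviewTracker:
    """Tracks reviewer decisions on recent suggestions (hypothetical helper)."""

    def __init__(self, window: int = 200, threshold: float = 0.95):
        self.window = window          # how many recent reviews to consider
        self.threshold = threshold    # minimum acceptance rate before auto-validation
        self.decisions = deque(maxlen=window)  # True = suggestion accepted as-is

    def record(self, accepted: bool) -> None:
        self.decisions.append(accepted)

    def acceptance_rate(self) -> float:
        return sum(self.decisions) / len(self.decisions) if self.decisions else 0.0

    def ready_for_auto_validation(self) -> bool:
        # Require a full window so a few lucky early reviews cannot trigger the switch.
        return len(self.decisions) == self.window and self.acceptance_rate() >= self.threshold


tracker = ReviewTracker(window=10, threshold=0.9)
for accepted in [True] * 8 + [False]:
    tracker.record(accepted)
print(tracker.ready_for_auto_validation())  # False (only 9 reviews so far)
tracker.record(True)
print(tracker.acceptance_rate(), tracker.ready_for_auto_validation())  # 0.9 True
```

Because the window slides, a run of rejected suggestions after the switch pushes the rate back below the threshold, which is a natural signal to return to suggestion mode. Tracking the rate separately for each class would also catch a model that is reliable for most classes but weak on one.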

Any file and content analytics tool worth looking at will provide both fully automated and semi-automated classification, and we’re happy to say that everteam.discover supports both, as well as rules-based classification. Interested in seeing how it works? Request a demo and we’ll show you.

