One of the most important elements of an information governance program is the proper classification of your data. A central, formal classification scheme is critical, especially when much of the information – structured and unstructured – is used in multiple departments or teams across the organization. If data is the lifeblood of your organization, then a good classification scheme ensures that everyone can find and leverage that data in their daily work. It also means you have a proven strategy to manage that data appropriately.

The Benefits of Classification

Imagine trying to find a document among thousands of documents spread across multiple file shares, or file-sharing applications. Maybe you know what the document is called, or maybe you only know what its contents hold. Maybe there are multiple versions of the document or multiple copies stored by other departments. Frustrated yet? Who wouldn’t be?

Not only do you have to move from one repository to another to find your document, but the repositories that do offer a search function return so many results that it would take forever to sort through them all.

Two things could help you here, and one of them is having a company-wide classification scheme. (For this article, I’ll focus on the classification of your documents and other unstructured content.)

Now, before I go any further, I don’t want you to think that you have to drop everything you are doing and kick off a year-long project to document your entire company’s taxonomy. Not only is that unreasonable to expect, but it also has the potential to slow down your information governance efforts.

Instead, we want you to think about your classification/taxonomy planning the same way we recommend you think about information governance – in phases or projects. Build your classification scheme as you build your governance program – one step at a time. By creating your taxonomy this way, you can add new content types or build on the content types already in the taxonomy, slowly and carefully developing a classification scheme that will work for everyone.

Let’s get back to it.

Effective classification of your content provides many benefits, the ultimate one being greater visibility into your information:

Identify sensitive information such as PII, payment card (PCI) data, and other personal information
Separate the good information from the ROT
Respond quicker to requests for information
Assign cost-effective storage tiers
Apply appropriate security controls to prevent accidental disclosure or cyberhacking

There are many examples of the benefits of classification, but I’ll provide two:

The first is responding to data subject requests under privacy regulations like CCPA or GDPR. Both of these regulations require you to provide a person with all the information you store on them within a set period (GDPR gives you one month, CCPA 45 days). If you store customer information across many different repositories and each repository stores that information according to its own classification scheme, it’s going to be very challenging to find everything in a short period of time (unless, of course, you have many employees working together to do it, in which case you’re expending huge amounts of resources on each data subject request).
The second is the risk of a cyber hack, which everyone says is not a matter of “if,” but a matter of “when.” According to a Harris Poll conducted for Symantec in January 2018, 60 million Americans have been affected by identity theft. Much of the data needed to perform identity theft is stolen from businesses that store customer information inappropriately or secure it inadequately. From the same article, “Cybercriminals will steal an estimated 33 billion records in 2023. That’s according to a 2018 study from Juniper Research. This compares with 12 billion records Juniper expects to be swiped in 2018.” If you’re not classifying your information and applying appropriate security policies to it, then you may find yourself one of those affected businesses.

Getting Started with Classification

Some people might think the first step to classification is to get a tool, but it isn’t. The first step is to bring together the key stakeholders who create, curate, and work with your organization’s information so you get a complete picture of how information is used, not only in one department or division but also in others. Keep in mind that you can do this iteratively as you work on governance projects.

When you take the time to talk with everyone, you’re able to create a classification scheme that meets the needs of everyone. That’s very important because you don’t want one department to classify content differently from another – you’ll never be able to support regulations like CCPA. Maybe you won’t make everyone happy, but that’s not exactly the point of a central classification strategy.

After you’ve received input from key stakeholders, you can start to define content categories (or content types) and associated metadata. Share the classification scheme with everyone to ensure they follow it.

I said you don’t need tools to start, but investing in the right tools early does have certain benefits. For starters, as you define your taxonomy, you will need a place to record that taxonomy, indicating where and how it’s applied. A solution like everteam.policy can help you do that.

Our product, everteam.discover, connects to all your unstructured repositories, indexes your content and automatically applies your classification schema. It integrates seamlessly with everteam.policy to pull the classification schema to apply.

In everteam.discover, you can classify content in three ways: manually, using rules (query matches), or using machine learning (scanning the contents of a content asset). Auto-classification using rules or ML is necessary when you have enormous amounts of content to classify. It will help you meet regulatory requirements much quicker (and more accurately) than manual classification. But there are also situations where manual classification is necessary.

Machine learning makes it possible to analyze unstructured data semantically to suggest classifications based on text found. You can then add these recommended classifications to everteam.policy.

Classifying Content with everteam.discover

You know how you want to classify your information, but there’s too much to do manually (as in one document at a time), so you bring in everteam.discover. Everteam.discover connects to all your repositories and indexes the content. You can then review the content by different facets or views, or search for content by a range of parameters. To manually classify a group of documents, you select them all and apply a classification category/content type using the taxonomy you have previously added to the tool.

Once you have identified the rules for classifying documents, you can begin to automate. Add the rules to an everteam rules-based classifier. The classifier executes automatically any time a new document is added and applies a category to any document that matches the rules, eliminating the manual process.
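To make the rules-based idea concrete, here is a minimal Python sketch. It is not everteam.discover’s API; the `Document` and `Rule` shapes, the sample rules, and the `on_document_added` hook are all hypothetical names I’ve made up for illustration. It shows the general pattern: each rule pairs a match condition with a category, and the first matching rule wins when a new document arrives.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Document:
    """A minimal stand-in for an indexed document (hypothetical shape)."""
    path: str
    text: str
    category: Optional[str] = None


@dataclass
class Rule:
    """Pairs a match condition with the taxonomy category it assigns."""
    category: str
    matches: Callable[[Document], bool]


# Illustrative rules only; real rules would come from your agreed taxonomy.
RULES: List[Rule] = [
    Rule("Invoice",
         lambda d: re.search(r"\binvoice\s+(no|number)\b", d.text, re.I) is not None),
    Rule("Contract",
         lambda d: d.path.lower().endswith(".pdf") and "agreement" in d.text.lower()),
    Rule("HR Record",
         lambda d: "/hr/" in d.path.lower()),
]


def classify(doc: Document, rules: List[Rule] = RULES) -> Document:
    """Assign the category of the first matching rule; leave unmatched documents as-is."""
    for rule in rules:
        if rule.matches(doc):
            doc.category = rule.category
            break
    return doc


def on_document_added(doc: Document) -> Document:
    """Hypothetical hook: call this whenever a new document is indexed."""
    return classify(doc)
```

Documents that match no rule keep a category of `None`, which gives you a natural queue for manual review.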

Machine learning is the third way to classify content in everteam.discover. It analyzes your content and suggests classifications. For machine learning to work, you have to provide training sets of documents for each classification for everteam.discover to learn from. As more content is indexed and classified, it will get better at assigning the correct classification.
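If you’re wondering what “learning from training sets” looks like in practice, here is a small sketch. I don’t know which algorithm everteam.discover uses internally, so I’ve swapped in a simple multinomial naive Bayes text classifier written with only the Python standard library. The class name, the `suggest` method, and the tiny training set are all assumptions for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of training docs
        self.vocabulary = set()

    def train(self, labeled_docs):
        """labeled_docs: iterable of (text, category) pairs, i.e. the training set."""
        for text, category in labeled_docs:
            tokens = tokenize(text)
            self.word_counts[category].update(tokens)
            self.doc_counts[category] += 1
            self.vocabulary.update(tokens)

    def suggest(self, text):
        """Return the most likely category, or None if nothing has been trained."""
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_category, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            # Log prior plus the smoothed log likelihood of each token.
            score = math.log(n_docs / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# A deliberately tiny, made-up training set: a few documents per classification.
training_set = [
    ("Invoice number 1001 total amount due payment terms", "Invoice"),
    ("Invoice for services rendered amount payable", "Invoice"),
    ("Employment agreement between employer and employee", "Contract"),
    ("Service agreement terms termination clause parties", "Contract"),
]

classifier = NaiveBayesClassifier()
classifier.train(training_set)
suggestion = classifier.suggest("Amount due on this invoice")
```

Calling `train` again with newly classified documents updates the counts, which is the sense in which a classifier like this improves as more content is classified.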

Here’s a look at everteam.discover’s classifier feature:

It’s not always possible to let the machine apply your classifications; you may need to provide a way for certain employees to apply classification manually. A good example here is identifying and dealing with ROT. You may be able to start with auto-classification, but you should have some human intervention to ensure you are getting rid of only the information that is no longer required.

I’ve only provided a quick overview of how you can use everteam.discover to help you apply your taxonomy to your content. There’s a lot more to understand about how you can use the Classifiers and how to train a machine learning Classifier. We’ll cover these topics in upcoming blogs, so make sure you sign up for our newsletter to hear about new blogs when we publish them.

Classification is not a one-time job

Whether you do it all at once (not recommended if you want to get things done) or do it in phases by initiative, classification is not a one-time job. You can’t define it once and assume it works that way forever. Managing classifications (taxonomy) is an ongoing process as you add new content types, as existing content changes, and as the rules for how you manage information change (new regulations, changes to existing regulations). How you want to use your information to support decisions will also affect how you classify it.

To help you manage your taxonomy on an ongoing basis, you can use everteam.policy. It not only enables you to define and manage your current taxonomy but also lets you define retention and life cycle management rules, identify access permissions, and share all this information with the people and systems across the organization that need to know and follow these classification rules.

I’ll leave you with one final note about classifying your information. A classification content type (or category, depending on the term you use) should provide several things:

Description of the content type and all associated metadata/attributes
The rules for handling that information
How / where to store it
How to dispose of it when it’s no longer needed
The security/permissions to apply to it to ensure only the right people have access
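One way to picture a content type that carries all of these elements is as a small record. The sketch below is only an illustration in Python, not how everteam.policy stores a taxonomy; the field names, the sample “Invoice” values, and the seven-year retention period are assumptions I picked for the example.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Set


@dataclass
class ContentType:
    """One entry in a classification scheme (hypothetical structure)."""
    name: str
    description: str
    metadata_fields: List[str]   # associated metadata/attributes
    handling_rules: List[str]    # rules for handling the information
    storage_location: str        # how / where to store it
    retention_days: int          # how long to keep it before disposal
    disposal_method: str         # how to dispose of it
    allowed_roles: Set[str] = field(default_factory=set)  # who may access it

    def disposal_due(self, created: date) -> date:
        """Date on which a document created on `created` becomes due for disposal."""
        return created + timedelta(days=self.retention_days)

    def can_access(self, role: str) -> bool:
        """True if the given role is permitted to access this content type."""
        return role in self.allowed_roles


# Example values are illustrative, not a recommendation for any regulation.
invoice = ContentType(
    name="Invoice",
    description="Bills issued to or received from customers and suppliers",
    metadata_fields=["invoice_number", "customer_id", "issue_date", "amount"],
    handling_rules=["Do not email outside the finance team"],
    storage_location="finance-archive",
    retention_days=7 * 365,
    disposal_method="secure delete",
    allowed_roles={"finance", "audit"},
)
```

Keeping retention and permissions on the content type itself means every document classified as an invoice inherits the same handling, no matter which repository it lives in.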

If you’re interested in learning more about how everteam.discover can support the classification of your information (including the 80% dark data hidden in your repositories), reach out and request a demo.

Tech Tuesday: Getting Started with Classification