It’s a simple problem that every organization has – finding and managing dark data – information spread across the organization, often hidden in systems and repositories.
It starts with a use case:
“I need to find duplicate, obsolete and sensitive information in hidden silos to reduce cost and risk.”
Managing dark content (or dark data – you decide which term works best for you) was the topic of our recent webinar. Ken Lownie, our VP of North America and resident expert on all things related to information governance, shared some insights into the challenge and how you can solve it.
Why Is Dark Content a Problem?
Research from the Global Databerg Report said that managing useless data will cost organizations 3.3 trillion dollars by 2020. So yes, cost is a factor, but it isn’t the biggest driver.
Risk – in particular, data theft – is the more compelling argument. It’s not a matter of if your data will be stolen, but when. The questions then become: what will attackers get when they breach your network, how much will they get, and what will you do about it afterward? Facebook, Equifax, eBay, JP Morgan and many more have faced major data breaches.
One way to limit the damage from a breach is to reduce the surface area of information attackers can access, so that when they do get through there will be less for them to take.
A third factor – compliance – is getting the most attention right now. Records management, security, privacy – all are influencing the initiatives organizations are starting to work on. Privacy regulations are especially concerning today and include GDPR, CCPA and other regulations in place or coming into play in the near future.
A poll taken during the webinar showed that attendees recognize they have a dark content problem, but it’s not a priority yet.
Illuminate Dark Content
There are four core capabilities of this step that help you manage your data content better.
- Private and sensitive information
- Named Entities
Technology got us into this mess, but it will also get us out of it. Here are a few technologies you can leverage to help you find all those things mentioned above:
- Named Entity Extraction: The ability to recognize words and phrases as specific types of objects. For example, find me all the documents that mention a town, or find me all the documents that reference a specific company.
- Machine Learning: You feed the system a lot of documents that match a certain pattern and then ask the system to go find other documents that match and automatically classify them – “How much is this item like these other sample items?”
- Metadata enhancement: Adding new tags based on extracted entities and other content elements. It could be location-based, context-based, information-type based.
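The three techniques above can be sketched in miniature. This is only an illustration under loud assumptions: the regular expressions stand in for a trained entity-extraction model, the word-overlap score stands in for a real machine learning classifier, and every name here (`PATTERNS`, `extract_entities`, `similarity`, `enhance_metadata`) is hypothetical rather than part of any product.

```python
import re

# Assumed patterns for illustration; real tools use trained models and far broader rules.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def extract_entities(text):
    """Return a dict mapping entity type to the sorted unique matches in text."""
    found = {}
    for name, pattern in PATTERNS.items():
        matches = sorted(set(pattern.findall(text)))
        if matches:
            found[name] = matches
    return found


def similarity(text, samples):
    """Average Jaccard word overlap between text and the samples (0.0 to 1.0).

    A crude stand-in for "how much is this item like these other sample items?"
    """
    words = set(text.lower().split())
    scores = []
    for sample in samples:
        sample_words = set(sample.lower().split())
        union = words | sample_words
        scores.append(len(words & sample_words) / len(union) if union else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def enhance_metadata(metadata, text):
    """Return a copy of metadata with tags derived from the extracted entities."""
    entities = extract_entities(text)
    enriched = dict(metadata)
    enriched["entities"] = entities
    enriched["sensitive"] = bool(entities)
    return enriched


doc_text = "Contact jane.doe@example.com about SSN 123-45-6789."
tags = enhance_metadata({"path": "/shares/hr/note.txt"}, doc_text)
print(tags)
```

The enriched tags are what later steps would use to classify content and decide which policies apply.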
Manage Dark Content
Now that you have illuminated (or shone a light on) your dark content, you have indexed it all and classified it according to the rules you set up. You are ready to move to the next step: Associate.
You’ve sorted everything (classified it), so now give each group a classification name. Once content is classified, you can associate it with certain properties, with a retention rule, with the regulations that define the retention period, and with certain access rules. It’s also associated with a lifecycle that outlines how to store the information and when to dispose of it.
This association is essentially mapping the content type to one or more policies.
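As a rough sketch, that mapping could be a simple lookup table from content type to policies. The policy IDs, retention periods, regulations and role names below are all invented for illustration and do not come from any real policy catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    policy_id: str
    retention_years: int
    regulation: str
    allowed_roles: tuple
    disposal: str  # what to do when retention ends, e.g. "delete" or "archive"


# Hypothetical content types and policies, for illustration only.
POLICIES_BY_TYPE = {
    "invoice": [
        Policy("FIN-01", 7, "tax recordkeeping rule", ("finance",), "delete"),
    ],
    "employee_record": [
        Policy("HR-01", 6, "employment records rule", ("hr",), "archive"),
        Policy("PRIV-01", 6, "privacy regulation", ("hr", "privacy_officer"), "delete"),
    ],
}


def policies_for(content_type):
    """Return the policies associated with a content type (empty list if none)."""
    return POLICIES_BY_TYPE.get(content_type, [])


def can_access(content_type, role):
    """Allow access only if every associated policy permits the role."""
    policies = policies_for(content_type)
    return bool(policies) and all(role in p.allowed_roles for p in policies)


print([p.policy_id for p in policies_for("employee_record")])
print(can_access("employee_record", "hr"))
```

Requiring every policy to permit a role is one conservative design choice; a real system might resolve conflicting policies differently.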
Once you have associated your content with a policy, you can take action on it. For example, you have classified a document as a mortgage document that is associated with the MO-10-02 Mortgage – Closures (MLPA) Policy. That policy includes a retention rule that, when it expires, puts the document through a workflow approval and deletion process. The process records an audit trail that shows the document was deleted and why it was deleted (based on the retention rule).
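The mortgage example can be sketched as a small disposal routine. The seven-year retention period, the field names and the approver role are assumptions made for illustration; the source does not state what the MO-10-02 policy actually specifies.

```python
from datetime import date


def retention_expired(created, retention_years, today):
    """True once the document has been kept for at least retention_years."""
    try:
        expiry = created.replace(year=created.year + retention_years)
    except ValueError:
        # created was Feb 29 and the expiry year is not a leap year
        expiry = date(created.year + retention_years, 3, 1)
    return today >= expiry


def dispose(doc, today, approved_by, audit_log):
    """Delete doc if its retention has expired and deletion was approved.

    Appends an audit entry recording what was deleted and why.
    Returns True only when the document was deleted by this call.
    """
    if doc.get("deleted"):
        return False
    if not retention_expired(doc["created"], doc["retention_years"], today):
        return False
    if approved_by is None:
        return False  # still waiting in the approval workflow
    doc["deleted"] = True
    audit_log.append({
        "document": doc["name"],
        "action": "deleted",
        "policy": doc["policy_id"],
        "reason": f"retention period of {doc['retention_years']} years expired",
        "approved_by": approved_by,
        "date": today.isoformat(),
    })
    return True


audit_log = []
mortgage = {
    "name": "closure-0001.pdf",   # hypothetical file name
    "policy_id": "MO-10-02",
    "created": date(2010, 5, 1),
    "retention_years": 7,         # assumed; not stated in the policy example
}
deleted = dispose(mortgage, date(2020, 1, 1), "records_manager", audit_log)
print(deleted, audit_log)
```

Writing the audit entry in the same step as the deletion is what lets you show later that the document was removed because of the retention rule and not for some other reason.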
Identifying Solutions to Help
When you start looking at technologies to help you manage your dark content, look for a few core governance competencies:
- Connect – Connect to lots of different repositories – Documentum, SharePoint team drives, structured data, file shares, etc.
- Identify – Automatically find the ROT (redundant, obsolete and trivial content) and sensitive information
- Classify – Based on those properties, automatically classify the content
- Govern – Match those classifications with taxonomies, policies and retention rules
- Act – Enact the policies and do something with the content – move it, migrate it, delete it and remediate it using configurable workflows
- Archive – Preserve the content identified as records in long-term archives to reduce costs and address compliance. Not everything will go in a single repository – only some records will move here. The alternative is to leave the records and content where they are and manage them in place.
- Enforce – Enforce retention rules, manage disposition and track legal holds – this happens on an ongoing basis
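To make the Identify step concrete, here is a minimal sketch of two simple checks: finding duplicates by content hash and flagging files that have not been modified for years. It assumes the Connect step has already gathered file contents and modification dates into plain dictionaries. The function names and the five-year threshold are illustrative assumptions, not how any particular product works.

```python
import hashlib
from collections import defaultdict
from datetime import date, timedelta


def find_duplicates(files):
    """Group paths by SHA-256 of their content; return groups with 2+ paths."""
    groups = defaultdict(list)
    for path, content in files.items():
        groups[hashlib.sha256(content).hexdigest()].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


def find_obsolete(last_modified, today, max_age_days=365 * 5):
    """Return paths last modified before the cutoff (assumed: about five years)."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(path for path, modified in last_modified.items() if modified < cutoff)


# Hypothetical inventory gathered from a file share.
files = {
    "a.docx": b"same bytes",
    "copy of a.docx": b"same bytes",
    "b.docx": b"other bytes",
}
last_modified = {
    "old.txt": date(2012, 3, 1),
    "new.txt": date(2019, 6, 1),
}

print(find_duplicates(files))
print(find_obsolete(last_modified, today=date(2020, 1, 1)))
```

Exact-hash matching only catches byte-identical copies; near-duplicates would need a fuzzier comparison.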
You can continue to go on saving every piece of content created and curated and accept the costs of storing that data, but can you afford to take the associated risks? Keep only the content you need, to reduce the surface area that hackers can get at, and then make sure you have applied the information and retention policies necessary to ensure that information is secure and accessible only by those who need access. It might be a bit of work to set up, but the benefits in the short and long run will be worth it.
The good news is that Everteam offers solutions to support the work you need to do to manage your dark data, and you can check out a quick demo in the webinar recording below. No time to watch the entire webinar? Jump to 22:43 and check out the demo!