Building a data inventory

Organizations are inundated with information. We collect, curate and use it everywhere, and we have a lot of it. The volume and velocity of the information collected create many challenges, from storing it, to dealing with privacy-related issues, to getting rid of it at the right time. Addressing these challenges manually, system by system or repository by repository, is time-consuming and error-prone. There is a better way, and it starts with a data inventory.

Oh the places you store information

A data inventory is a critical element of your information governance model. Set up correctly, it provides you with all the details you need on what information you have, where it’s located and who has access to it. When you need to find a piece of information quickly, it’s where you go. So why do most organizations not have one?

You store information in many places in an organization:

  • Business applications like HR, CRM, ERP and financial systems
  • Line of business applications
  • File shares
  • SharePoint
  • Cloud drives such as Google Drive, Dropbox and Office 365

There are the locations you know about, and then there’s the dark data: information you don’t remember storing or don’t realize is out there. Remember that over 80% of the data located within an organization is dark data. A data inventory helps you keep track of all this information, both dark and known.

Set up a data inventory

Setting up a data inventory takes effort, but it’s worth it in the long run. You can start with something as simple as a spreadsheet where you record information such as:

  • Repository: The name of the system that contains the information (include details such as description, owner, location, access)
  • Content Type: The identifier for one set of assets (content type, record series) located in the repository (include details such as description and whether or not it contains personal information). You will likely have a lot of content types.
  • Personal Information ID: The identifier for one kind of personal information (PI) you hold (include details such as the PI description, PI reason and PI policy)
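
If you want something a little more structured than a spreadsheet, the three record types above could be sketched in Python as follows. This is only a minimal illustration: the class names, field names and sample values are assumptions made for the example, not a prescribed schema, and the output is plain CSV so it still opens in a spreadsheet.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


# All class and field names below are hypothetical, chosen to mirror the list above.
@dataclass
class Repository:
    name: str
    description: str
    owner: str
    location: str
    access: str


@dataclass
class ContentType:
    content_type_id: str
    repository: str        # name of the Repository that holds these assets
    description: str
    contains_pi: bool      # whether this content type contains personal information


@dataclass
class PersonalInformation:
    pi_id: str
    description: str
    reason: str
    policy: str


def to_csv(rows):
    """Serialize a non-empty list of same-typed records to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(rows[0])])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


# Made-up sample entries, for illustration only.
repositories = [
    Repository("HR system", "Employee records", "HR director", "on-premises", "HR staff"),
]
content_types = [
    ContentType("CT-001", "HR system", "Employment contracts", True),
]
pi_entries = [
    PersonalInformation("PI-001", "Home address", "Payroll", "See HR retention policy"),
]

print(to_csv(repositories))
```

Keeping one sheet (or CSV file) per record type means a content type can point back to its repository by name instead of repeating the repository details on every row.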

How do you find all this information? You can start by talking to people. Schedule meetings, collect information, normalize it and then review it. This approach is very helpful but it’s subject to error. People don’t always know what information they have or where it’s located. In some cases, information may have been stored in file shares by an employee no longer in the department or with the company. In other cases, an application may have been used for a short period of time, but later stopped being used.

To make things even harder, employees often know about information stored in business applications, where the data is structured in databases, but they don’t always know about or remember information stored in unstructured file repositories such as shared folders, SharePoint, Box or Dropbox or, worst of all, local drives.

Part of the challenge with unstructured repositories is that their content comes in many formats: Word, PDF, Excel, PowerPoint, plain text, images, videos, etc. With these types of information, there is often little metadata to help you understand what a piece of content is for.

Tools to help build and manage your data inventory

The manual, interview-based process of building a data inventory is a good start, but it’s not enough. You need analytics technology to truly get a handle on the information located in your organization. The right analytics tool can connect to systems and repositories throughout the company, both structured and unstructured. It profiles and indexes each repository, giving you a centralized view of your information.
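
To make "profile and index" a little more concrete, here is a rough sketch of what profiling a single file share might look like. It assumes the repository is simply a folder you can walk with Python's standard library. A real analytics tool connects to many kinds of systems and does far more, and the function names here are hypothetical.

```python
import os
from collections import Counter
from datetime import datetime, timezone


def profile_repository(root):
    """Walk a directory tree and return one index entry per file."""
    index = []
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # skip files that vanish or cannot be read
            index.append({
                "path": path,
                "extension": os.path.splitext(name)[1].lower(),
                "size_bytes": info.st_size,
                "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
            })
    return index


def summarize(index):
    """Count files per extension for a quick view of what a repository holds."""
    return Counter(entry["extension"] for entry in index)


if __name__ == "__main__":
    # Profile the current folder as a stand-in for a real file share.
    for extension, count in summarize(profile_repository(".")).most_common(5):
        print(extension or "(no extension)", count)
```

Running the same profiling step against every repository and combining the results is what gives you the centralized view.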

From there, the right tool will incorporate AI technologies such as machine learning and natural language processing to analyze the information, help categorize and organize it and, when necessary, add or extend metadata to improve its organization.

As for that dark data, file and content analytics can help you identify dark content, using information such as:

  • File properties
  • File location
  • File content
  • Personal Information
  • Matched Patterns
  • “Named Entities”

The analytics tool then uses that information to enhance metadata and categorize the content.
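
As a simplified illustration of the "matched patterns" idea, the sketch below checks a piece of text against two regular expressions and records the result as extra metadata. The patterns, function names and metadata keys are assumptions made for this example. Real detection of personal information needs much broader, validated rules and usually named-entity recognition as well.

```python
import re

# Illustrative patterns only; both are deliberately loose.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "card_like_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}


def matched_patterns(text):
    """Return the names of the patterns found in a piece of text, sorted."""
    return sorted(name for name, pattern in PATTERNS.items() if pattern.search(text))


def enrich(entry, text):
    """Return a copy of an index entry with pattern-based metadata added."""
    found = matched_patterns(text)
    return {**entry, "matched_patterns": found, "may_contain_pi": bool(found)}


if __name__ == "__main__":
    # Hypothetical file entry and content.
    entry = {"path": "shared/notes.txt"}
    print(enrich(entry, "Please contact jane@example.com about the invoice."))
```

A match only means the content deserves a closer look; a loose pattern like the card-number one will also flag other long digit sequences.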

Using and maintaining a data inventory

There are many ways to use a data inventory in your business. If you must comply with the EU General Data Protection Regulation (GDPR), it will help you find information for a data subject request. If you are decommissioning an application, it can help you organize its current content, clean out the ROT (redundant, obsolete and trivial content) and migrate the information you want to keep to a new location.

If you are trying to find PII or PCI data stored in unsecured locations, it can tell you where that data is and then help you defensibly delete it or migrate it to a secure repository. If you have an employee who continually places inappropriate information in unsecured locations, you can see everywhere they have content stored and check whether the information is in the right place.
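
Once the inventory records where content lives and whether it may contain personal information, a question like "what sensitive content sits outside our secure repositories?" becomes a simple filter. The sketch below assumes each inventory entry is a dictionary with a "path" and a "may_contain_pi" flag, and that secure locations are known folder paths. All of these names and paths are hypothetical.

```python
from pathlib import PurePosixPath


def is_under(path, root):
    """True if path is the root itself or sits somewhere below it."""
    p, r = PurePosixPath(path), PurePosixPath(root)
    return p == r or r in p.parents


def find_exposed(entries, secure_roots):
    """Return entries flagged as possibly containing PI that sit outside every secure root."""
    return [
        entry
        for entry in entries
        if entry.get("may_contain_pi")
        and not any(is_under(entry["path"], root) for root in secure_roots)
    ]


# Made-up inventory entries, for illustration only.
entries = [
    {"path": "/shares/hr-secure/contracts/a.docx", "may_contain_pi": True},
    {"path": "/shares/public/export.xlsx", "may_contain_pi": True},
    {"path": "/shares/public/menu.pdf", "may_contain_pi": False},
]

for entry in find_exposed(entries, secure_roots=["/shares/hr-secure"]):
    print(entry["path"])
```

Comparing whole path components, instead of raw string prefixes, avoids treating a folder such as /shares/hr-secure2 as if it were inside /shares/hr-secure.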

Like any system that regularly takes in new information, your data inventory will keep growing, so you need to run the connectors and indexers continually to add new repositories and re-check existing ones. Information management is a key activity in any organization, and as the information created and collected continues to grow, getting and keeping control of your information has never been more critical.
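
Re-checking an existing repository mostly comes down to comparing what the indexer saw last time with what it sees now. Here is a minimal sketch of that comparison, assuming each scan produces a snapshot that maps a file path to its size and last-modified time. The snapshot format and function name are assumptions for illustration, not how any particular product works.

```python
def diff_snapshots(previous, current):
    """Compare two {path: (size, modified)} snapshots of the same repository."""
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    changed = sorted(
        path for path in set(current) & set(previous)
        if current[path] != previous[path]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Made-up snapshots from two scans, for illustration only.
last_scan = {"plan.docx": (1200, "2018-03-01"), "old.xlsx": (800, "2017-11-12")}
this_scan = {"plan.docx": (1350, "2018-04-02"), "new.pdf": (4096, "2018-04-03")}

print(diff_snapshots(last_scan, this_scan))
```

Only the added and changed files need to be analyzed again, which keeps regular re-checks much cheaper than a full re-index.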

To learn how Everteam supports file and content analytics, check out everteam.discover.

And if you want to hear more about building a data inventory, watch the second half of our on-demand webinar: How to Get Ready for GDPR with a Data Inventory.