
Machine Learning, NLP and AI: Key to Information Governance

Imagine the information you store in your file shares. Now imagine how much information is stored in your accounting system, invoicing system, your email system, SharePoint… Now imagine you need to find a document and you aren’t sure where it’s stored. Or you are tasked with finding all data and content related to a customer or a contract. Or maybe you’ve realized that the information you are storing is getting out of hand and you need to get a handle on it before something happens. Feeling a little overwhelmed? I don’t blame you. That’s why we need machine learning (ML), natural language processing (NLP) and other artificial intelligence (AI) technologies: they make managing information easier.

ML, NLP Meet Discovery and Analysis

I’ve always been a fan of any technology that can speed up the insights I need to make better decisions. That’s what AI does: it helps us make better decisions.

Both machine learning and natural language processing are areas of AI, each serving a different function.

Machine Learning (ML)

With ML, a computer processes information and identifies patterns or trends. The more data it ingests and analyzes, the better the results it provides. For example, a computer analyzes the records of a financial institution to detect fraudulent activity. The more data it examines, the better it can predict fraud before it actually happens.

You don’t hand-code the rules in ML. Instead, you “teach” the computer by giving it a known set of sample “training” data to learn from, an approach known as supervised ML. In our fraud example, if you have already identified fraudulent activity in a dataset, you can use it to show the computer what to look for in other datasets. There are many types of algorithms you can use in your ML program, depending on the outcome you want to achieve. You can also create your own algorithms (as many software vendors do).
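
To make the idea of "teaching" with training data concrete, here is a minimal sketch in Python. It is not how any particular product works: the transactions, the two features (amount and hour of day) and the nearest-centroid method are all assumptions chosen for illustration.

```python
from statistics import mean

# Hypothetical labeled training data: (amount in dollars, hour of day) -> label.
# A real system would use far more records and would scale the features;
# here the raw amount dominates the distance calculation.
TRAINING = [
    ((25.0, 14), "legitimate"),
    ((40.0, 10), "legitimate"),
    ((18.5, 19), "legitimate"),
    ((950.0, 3), "fraud"),
    ((1200.0, 2), "fraud"),
    ((875.0, 4), "fraud"),
]


def train(examples):
    """Compute one centroid (mean feature vector) per label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


centroids = train(TRAINING)
print(predict(centroids, (1000.0, 3)))   # a large transaction at 3 a.m.
print(predict(centroids, (30.0, 13)))    # a small transaction at 1 p.m.
```

The point of the sketch is that nobody wrote a rule such as "flag anything over $500". The boundary between the two groups comes entirely from the labeled examples.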

Unsupervised ML applies when there is no “training” set of data available. The machine analyzes the information and looks for patterns on its own. A person can then verify the results and provide feedback to the machine, which can then refine its future analysis.

Using ML, you can analyze data, identify patterns, and either make decisions automatically or surface the insights for a human to make the decision.
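
As a rough illustration of finding patterns without labels, the sketch below groups transaction amounts into two clusters using a tiny one-dimensional k-means. The amounts and the choice of two clusters are assumptions made for the example, and the function name `two_means` is hypothetical.

```python
from statistics import mean


def two_means(values, iterations=20):
    """Split values into two clusters with a tiny 1-D k-means (k=2)."""
    centers = [min(values), max(values)]  # deterministic starting points
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for value in values:
            # Assign each value to the nearer of the two current centers.
            nearer = 0 if abs(value - centers[0]) <= abs(value - centers[1]) else 1
            clusters[nearer].append(value)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        new_centers = [
            mean(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # assignments have stopped changing
        centers = new_centers
    return clusters


# Hypothetical, unlabeled transaction amounts.
amounts = [22.0, 35.5, 18.0, 41.0, 27.5, 980.0, 30.0, 1150.0]
small, large = two_means(amounts)
print("typical:", small)
print("unusual:", large)
```

Nothing told the program which amounts were suspicious. It only separated the data into two groups, and a reviewer would still need to decide whether the smaller group actually represents a problem.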

Natural Language Processing (NLP)

NLP is a branch of AI that processes language. It analyzes large volumes of unstructured data such as emails, Word documents and PDFs to derive syntactic and semantic insights. There are several approaches to NLP, and they span a spectrum: analyzing content against fully defined ontologies or taxonomies, leveraging dictionaries, and using ML to perform sentiment analysis, statistical entity extraction or completely content-driven discovery.
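
The dictionary end of that spectrum is the simplest to picture. The sketch below tags a piece of text with categories by counting words that appear in small hand-built term lists. The categories and terms are invented for illustration, and a real taxonomy would be far larger.

```python
import re

# Hypothetical category dictionaries, made up for this example.
CATEGORY_TERMS = {
    "contract": {"agreement", "clause", "termination", "party"},
    "invoice": {"invoice", "payment", "due", "amount"},
}


def tag_document(text):
    """Return each category with at least one matching term, plus its hit count."""
    words = re.findall(r"[a-z]+", text.lower())
    hits = {}
    for category, terms in CATEGORY_TERMS.items():
        count = sum(1 for word in words if word in terms)
        if count:
            hits[category] = count
    return hits


print(tag_document("Invoice 1042: payment of the full amount is due in 30 days."))
```

A dictionary approach like this is easy to explain and audit, but it only finds what someone thought to put in the list, which is where the ML-driven approaches come in.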

NLP is what you need to analyze text for personally identifiable information (PII) such as credit card numbers, names and addresses. It also helps find common topics and categories of information and determine the sentiment of a piece of content. NLP analyzes the full text of a piece of content and, in doing so, helps you understand its relationship with other content.
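
For the most pattern-like kinds of PII, a scan can start with something as simple as the sketch below. It looks for candidate card numbers, keeps only those that pass the Luhn checksum used by payment card numbers, and also picks up email addresses. The patterns are deliberately simplified assumptions; finding names and addresses generally needs entity extraction rather than regular expressions.

```python
import re

# Simplified patterns, assumed for illustration only.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")   # 13 to 19 digits
EMAIL_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def luhn_valid(digits):
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        digit = int(char)
        if position % 2 == 1:      # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_pii(text):
    """Return (kind, matched text) pairs for card numbers and email addresses."""
    findings = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):     # discard digit runs that are not plausible cards
            findings.append(("credit_card", match.group()))
    for match in EMAIL_PATTERN.finditer(text):
        findings.append(("email", match.group()))
    return findings


sample = (
    "Card 4111 1111 1111 1111 on file for jane.doe@example.com, "
    "ref 1234 5678 9012 3456."
)
for kind, value in find_pii(sample):
    print(kind, value)
```

The checksum step matters because long digit runs such as reference numbers are common in business documents. In the sample, the second 16-digit number fails the check and is not reported.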

How Information Governance Uses ML, NLP

I’ve already given you some idea of how you can use ML and NLP in your information governance strategy. They are used extensively to analyze the large volumes of information your organization stores. One of the biggest challenges for organizations is dealing with the volume of information they have and the speed (velocity) at which they are creating and ingesting it. When you are dealing with a lot of information, it’s usually spread across multiple applications and content repositories. It’s also a combination of structured and unstructured information. Finding a technology that helps you bring all this information together, analyze it and organize it can be difficult. But that’s exactly what you need, and it’s where ML and NLP come into play.

AI technologies help you analyze large amounts of information in less time, and usually with greater accuracy, than a manual review. Could you perform discovery, analysis and organization manually? Sure, but it would take you a long time, and the results would be prone to human error and subjective interpretation.

A few examples of where AI, including ML and NLP, can help you:

  • For financial institutions, it can help detect fraudulent activity or predict when fraud might occur. It can also help analyze datasets and predict the best investments to make (of course, that isn’t an information governance capability).
  • For healthcare, you can use AI to analyze IoT data to predict potential problems with machines. From a governance perspective, it can help you scour file shares and other repositories to find PII that isn’t secure and help move that information to a secure location, protecting you from risk.
  • For oil and gas companies you could leverage AI technology to analyze reports written by inspectors when they check on oil wells and machinery to identify potential problems with a well. It can also help them search across a wide range of documents to find all instances of a specific well, showing its history of usage and maintenance.

The point is that AI technologies help you understand your information faster, enabling you to reduce the risk of exposing that information to the wrong people (intentionally or not). They can help you organize that information quickly, giving you the time to decide whether you want to keep the information where it is, move it to secure locations or delete it.

As a provider of information governance solutions, Everteam understands the value of ML and NLP. We leverage both AI technologies in our Everteam.discover solution (file and content analytics). We’d love to show you what we’ve created and how it can help you get a better handle on your information.