Integrating Structured and Unstructured Data: Are We There Yet?

“By 2022, 50% of organizations will include unstructured, semistructured and structured data within the same governance program, up from less than 10% today.” (Gartner, Market Guide for File Analytics)

How many companies have separate solutions to manage structured data (databases, transactional data) and unstructured data (documents, text, videos, images, email, social media, etc.)? After all, they are very different types of information, so they require different technology and governance approaches. Barb touched on this briefly when she wrote about innovations in information governance for 2019; I want to dig a little deeper.

What if this requirement to separate unstructured and structured data is no longer necessary? What if we merged the strategies and technologies related to structured data governance and unstructured information governance? Can we look at both types of data in a single governance program?

The fact is, we already do that today. Consider a Salesforce object with an invoice attached, records in an SAP system linked to files, or a NoSQL database with free-text fields. Much of the data we manage today is semi-structured, so why have separate solutions for each kind?

Making Unstructured Data, Structured

“80% of data is unstructured.” I’m sure you’ve heard this. You’ve implemented or are looking at file and content analysis solutions to help you manage it. In your efforts to manage your unstructured data, did you know you are actually making unstructured data structured?

File and content analysis solutions provide capabilities to analyze your information and either automatically or manually enrich and classify it by assigning taxonomy and metadata. You can scan your information for PII, PHI, PCI, custom regular expressions, named entities and so on, to automate metadata creation. This information could be anything – it could be text in a document, a string in a database, or just a tweet. By assigning taxonomy and metadata you are essentially extracting structure from your unstructured content.
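
To make “extracting structure” concrete, here is a minimal sketch in Python. It assumes that a few simple regular expressions can stand in for a real file analysis engine. The pattern names, the `INV-` invoice format and `extract_metadata` are hypothetical, and named-entity recognition is not shown. The point is that the same scan turns a document, a database string or a tweet into structured metadata tags.

```python
import re
from dataclasses import dataclass, field

# Hypothetical detectors: simple regular expressions standing in for a real
# file analysis engine (no named-entity recognition, no validation).
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "invoice_id": re.compile(r"\bINV-\d{6}\b"),  # a "custom" pattern
}

@dataclass
class ScanResult:
    source: str  # where the content lives: a file, a DB column, a tweet...
    tags: dict[str, list[str]] = field(default_factory=dict)

def extract_metadata(source: str, text: str) -> ScanResult:
    """Scan free text and record what was found as structured metadata."""
    result = ScanResult(source)
    for label, pattern in PATTERNS.items():
        hits = sorted(set(pattern.findall(text)))
        if hits:
            result.tags[label] = hits
    return result

# The same scan works on a document, a database string or a tweet.
samples = {
    "fileshare/finance/inv.txt": "Invoice INV-004217 sent to jane.doe@example.com.",
    "crm.contacts.notes#42": "Customer SSN on file: 123-45-6789",
    "twitter/status/1": "Loving the new release!",
}
for source, text in samples.items():
    print(extract_metadata(source, text))
```

Once the tags exist, they are ordinary structured data that can be stored, queried and joined like any other column.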

Once you have extracted structure, you can relate it to other structured data and examine the two side by side. It only makes sense, then, to want a file analytics solution that can analyze both structured and unstructured data, doesn’t it?

Of course, due to compliance and security requirements you can’t simply merge all your data and provide it to everyone in the company in a big data lake; you need data governance.

Federation is the New Repository

Not so long ago we talked about moving everything into a single repository, whether that was Documentum, FileNet or some other system.

But the notion of moving everything to a single repository never became a reality. Now, it’s all about federation and managing “in place.” You have data in your ERP and CRM systems and content in your file shares, SharePoint, Office 365 and numerous other applications and repositories, and you want to keep it where it is. At the same time, you need to ensure it is managed according to business and regulatory lifecycle requirements and appropriate information policies.

You don’t want to deal with separate solutions to manage data and content in-place. You need a solution that can help you look at your data as a whole and manage it appropriately.

Another thing to keep in mind: GDPR, the CCPA (California Consumer Privacy Act) and other upcoming privacy regulations do not differentiate between structured data and unstructured content. It’s all personal information regardless of its form, and you need to be able to connect the dots easily across all of it to support things like data subject access requests and the right to be forgotten.
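
Here is a minimal sketch of what “connecting the dots” could look like for a single access or erasure request, assuming the data subject is identified by email address. The `customers` table, the in-memory SQLite database, the dictionary standing in for a file share and `find_personal_data` are all hypothetical. Real systems would also need identity resolution beyond an exact email match.

```python
import re
import sqlite3

def find_personal_data(subject_email: str, conn: sqlite3.Connection,
                       documents: dict[str, str]) -> list[str]:
    """Locate one data subject's personal data in a table and in documents."""
    locations: list[str] = []
    # Structured side: a query against a hypothetical `customers` table.
    rows = conn.execute(
        "SELECT id FROM customers WHERE lower(email) = lower(?)", (subject_email,)
    )
    locations.extend(f"crm/customers/{row_id}" for (row_id,) in rows)
    # Unstructured side: a case-insensitive search of the documents' text.
    pattern = re.compile(re.escape(subject_email), re.IGNORECASE)
    locations.extend(
        f"fileshare/{path}" for path, text in documents.items() if pattern.search(text)
    )
    return locations

# Example data standing in for a CRM table and a file share.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO customers (name, email) VALUES (?, ?)",
    [("Jane Doe", "jane.doe@example.com"), ("John Roe", "john.roe@example.com")],
)
docs = {
    "contracts/jane_doe.txt": "Agreement signed by Jane Doe <Jane.Doe@example.com>.",
    "notes/meeting.txt": "No customer details in this note.",
}
print(find_personal_data("jane.doe@example.com", conn, docs))
```

The answer to the request is one list of locations, whatever form the data takes.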

Blurring the Line Between Data Governance and Information Governance

We talk about data governance and we talk about information governance, but the lines between the two are blurring. Often, the term you use depends on who you are talking to: with IT, you call it data governance; with line-of-business people, you call it information governance.

In the end, we are always talking about the same thing: providing the capabilities needed to connect to your data and content repositories regardless of where or what they are, analyzing the data they contain, figuring out how to organize, enrich and classify it (and get rid of ROT: redundant, obsolete and trivial content), and managing the good data according to your business and compliance policies.

Data catalogs exist today to manage structured data, and file analysis solutions exist to manage unstructured data. Is there demand for a single information/data governance catalog?

From the records management and archiving world, we get classification, taxonomy, metadata, and data retention or data minimization rules by information asset class. These solutions have offered these capabilities for the past 20 years. By merging them with data catalogs for structured data, and by bringing in not only records but all information (work in progress, convenience copies, other renditions, etc.), we align metadata and taxonomy and can manage all our data more effectively.
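
The following is a minimal sketch of a retention rule keyed on information asset class and applied the same way to a database row, a filed document and a convenience copy. The schedule values, the location strings, `Asset` and `disposition_due` are hypothetical. The choice to hold unclassified content until someone classifies it is also an assumption; your own policy may differ.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical retention schedule: asset class -> retention period in years.
RETENTION_YEARS = {"invoice": 10, "hr_file": 7, "marketing": 2}

@dataclass
class Asset:
    location: str
    asset_class: str
    created: date
    form: str  # e.g. "database row", "document", "convenience copy"

def add_years(d: date, years: int) -> date:
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # 29 February in a non-leap target year
        return d.replace(year=d.year + years, day=28)

def disposition_due(asset: Asset, today: date) -> bool:
    """True when the asset's retention period has elapsed.

    The rule deliberately ignores `form`: it is driven by asset class only,
    so a record, its convenience copies and the matching database row age alike.
    """
    years = RETENTION_YEARS.get(asset.asset_class)
    if years is None:
        return False  # unclassified content is held until someone classifies it
    return today >= add_years(asset.created, years)

assets = [
    Asset("sap://FI/invoices/4711", "invoice", date(2012, 3, 1), "database row"),
    Asset("fileshare://finance/inv4711.pdf", "invoice", date(2012, 3, 1), "document"),
    Asset("desktop://jdoe/inv4711 copy.pdf", "invoice", date(2012, 3, 1), "convenience copy"),
    Asset("sharepoint://marketing/campaign.docx", "marketing", date(2022, 1, 15), "document"),
]
for a in assets:
    print(a.location, disposition_due(a, date(2023, 6, 1)))
```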

On an Everteam note, this is something we think about as we develop our information governance products (everteam.discover, everteam.policy and everteam.archive). We already have a structured database connector in everteam.discover, mainly used for application decommissioning and for archiving some of the data, so we can analyze structured and unstructured data side by side. There is still work to do to make this convergence happen, and we are excited to keep moving forward to create the governance solutions enterprises need. If you’d like to learn more about our products and roadmap, don’t hesitate to drop us a note.
