Dec/Jan 2007 Issue

Data: Find, Govern and Comply
By Todd Goldman

Sensitive data is everywhere. Organizations store, process and maintain a great deal of it, and to meet compliance requirements and avoid embarrassing data losses or breaches, organizations - both public and private, in all industries - are under pressure to better secure it.

There are many valid approaches and techniques for securing data, from firewalls, intrusion prevention and intrusion detection to data encryption and user authentication. But no matter how organizations choose to secure their data, they very often don't know exactly where all of their sensitive data is located. For example, the first three digits of a phone number and the last four digits of a Social Security number often weave their way into many other data fields, fields that are often forgotten. So one of the first necessary steps is to locate all of the sensitive data, including hidden sensitive data, and discover how it relates to and flows into other data sets. You can't secure what you can't find.
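
To make the "hidden sensitive data" point concrete, here is a minimal Python sketch, not a production scanner, that checks which other fields in a set of records embed a fragment of a known sensitive field. The record layout, the field names (`ssn`, `member_id`, `note`) and the helper `fields_embedding` are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def fields_embedding(records, source_field, fragment_of, min_ratio=0.9):
    """Return the fields whose values contain a fragment of source_field.

    A field is reported only if the fragment appears in at least
    min_ratio of the records, which filters out chance matches.
    """
    counts = Counter()
    for record in records:
        fragment = fragment_of(record[source_field])
        for field, value in record.items():
            if field != source_field and fragment in str(value):
                counts[field] += 1
    total = len(records)
    return sorted(f for f, n in counts.items() if total and n / total >= min_ratio)


# Hypothetical sample data: the member ID silently embeds the last
# four digits of the Social Security number.
records = [
    {"ssn": "123-45-6789", "member_id": "GOLD-6789-A", "note": "called Tuesday"},
    {"ssn": "987-65-4321", "member_id": "SLVR-4321-B", "note": "new account"},
]


def last_four(ssn):
    return ssn[-4:]


hidden = fields_embedding(records, "ssn", last_four)
print(hidden)
```

On this sample the only field flagged is `member_id`, exactly the kind of forgotten field that a name-based inventory of "sensitive columns" would overlook.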

Unsecured data is no longer simply a security issue -- regulatory compliance, negative publicity resulting from data breaches, class-action suits etc., have elevated data security to a C-Level executive concern.

CIOs, CSOs, CISOs and other C-Level executives are asking:

- How do we protect our data and meet current and future compliance regulations?
- Is data governance the answer? If so, what is data governance?
- Where is the data? And why will it take so long to 'find' it and understand its relationship to the rest of the enterprise?
- What solutions are available?

How can organizations meet current and future compliance regulations?
Compliance mandates, from Sarbanes-Oxley, HIPAA, PCI, the Gramm-Leach-Bliley Act and Basel II to the numerous international and local regulations, all require strict data safeguards. It is also safe to assume that future compliance regulations will have similar security requirements. The message is clear: organizations need to better protect their data, but first they need to locate the sensitive data and create an enterprise data map showing the lineage of how it flows and transforms as it moves across the organization. Meeting current and future compliance regulations is the driver, and the current buzzword for the response is data governance.
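
One way to picture such an enterprise data map is as a directed graph whose edges record how a field feeds another field. The Python sketch below is illustrative only: the system and field names and the `downstream_of` helper are assumptions made for the example, not part of any particular product.

```python
from collections import deque

# Hypothetical lineage map: each source field lists the fields it feeds,
# along with a short description of the transformation applied.
lineage = {
    "crm.customer.ssn": [("billing.account.acct_key", "last four digits")],
    "billing.account.acct_key": [("warehouse.dim_account.key", "copied as-is")],
    "crm.customer.phone": [("marketing.leads.area_code", "first three digits")],
}


def downstream_of(field, lineage):
    """Breadth-first walk returning every field that derives from `field`."""
    seen, order = set(), []
    queue = deque([field])
    while queue:
        current = queue.popleft()
        for target, _rule in lineage.get(current, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


exposed = downstream_of("crm.customer.ssn", lineage)
print(exposed)
```

With a map like this, the question "where does Social Security data end up?" becomes a graph traversal rather than a manual search through each system.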

What is data governance?
Look up data governance in the dictionary and you won't find a clear definition, but if you were to ask 100 CIOs to define it, most would agree that it encompasses the people, processes and procedures needed to create a consistent, enterprise view of data in order to improve data security, increase consistency and confidence in decision-making, and decrease the risk of regulatory fines.

No matter how your organization defines data governance, the bottom line is that managing data is a good thing, but the first step to managing data is knowing where it is located and how it moves. You can't govern what you can't find.

So the first step is to discover where sensitive and critical data is located and how it flows across the enterprise so it can be managed, secured and, if necessary, consolidated. Unfortunately, the very manual nature of data relationship discovery and mapping makes a seemingly simple process exceedingly time-consuming and costly. Governing data encompasses not only locating the data in silos, but also discovering and mapping how it relates to other data elements between datasets, identifying and eliminating inconsistencies, consolidating data sources where appropriate, properly securing systems, and creating the ongoing processes to keep the data "governed" as the business and data evolve in the future.

Why do organizations need to govern data?
Data governance is just now beginning to gain momentum in heavily regulated industries such as financial services and healthcare. But the trend is clear: this mandate is not going away, and it will not be long before it finds its way into the infrastructure of every organization that stores and processes sensitive data. Much like regulatory compliance itself, data governance will start with large organizations and filter down to even the smallest ones. Let's face it: every organization has some sensitive data, be it employment records or credit card information.

As compliance regulations continue to proliferate, compliance auditors are no longer simply asking their clients about data controls; they are now demanding that clients positively attest that they can both control and secure sensitive data and ensure consistency of data across systems. Data governance is on its way to becoming a business best practice; some may even argue that data governance is the ultimate goal and that compliance is simply the test.

Where is the data? And why will it take so long to 'find'?
While it takes up only three line items on the project plan - find all the locations where sensitive data is stored, discover how that sensitive data flows through the organization, and map the data - this seemingly simple process can be impossible even for organizations with very large IT budgets. As with most data integration projects, up to 70 percent of the time and money is spent in the data discovery and mapping phase. And not so coincidentally, this is also the place in the process where large data-intensive projects fail or are simply abandoned due to high cost and complexity. This is because discovering this map presents many obstacles to the data analyst -- data relationships can be old, inconsistent, missing, or hidden inside a software program written in a language no one (computer or human) can understand.

Currently, the difficult problem of discovering, documenting, and validating the data relationships between systems is left up to data analysts to perform manually. This is a 40-year-old problem that today is solved by a data analyst armed with his or her trusty highlighter, manually and very slowly, examining the data values in spreadsheets taped to the walls.

What about metadata tools?
One approach to the problem that is currently in fashion is to examine metadata. This entails comparing column names between systems and using a profiling tool, which can discover metadata for a single source at a time, to augment the column name information. However, even accurate metadata reduces the analyst's discovery effort by perhaps only 10 percent. This is because column names are often not very descriptive, and even when they are, metadata can only help an analyst find relationships where there is no transformation. Since most of the effort spent mapping and discovering business rules between datasets goes into complex transformations (e.g. case statements, substrings, arithmetic), this approach to mapping data falls short. The basic problem with metadata is that it misses some of the most crucial relationships hidden inside the data.
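
A rough Python sketch of the column-name comparison described above shows both what it catches and what it misses. It uses the standard library's `difflib.SequenceMatcher` as a stand-in for whatever matching a real metadata tool performs; the column names, the 0.7 threshold and the `match_columns_by_name` helper are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def match_columns_by_name(source_cols, target_cols, threshold=0.7):
    """Pair each target column with the most similarly named source column.

    A target column maps to None when no source name is similar enough.
    """
    matches = {}
    for target in target_cols:
        best, best_score = None, 0.0
        for source in source_cols:
            score = SequenceMatcher(None, source.lower(), target.lower()).ratio()
            if score > best_score:
                best, best_score = source, score
        matches[target] = best if best_score >= threshold else None
    return matches


# Hypothetical schemas: acct_key is assumed to be derived from the SSN,
# but nothing in its name says so.
source_cols = ["customer_phone", "customer_ssn"]
target_cols = ["cust_phone", "acct_key"]

name_matches = match_columns_by_name(source_cols, target_cols)
print(name_matches)
```

Here `cust_phone` is paired with `customer_phone`, but `acct_key` is left unmatched: if it is in fact built from a substring of the Social Security number, no amount of name comparison will reveal that.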

Solution: data-driven mapping
There is good news on the horizon. An emerging category of software solutions automates this time-consuming manual process, significantly reducing risk and accelerating time to deployment for data governance and integration projects.

This new category of mapping software takes an automated, data-driven approach in which the software examines and compares the actual data values of multiple datasets simultaneously. It automatically discovers relationships and complex transformations based on the implicit business rules hidden in the data and organizes them for human analysis. Equally important, it finds the exceptions between datasets, which reveal inconsistencies across systems and expose costly business errors. This approach has been proven to accelerate the mapping process by over five times (5x) while delivering a much higher level of accuracy.
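
The idea of comparing data values rather than names can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any commercial tool works: it assumes the two datasets share a join key (`id`), tries a small hand-picked set of candidate transformations, and uses hypothetical column names throughout.

```python
def discover_mappings(source_rows, target_rows, key, transforms, min_ratio=0.75):
    """For each target column, find the (source column, transform) pair
    that reproduces its values most often, and list the exception keys."""
    target_by_key = {row[key]: row for row in target_rows}
    source_cols = [c for c in source_rows[0] if c != key]
    target_cols = [c for c in target_rows[0] if c != key]
    findings = {}
    for t_col in target_cols:
        best = None
        for s_col in source_cols:
            for name, fn in transforms.items():
                compared, exceptions = 0, []
                for row in source_rows:
                    target = target_by_key.get(row[key])
                    if target is None:
                        continue  # no matching row in the target dataset
                    compared += 1
                    if fn(row[s_col]) != target[t_col]:
                        exceptions.append(row[key])
                if compared == 0:
                    continue
                ratio = (compared - len(exceptions)) / compared
                if ratio >= min_ratio and (best is None or ratio > best["ratio"]):
                    best = {"source": s_col, "transform": name,
                            "ratio": ratio, "exceptions": exceptions}
        findings[t_col] = best
    return findings


# Hypothetical datasets; row 5 of the target holds an inconsistent key.
source_rows = [
    {"id": 1, "ssn": "123-45-6789", "state": "ca"},
    {"id": 2, "ssn": "987-65-4321", "state": "ny"},
    {"id": 3, "ssn": "555-12-3456", "state": "tx"},
    {"id": 4, "ssn": "222-33-4444", "state": "wa"},
    {"id": 5, "ssn": "111-22-3333", "state": "or"},
]
target_rows = [
    {"id": 1, "acct_key": "6789", "region": "CA"},
    {"id": 2, "acct_key": "4321", "region": "NY"},
    {"id": 3, "acct_key": "3456", "region": "TX"},
    {"id": 4, "acct_key": "4444", "region": "WA"},
    {"id": 5, "acct_key": "9999", "region": "OR"},
]
# Candidate transformations to test; a real tool would search far more.
transforms = {
    "identity": lambda v: v,
    "last_four": lambda v: v[-4:],
    "upper": lambda v: v.upper(),
}

findings = discover_mappings(source_rows, target_rows, "id", transforms)
for column, finding in findings.items():
    print(column, finding)
```

On this sample the sketch reports that `acct_key` is the last four characters of `ssn` for four of the five rows, with row 5 flagged as an exception, and that `region` is `state` in upper case with no exceptions. The exception list is the part an analyst would review for inconsistencies across systems.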

Conclusion
Whether data governance or meeting compliance requirements is the end goal, all organizations should, and increasingly will, be required to know where their data is and how it flows and is transformed as it moves throughout the organization. The bad news is that this discovery and mapping process can be extremely slow and therefore costly, but the good news is that there are new automated data mapping tools coming to the market that can solve this problem. This new generation of data-driven mapping software discovers data relationships the same way data analysts do in the real world: by examining the data values.

Once these automated tools are deployed, the previously onerous process of finding, discovering, and mapping data is no longer a roadblock and instead becomes the foundation for data governance and compliance.

About the Author:
Todd Goldman is vice president of Exeros (www.exeros.com), the provider of Exeros DataMapper, data mapping and automated data relationship discovery software that analyzes data values to automate the discovery of data relationships and data inconsistencies between structured datasets, thereby radically reducing the risk, cost and time required for data governance and data integration projects.

© IMPIRE Communications, LLC All Rights Reserved.  
