Sensitive data is everywhere. Organizations store, process and maintain a great deal of
sensitive data, and to meet compliance requirements and avoid embarrassing data losses or
breaches, organizations, both public and private and in all industries, are under
pressure to better secure it.
There are many valid approaches and techniques for securing data, from firewalls,
intrusion prevention and intrusion detection to data encryption and user authentication.
But no matter how organizations choose to secure their data, they very often don't
know exactly where all of their sensitive data is located. For example, the first three
digits of a phone number and the last four digits of a Social Security number often weave
their way into many other data fields, fields that are often forgotten. So one of the
first necessary steps is to locate all of the sensitive data, including hidden sensitive
data, and discover how it relates to and flows into other data sets. You can't secure what
you can't find.
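
As a rough illustration of the hidden-data problem described above, the following Python sketch scans every field in a small set of records for a fragment derived from a known sensitive value (here, the last four digits of a Social Security number). The record layout, the field names and the `find_hidden_fragments` and `last_four_digits` helpers are all hypothetical, invented for this example; it is a sketch under those assumptions, not a production scanner.

```python
from collections import defaultdict


def last_four_digits(ssn):
    """Return the last four digits of an SSN-like string, ignoring punctuation."""
    digits = "".join(ch for ch in ssn if ch.isdigit())
    return digits[-4:]


def find_hidden_fragments(records, source_field, fragment_fn, min_hits=2):
    """Count, per field, how often a fragment of the record's own
    source_field value shows up inside another field of the same record.

    Fields with fewer than min_hits matches are dropped, since a single
    hit may well be coincidence.
    """
    hits = defaultdict(int)
    for rec in records:
        fragment = fragment_fn(rec[source_field])
        if not fragment:
            continue
        for field, value in rec.items():
            if field != source_field and fragment in str(value):
                hits[field] += 1
    return {field: n for field, n in hits.items() if n >= min_hits}


# Made-up sample data: acct_ref quietly embeds the last four SSN digits.
records = [
    {"ssn": "123-45-6789", "acct_ref": "SMITH6789", "notes": "called Monday"},
    {"ssn": "987-65-4321", "acct_ref": "JONES4321", "notes": "ok"},
    {"ssn": "555-12-3456", "acct_ref": "LEE3456", "notes": "pin 3456 reset"},
]

suspects = find_hidden_fragments(records, "ssn", last_four_digits)
print(suspects)
```

With this sample data, only `acct_ref` is flagged, because it repeats the fragment in every record; the single occurrence in `notes` falls below the `min_hits` cutoff. A real scan would have to cover many source fields and fragment types across many tables, which is where the manual effort described later in this article comes from.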
Unsecured data is no longer simply a security issue: regulatory compliance, negative
publicity resulting from data breaches, class-action suits and the like have elevated data
security to a C-level executive concern.
CIOs, CSOs, CISOs and other C-level executives are asking:
- How do we protect our data and meet current and future compliance regulations?
- Is data governance the answer? If so, what is data governance?
- Where is the data? And why will it take so long to 'find' it and understand its
relationship to the rest of the enterprise?
- What solutions are available?
How can organizations meet current and future compliance regulations?
Compliance mandates, from Sarbanes-Oxley, HIPAA, PCI, the Gramm-Leach-Bliley Act and Basel II
to the numerous international and local regulations, all require strict data
safeguards. It is also safe to assume that any and all future compliance regulations will
have similar security requirements. The message is clear: organizations need to better
protect their data, but first they need to locate the sensitive data and create an
enterprise data map showing the lineage of how it flows and transforms as it moves across the
organization. Meeting current and future compliance regulations is the driver, and the
current buzzword for the response is data governance.
What is data governance?
Look up data governance in the dictionary and you won't find a clear definition, but if
you were to ask 100 CIOs to define data governance, most would agree that it encompasses
the people, processes and procedures needed to create a consistent, enterprise view of data in
order to improve data security, increase consistency and confidence in decision-making,
and decrease the risk of regulatory fines.
No matter how your organization defines data governance, the bottom line is that managing
data is a good thing, but the first step to managing data is knowing where it is located
and how it moves. You can't govern what you can't find.
So the first step is to discover where sensitive and critical data is located and how it
flows across the enterprise so it can be managed, secured and, if necessary,
consolidated. Unfortunately, the very manual nature of data relationship discovery and
mapping makes a seemingly simple process exceedingly time consuming and costly. Governing
data encompasses not only locating the data in silos, but also discovering and mapping
how it relates to other data elements between datasets, identifying and eliminating
inconsistencies, consolidating data sources where appropriate, properly securing systems
and creating the ongoing processes to keep the data "governed" as the business and data
evolve in the future.
Why do organizations need to govern data?
Data governance is just now beginning to gain momentum in heavily regulated industries
such as financial services and healthcare. But the trend is clear that this mandate is
not going away and it will not be long before it finds its way into the infrastructure of
every organization that stores and processes sensitive data. Much like regulatory
compliance itself, data governance too will start with large organizations and will
filter down to even the smallest organizations. Let's face it: every organization has
some sensitive data - be it employment records or credit card information.
As compliance regulations continue to proliferate, compliance auditors are no longer
simply asking their clients about data controls; they are now demanding that clients positively
attest that they can both control and secure sensitive data and ensure
consistency of data across systems. Data governance is on its way to becoming a business
best practice; some may even argue that data governance is the ultimate goal and that
compliance is simply the test.
Where is the data? And why will it take so long to 'find' it?
While it only takes up three line items on the project plan (find all the locations
where sensitive data is stored, discover how that sensitive data flows through the
organization, and map the data), this seemingly simple process can be impossible even for
organizations with very large IT budgets. As with most data integration projects, up to
70 percent of the time and money is spent in the data discovery and mapping phase. And
not coincidentally, this is also the place in the process where large data-intensive
projects fail or are simply abandoned due to high cost and complexity. This is because
discovering this map presents many obstacles to the data analyst: data relationships
can be old, inconsistent, missing, or hidden inside a software program in a language no one
(computer or human) can understand.
Currently, the difficult problem of discovering, documenting, and validating the data
relationships between systems is left up to data analysts to perform manually. This is a
40-year-old problem that today is still tackled by a data analyst armed with a trusty
highlighter, manually and very slowly examining the data values in spreadsheets taped to
the walls.
How about metadata tools?
One approach to the problem that is currently in fashion is to examine metadata. This
entails comparing column names between systems and using a profiling tool that can
discover metadata for a single source at a time to augment the column name information.
However, even accurate metadata can reduce the discovery effort by perhaps only
10 percent. This is because column names are often not very descriptive, and even when
they are, metadata can only help an analyst find relationships where there is no
transformation. Since most of the effort spent mapping and discovering business rules
between datasets is spent on complex transformations (e.g. case statements, substrings,
arithmetic), this approach to mapping data falls short. The basic problem with
metadata is that it misses some of the most crucial relationships hidden inside the data.
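
To make the limitation concrete, here is a minimal sketch of name-based matching using `difflib.SequenceMatcher` from the Python standard library. The column names, the 0.6 similarity threshold and the `match_columns_by_name` helper are assumptions made up for this illustration; commercial metadata tools use their own, more elaborate heuristics.

```python
from difflib import SequenceMatcher


def match_columns_by_name(source_cols, target_cols, threshold=0.6):
    """Pair each source column with the most similarly named target column.

    Returns {source_column: target_column or None}. Only the names are
    compared; the data values are never looked at.
    """
    matches = {}
    for src in source_cols:
        scored = [
            (SequenceMatcher(None, src, tgt).ratio(), tgt) for tgt in target_cols
        ]
        best_score, best_tgt = max(scored)
        matches[src] = best_tgt if best_score >= threshold else None
    return matches


# Hypothetical schemas. Suppose acct_ref stores a value derived from ssn.
source_cols = ["cust_phone", "ssn", "cust_name"]
target_cols = ["customer_phone", "acct_ref", "full_name"]

matches = match_columns_by_name(source_cols, target_cols)
for src, tgt in matches.items():
    print(f"{src} -> {tgt}")
```

In this made-up scenario the phone columns pair up because their names are similar, but `ssn` finds no partner and `acct_ref` is never proposed as a match. If `acct_ref` in fact holds a substring of the Social Security number, nothing in the column names reveals it, which is the kind of transformed relationship the paragraph above says metadata misses.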
Solution: data-driven mapping
There is good news on the horizon. There is an emerging category of software solutions
that automate this time-consuming, manual process, significantly reducing risk and
accelerating time to deployment for data governance and integration projects.
This new category of mapping software takes an automated, data-driven approach in
which the software examines and compares the actual data values of multiple datasets
simultaneously. This approach automatically discovers relationships and complex
transformations based on the implicit business rules hidden in the data, organizes them
for human analysis and, equally important, finds the exceptions between
datasets that reveal inconsistencies across systems, exposing costly business errors.
This approach has been proven to accelerate the mapping process by more than five times (5x)
while delivering a much higher level of accuracy.
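
The general idea of comparing data values rather than names can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the algorithm used by any particular product: it assumes the two datasets already share a record id, that every record has the same columns with string values, and that the relationship is one of three hypothetical candidate transformations listed in `TRANSFORMS`. The `discover_mappings` helper and the sample data are invented for the example.

```python
# Hypothetical candidate transformations; a real tool would search a far
# larger space (case statements, arithmetic, concatenations and so on).
TRANSFORMS = {
    "identity": lambda v: v,
    "upper": lambda v: v.upper(),
    "last4_digits": lambda v: "".join(c for c in v if c.isdigit())[-4:],
}


def discover_mappings(source, target, min_support=0.75):
    """Test every (source column, transform, target column) combination
    against the actual values and keep those that hold for most records.

    source/target: {record_id: {column: string_value}}.
    Returns a list of (src_col, transform, tgt_col, support, exception_ids),
    where exception_ids are the records that break the rule.
    """
    shared_ids = sorted(set(source) & set(target))
    if not shared_ids:
        return []
    src_cols = sorted({c for rid in shared_ids for c in source[rid]})
    tgt_cols = sorted({c for rid in shared_ids for c in target[rid]})
    found = []
    for s in src_cols:
        for t in tgt_cols:
            for name, fn in TRANSFORMS.items():
                exceptions = [
                    rid for rid in shared_ids
                    if fn(source[rid][s]) != target[rid][t]
                ]
                support = 1 - len(exceptions) / len(shared_ids)
                if support >= min_support:
                    found.append((s, name, t, support, exceptions))
    return found


# Made-up sample data; record 4 carries a deliberate inconsistency.
source = {
    1: {"ssn": "123-45-6789", "name": "Ann Smith"},
    2: {"ssn": "987-65-4321", "name": "Bob Jones"},
    3: {"ssn": "555-12-3456", "name": "Cy Lee"},
    4: {"ssn": "111-22-3333", "name": "Di Wu"},
}
target = {
    1: {"pin_hint": "6789", "cust": "ANN SMITH"},
    2: {"pin_hint": "4321", "cust": "BOB JONES"},
    3: {"pin_hint": "3456", "cust": "CY LEE"},
    4: {"pin_hint": "3333", "cust": "DI WOO"},
}

found = discover_mappings(source, target)
for src, rule, tgt, support, exceptions in found:
    print(f"{tgt} = {rule}({src})  support={support:.2f}  exceptions={exceptions}")
```

On this sample, the sketch reports that `pin_hint` is the last four digits of `ssn` for every record, and that `cust` is the upper-cased `name` for three of four records, with record 4 listed as the exception. The column names play no part in either finding, and the exception list is what surfaces the inconsistency between the two systems.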
Conclusion
Whether data governance or meeting compliance requirements is the end goal, all
organizations should know, and increasingly will be required to know, where their data is and
how it flows and is transformed as it moves throughout the organization. The bad news is
that this discovery and mapping process can be extremely slow and therefore costly, but
the good news is that there are new automated data mapping tools coming to the market
that can solve this problem. This new generation of data-driven mapping software
discovers data relationships the same way data analysts do in the real world: by
examining the data values.
Once these automated tools are deployed, the previously onerous process of finding,
discovering, and mapping data is no longer a roadblock and instead becomes the foundation
for data governance and compliance.
About the Author:
Todd Goldman is vice president of Exeros (www.exeros.com), the provider of
Exeros DataMapper, data mapping and automated data relationship discovery software that
analyzes data values to automate the discovery of data relationships and data
inconsistencies between structured datasets, thereby radically reducing the risk, cost
and time required for data governance and data integration projects.