Feb/March 2007 Issue

Data Disclosures: Good Intentions Gone Bad
By Paul A. Henry

Identity theft now occurs every 79 seconds and has become the fastest growing crime in America. Given the technology safety nets available to mitigate the risk of unintended personal data disclosure, the continuing wave of data breaches fueling identity theft is simply unacceptable.

The breadth of the problem is also rapidly expanding; personal data breaches are no longer limited to credit card clearing firms, online banks, brokerage firms and off-shore data clearing firms. This article explores recent unintentional data disclosures at three organizations that would not traditionally be considered “at risk” for personal data disclosure.

Federal Energy Regulatory Commission
The Federal Energy Regulatory Commission (FERC) released a massive amount of information resulting from its investigation of Enron and the Western Energy Crisis.

The released information included:
    1. Roughly 92 percent of Enron staff emails
    2. Over 85,000 records and 150,000 scanned pages of information provided to the FERC during the investigation
    3. Forty transcripts related to the case
Anyone with an Internet connection can go to the FERC website (www.ferc.gov) and either order copies of the data on CD or follow a link that permits browsing and searching the data online.

The data was never sanitized before it was publicly released. A cursory review of the email data alone found:
    1. Searching for the term “password” returned 3,840 hits
    2. Searching for the term “username” returned 767 hits
    3. Searching for specific banking, credit card and brokerage firm names returned complete sets of user credentials
The FERC data disclosure is further compounded by Trampoline Systems, which, in a well-intentioned effort to showcase the capabilities of its Sonar product using the Enron data from FERC, has placed a copy of the Enron email database online at http://enron.trampolinesystems.com/.

Clearly the good intentions of both the FERC and Trampoline Systems have gone bad: neither considered the exposure of the personal information of possibly innocent parties who were simply users of the Enron email system.

AOL’s User Search Data Released
AOL provided another example of good intentions gone bad when it released search data for 685,000 of its users in an effort to gain recognition from the academic and research community. The data was quickly mirrored across the Internet and could easily be downloaded by anyone with an Internet connection.

AOL apparently thought that simply removing the user’s ID number from each search string sanitized the data enough for release. Unfortunately, it never considered that the search data itself might contain other personal data about its users. A cursory analysis of the 20 million search queries released by AOL revealed:

    1. 223 hits for valid social security numbers
    2. 70 hits for valid credit card numbers
    3. Complete names, addresses, telephone numbers and even driver’s license numbers were also easily found in the released AOL data.
A class action lawsuit has been filed by three AOL users in Northern California seeking $1,000 in damages per affected user and an additional $4,000 for each user residing in California.
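
The kind of cursory analysis described above can be approximated with simple pattern matching. The sketch below is a hypothetical illustration, not the method used to produce the figures in this article: the regular expressions, the function names and the use of the Luhn checksum as a test for a “valid” card number are all assumptions.

```python
import re

# Hypothetical patterns; the article does not say how its hits were found.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")  # 13 to 19 digits, optional separators


def luhn_valid(digits: str) -> bool:
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:       # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_query(query: str) -> list:
    """Label a search query with the kinds of personal data it appears to contain."""
    findings = []
    if SSN_RE.search(query):
        findings.append("ssn-like")
    for match in CARD_RE.finditer(query):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            findings.append("card-like")
    return findings
```

Running `scan_query` over each released query and counting the labels would give rough hit counts. A pattern match alone only shows that a string looks like a social security or card number, so real counts would need further verification.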

Google’s Safe Browsing Initiative
Google provides a free toolbar add-on that alerts users when a web page they are visiting may be asking for personal or financial information under false pretenses (http://www.google.com/tools/firefox/safebrowsing/).

While Google’s intention to thwart phishing with a free product is noble, the URL update data that Google provides in support of this effort actually exposes personal information.

A cursory examination of the data updates for Google Safe Browsing reveals that little has been done to sanitize the data collected and made publicly available by Google. In fact, the Google Safe Browsing data contains personal information of people who visited the URLs of phishing sites while Google was collecting data.

A quick web search for “goog-black-url” returns a Google Safe Browsing update: http://sb.google.com/safebrowsing/update?version=goog-black-url:1:-1

Searching within the Google Safe Browsing update data quickly reveals user names and passwords for PayPal accounts, online bank accounts and MySpace accounts, all from victims who had apparently visited a phishing website and mistakenly entered their user name and password.

Another issue has recently been raised regarding the Google Safe Browsing product. When running in enhanced mode, each request to visit a web page sends the entire GET request to Google (http://www.google.com/safebrowsing/lookup) in the clear (without encryption). More disturbing, even when you visit a web page that uses SSL to encrypt the data, a copy of the decrypted GET request is sent to Google’s server. Hence if you submitted a credit card number to an SSL web server in a GET request, the entire request would be sent to Google in the clear. Effectively, anyone on the wire between you and Google would be able to see your credit card number.
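
To see why a GET request is so sensitive, recall that form fields submitted with GET become part of the URL itself. The following sketch is illustrative only: the shop address, the field names and the `q` parameter used to carry the visited URL are hypothetical, since the actual format of the lookup request is not described here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical checkout form submitted with GET over SSL; shop.example is a placeholder.
form_fields = {"item": "1234", "card": "4111111111111111"}
visited_url = "https://shop.example/checkout?" + urlencode(form_fields)

# Assumption: the visited URL travels as a "q" parameter of a plain-HTTP lookup.
# The article gives only the lookup address, not the real request format.
lookup_url = "http://www.google.com/safebrowsing/lookup?" + urlencode({"q": visited_url})


def leaked_fields(lookup: str) -> dict:
    """Return the form fields an eavesdropper could read from a plaintext lookup URL."""
    inner_url = parse_qs(urlsplit(lookup).query)["q"][0]
    return parse_qs(urlsplit(inner_url).query)
```

Under these assumptions, `leaked_fields(lookup_url)` recovers both the `item` and `card` values, even though the original request to the shop used `https`. The SSL session protects only the connection to the shop, not the copy of the URL sent elsewhere over plain HTTP.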

In its effort to protect users from phishing exploits that may expose personal information, Google is actually exposing the personal information found within the user’s GET requests “in the clear,” where it is easily intercepted, even when the user is visiting legitimate websites that use encryption (SSL) to protect that information.

Google’s good intentions to protect users from phishing sites have gone bad in a number of respects:

    1. The data collected by Google and used with the Google Safe Browsing product is available to anyone with an Internet connection, who can therefore search through it to harvest personal data that Google perhaps inadvertently collected.

    2. When operating in enhanced mode, the product transmits the user’s GET requests in the clear, even requests to legitimate web sites, which exposes the user’s personal data to anyone along the connection path between the user and Google (even when the user is doing business on a web server that uses SSL to protect that data).
While Google maintains that users are warned that data may be sent in the clear before they are able to enable enhanced mode, there is simply no excuse for Google to make the personal information found within the URL updates available for harvesting by the Internet-connected public through a simple Google search.

Current methodologies / technologies to mitigate personal data leakage
The technology is readily available to mitigate the risk of personal data exposure. We can quickly examine the use of three different methodologies that are commonly in use today and how they could have impacted the above data leakage examples:

    1. Digital Rights Management
    2. Traditional Secure Content Management
    3. Adaptive Secure Content Management
Digital Rights Management (DRM)


Figure 1.

DRM-based content management is effective only for maintaining control over specified documents; it is not effective for securing data in general (Figure 1). Further, DRM provides no safety net for user error in rights assignment. Hence a wayward or disgruntled document owner, or a user with access to an unprotected document, could assign rights to a third party in order to pass along personal information.

Workflow (Figure 1) in a typical DRM implementation for content security:

    1. Author receives a client licensor certificate (CLC) the first time they rights-protect information.
    2. Author defines a set of usage rights and rules for their file; application creates a “publishing license” and encrypts the file.
    3. Author distributes file.
    4. Recipient clicks the file to open it; the application calls the Rights Management Server (RMS), which validates the user and issues a use license.
    5. Application renders file and enforces rights.
Using DRM to secure the examples given earlier could potentially have restricted access to the AOL data to the researchers for whom it was originally intended, but it would not have mitigated the risk of exposure for either FERC or Google, where the data was intended to be generally available to the public.
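
The DRM workflow above, and its lack of a safety net for rights assignment, can be modeled in a few lines. This is a toy sketch under stated assumptions: the class and function names are hypothetical, no encryption or certificate handling (steps 1 and 3) is performed, and it does not represent the API of any real rights management product.

```python
from dataclasses import dataclass, field


@dataclass
class PublishingLicense:
    """Usage rights the author attaches to a protected file (step 2)."""
    owner: str
    rights: dict = field(default_factory=dict)  # user name -> set of allowed actions


class RightsManagementServer:
    """Toy stand-in for the RMS in step 4; no real cryptography is involved."""

    def __init__(self, known_users: set):
        self.known_users = known_users

    def issue_use_license(self, user: str, lic: PublishingLicense) -> set:
        if user not in self.known_users:
            raise PermissionError(f"unknown user: {user}")
        granted = lic.rights.get(user, set())
        if not granted:
            raise PermissionError(f"{user} has no rights to this file")
        return set(granted)


def open_file(user: str, action: str, lic: PublishingLicense,
              rms: RightsManagementServer) -> bool:
    """Step 5: the application allows only what the use license grants."""
    return action in rms.issue_use_license(user, lic)
```

In this model a researcher granted `view` can view but not print, and a user with no entry in the license is refused. Once the author adds an entry for an outsider, whether by mistake or on purpose, the same checks pass without complaint, which is the weakness noted above.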

Secure Content Management (SCM)
The security afforded by a traditional SCM implementation is based in part on the administrative development of a data dictionary. In the simplest terms, the data dictionary contains information such as watermarks, keywords (e.g., password and user name) and generic templates describing the format of credit card numbers, social security numbers, driver’s license numbers and other personal information. All content is then filtered against the data dictionary to enforce compliance.

The action taken by a traditional SCM is typically administratively configurable. It can include blocking an entire document or file that contains administratively prohibited information, or obscuring the prohibited data within a given file or document, as determined by a test against the data dictionary.
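
A data dictionary and the two actions just described can be sketched as follows. The keywords, templates and function names are hypothetical examples chosen for illustration; a production dictionary would be far larger, which is the administrative burden discussed in this section.

```python
import re
from typing import Optional

# Hypothetical data dictionary: keywords plus generic format templates.
KEYWORDS = ("password", "username", "user name")
TEMPLATES = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b"),
}


def violations(text: str) -> list:
    """List every dictionary entry that the text matches."""
    lowered = text.lower()
    hits = [f"keyword:{k}" for k in KEYWORDS if k in lowered]
    hits += [f"template:{name}" for name, rx in TEMPLATES.items() if rx.search(text)]
    return hits


def filter_document(text: str, action: str = "redact") -> Optional[str]:
    """Return None when the document is blocked, else the (possibly redacted) text.

    In this sketch only template matches are obscured; keyword hits flag the
    document but leave the text unchanged under the "redact" action.
    """
    if not violations(text):
        return text
    if action == "block":
        return None
    for name, rx in TEMPLATES.items():
        text = rx.sub(f"[{name.upper()} REDACTED]", text)
    return text
```

With `action="block"` any document that matches the dictionary is withheld entirely, while the default obscures only the matched numbers and passes the rest through.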

Traditional SCM could have afforded effective risk mitigation in each of the three examples of data leakage given earlier. However, that security would have come at the cost of a high administrative burden in developing an effective data dictionary.

Adaptive Secure Content Management
Adaptive Secure Content Management (ASCM) provides the granular filtering capability of a traditional SCM without the administrative burden of creating an extensive data dictionary. While still using traditional pattern-matching content analysis, ASCM also introduces many additional capabilities that further enhance SCM risk mitigation and operational efficiency (see Figure 2), including but not limited to:

    1. Fingerprinting: The fingerprinting engine decomposes a document into a series of algorithm-generated hashes. This collection of hashes is referred to as the document fingerprint. The engine then creates hashes for all data being tested and compares them to the known hashes. Fingerprinting finds exact replicas of protected documents and detects modifications to protected documents.

    2. Adaptive Lexical Analysis: Documents fed into this engine are examined for lexical structures such as the frequency of words and the position of words with respect to each other. Once the engine is trained on protected documents, it analyzes data looking for lexical structures similar to those within the documents and/or data that it was trained on.

    3. Clustering: The clustering engine is trained on groups of documents or data sets that are similar in nature. Clustering considers the individual words, the counts of those words and the correlations between the words in a document or data set, as well as the correlation of the documents and data with others in the group. In this way documents and data are placed in mathematical clusters. The clustering engine scans documents and data to determine whether they are similar to known clusters, which would indicate protected content.

    4. Advanced Content Filtering: Allows for searching content using "and" and "or" expressions so that multiple dictionaries and Boolean expressions can be used in combination. Advanced content filtering can therefore search for combinations of expressions that together could constitute a violation but individually would not.
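
The fingerprinting idea in item 1 can be illustrated with a minimal sketch. The choice of SHA-256, of five-word runs as the unit being hashed, and the function names are all assumptions made for this example; commercial engines use their own, undisclosed algorithms.

```python
import hashlib


def fingerprint(text: str, k: int = 5) -> set:
    """Hash every run of k consecutive words in the text."""
    words = text.lower().split()
    if len(words) < k:
        runs = [" ".join(words)]
    else:
        runs = [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]
    return {hashlib.sha256(run.encode("utf-8")).hexdigest() for run in runs}


def overlap(candidate: str, protected: set, k: int = 5) -> float:
    """Fraction of the candidate's hashes that also appear in a protected fingerprint."""
    cand = fingerprint(candidate, k)
    return len(cand & protected) / len(cand)
```

An exact copy of a protected document scores 1.0, a lightly edited copy scores somewhat lower, and unrelated text scores 0.0. A threshold on that score is one simple way to separate replicas and modifications from unrelated content.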


ASCM could have afforded effective risk mitigation in each of the three examples of data leakage given earlier without the high administrative burden of traditional SCM offerings in the development of an effective data dictionary.
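
The Boolean combinations used by advanced content filtering (item 4 in the list above) can be expressed as small composable rules. The dictionaries and helper names below are hypothetical; the sketch only shows how two word lists that are harmless on their own can be combined into a single violation test.

```python
import re
from typing import Callable

Rule = Callable[[str], bool]


def term(word: str) -> Rule:
    """Rule that fires when the word appears as a whole word, ignoring case."""
    rx = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return lambda text: rx.search(text) is not None


def any_of(*rules: Rule) -> Rule:
    """Boolean "or" over several rules."""
    return lambda text: any(rule(text) for rule in rules)


def all_of(*rules: Rule) -> Rule:
    """Boolean "and" over several rules."""
    return lambda text: all(rule(text) for rule in rules)


# Hypothetical dictionaries: either one alone is not treated as a violation.
INSTITUTIONS = any_of(term("bank"), term("brokerage"))
CREDENTIALS = any_of(term("password"), term("username"))

# A violation requires a hit from both dictionaries in the same text.
violation = all_of(INSTITUTIONS, CREDENTIALS)
```

Here `violation` fires on an email that names a bank and also contains a user name or password, but not on a message that merely mentions a bank or merely mentions a password.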

In Closing
Organizations that would not typically be considered at risk for personal data disclosure are finding themselves inadvertently in the middle of serious data disclosure issues. Even with the best of intentions, things can go horribly wrong when technology safety nets are not used to support the security of personal data.

About the Author:
Paul A. Henry is Vice President, Technology Evangelism, Secure Computing.
