Protecting Sensitive Data from Embarrassing Leaks

Monday, 29 May 2006 13:58 by RanjanBanerji

Everyone must have heard the news of how data on 26 million veterans was lost when an analyst's laptop and external drive were stolen from his house. As surprising as this may sound, it is not new. Having worked for many years in both the government and the private/commercial sector, I have seen plenty of situations where data was taken home and was therefore susceptible to theft. In fact, I have heard of many cases where the theft actually occurred, but discussing those is not the purpose of this article. This article focuses on how to prevent such losses by instituting good processes and practices.

The issue at hand is what can be done to prevent such incidents of data loss or theft. There are basically two ways data can be stolen: 1) someone hacks into your system and steals the data, or 2) you take the data outside of its domain and lose it, or someone steals it. This article focuses on preventing the latter.

Why Does Data Leave its Resting Place?

Let's start by identifying the reasons data leaves the secure boundaries within which it is kept.

1. Not all software development occurs within the same domain as the production data. Developers often state that they need production-quality data to build software, so copies of the production database are made and shipped to wherever the development takes place.

2. Data is needed in various locations for various purposes, such as analysis and reporting, so copies of the database are made and shipped.

3. People such as developers and analysts sometimes work from home or other locations, and carry a copy of the database with them.

4. Developers may work with a local copy of the database on their laptops, which they then end up carrying around everywhere.

5. A so-called crisis occurs, and people stop thinking and make stupid decisions (see the story below).

The list above is not exhaustive. However, various strategies and, more importantly, policies and processes can be established to prevent the loss of real data, and with it the loss of sensitive, confidential, or secret information.

The problem with most current approaches to preventing such data loss is that they are very reactive. The current incident, the loss of personal data on 25.5 million veterans, is going to be yet another one that creates a whole bunch of reactionary policies, most of which will be counterproductive and, in the end, quite ineffective.

The most likely outcome will be a strict lockdown of access to data, with many permissions required to see it, and each permission requiring an extensive and lengthy process. The end result: gross inefficiency.

As I write this, someone walks into my office with a CD of customer data, asking where they can find a box running SQL Server to set up a database and load this data. Fearing the worst, I ask whether the data is obfuscated (more on this later), and of course the answer is a big "no". I request the immediate destruction of the CD. Management has a different perspective. Apparently the customer is very upset about a certain bug, the developer who did the work is on vacation, it's Friday evening, and the customer wants the problem fixed even if everyone has to work the weekend.

Ah! Crisis management at its best. It appears that the customer is willing to compromise all data security policies. If this story ever leaks, I am guessing the poor developer who carried the CD from the customer's location to our office will be fired, and senior management/authorities at the customer will express shock: the same people who said the bug must be fixed at all costs.

So do not be surprised if you hear of more data being leaked or lost from banks, government agencies, and the like.

So we know we need to protect our data and we know we do not want some asinine rigid rules limiting productivity. So what can we do?

Protecting your Data

First, look at the different types of data we have. As a gross generalization, I will say there are three types:

1. Public data. Needs no protection; contains information anyone can get.

2. Secret data. This may be confidential business data or confidential government/military data. Needs a tremendous amount of protection.

3. Sensitive/personal data. This is the data with which most problems occur. Requires protection but also requires intelligent distribution.

In this article I will discuss strategies and processes for handling "sensitive/personal data". Public data requires no strict policies, and secret data had better be well guarded. It's the sensitive but not secret data that we often hear about being leaked.

So how do you protect sensitive data? It's actually not that difficult, but it requires a wide range of strategies, from management to technology. Here are some steps that can be taken:

1. Have a policy in place.

Make sure you have a policy in place and all employees are well aware of the policy. Some aspects of the policy should be:

1. The production database, or any copy of it, never ever leaves the customer's site.

2. The DBA must at all times maintain a list of all databases, categorized by the security level each one requires. This list must be reviewed and audited. The number of times I have seen databases named following the pattern copy_of_db, copy_of_copy_of_db is amazing. No copies of production should ever exist.

Review and practice your policies frequently. Everyone has rules and policies, but as time goes by we all ignore them, and that's when the problems begin. Have the DBA produce a monthly report on everyone who has access to the database, who has requested access, and who has requested data dumps.
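Part of that audit can be automated. The following is a minimal sketch, assuming the DBA's list is available as simple records and that the names of the databases actually present on the servers can be collected somehow. `DatabaseEntry`, `audit_inventory`, the security-level labels, and the name pattern are all hypothetical, not part of any particular tool.

```python
import re
from dataclasses import dataclass

@dataclass
class DatabaseEntry:
    """One row of the DBA's (hypothetical) database inventory."""
    name: str
    security_level: str  # e.g. "public", "sensitive", "secret"
    is_production: bool
    obfuscated: bool

# Names that look like ad-hoc copies: copy_of_db, copy_of_copy_of_db, db_copy, db_bak, ...
COPY_PATTERN = re.compile(r"copy_of_|_copy\b|_bak\b|_backup\b", re.IGNORECASE)

def audit_inventory(registered, found_on_servers):
    """Compare the inventory with the databases actually found; return (name, problem) pairs."""
    findings = []
    registered_names = {entry.name for entry in registered}
    for name in sorted(found_on_servers):
        if name not in registered_names:
            findings.append((name, "not in the DBA's inventory"))
        if COPY_PATTERN.search(name):
            findings.append((name, "name suggests an ad-hoc copy"))
    for entry in registered:
        # A non-public, non-production database should only ever hold obfuscated data.
        if entry.security_level != "public" and not entry.is_production and not entry.obfuscated:
            findings.append((entry.name, "non-production copy holds unobfuscated data"))
    return findings
```

Run against the list of databases found on a server, this flags something like `copy_of_crm_prod` twice: once because nobody registered it, and once because its name gives it away.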
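The monthly report can start as little more than a count over an audit log. This sketch assumes a hypothetical log of `(day, user, event)` tuples with the event types `access`, `access_request`, and `dump_request`. A real DBMS would supply this information from its own auditing facility, in its own format.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple

def monthly_report(
    events: Iterable[Tuple[date, str, str]], year: int, month: int
) -> Dict[str, Dict[str, int]]:
    """Count events per event type and per user for one calendar month."""
    counts = defaultdict(Counter)
    for day, user, event in events:
        if (day.year, day.month) == (year, month):
            counts[event][user] += 1
    return {event: dict(per_user) for event, per_user in counts.items()}
```

The interesting part of the report is usually the `dump_request` section: every entry there is a potential copy of production that the inventory should account for.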

2. Always Obfuscate Data

Clearly, rules and policies get broken; we all know that happens. So what if a basic data policy is backed by some basic processes? What if data has to be moved around? What if we do need to make a copy of the production data for testing or development? Clearly we should not be developing software against a production database, and developer-generated data is never very reliable.

The answer to this is data obfuscation. Identify all fields in your database that contain data that can be classified as sensitive or personal. Examples include:

  • Social Security Number (though I see no reason for most systems ever to store this)

  • Name, first and last.

  • Telephone

  • Address etc.

Any time a copy of the production data is made, change all of this data into some gibberish. Ah! Such a time-consuming practice, you say. I have a crisis to deal with and I need the data right now. I know, I know, but there is a process for that too. You are backing up your database every night, right? No? You should be fired. Stop reading this, go pack your belongings, and leave.
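One way to turn values into "gibberish" while keeping them realistic enough for testing is to replace every letter and digit with a random one of the same kind and leave punctuation alone, so an SSN still looks like `ddd-dd-dddd`. This is a minimal sketch, not a complete anonymization scheme: the column names in `SENSITIVE_FIELDS` are hypothetical, and the scrambled values still reveal the length and format of the originals.

```python
import random
import string

# Hypothetical column names, following the examples of sensitive fields above.
SENSITIVE_FIELDS = {"ssn", "first_name", "last_name", "telephone", "address"}

def scramble(value, rng):
    """Replace each letter or digit with a random one of the same kind; keep punctuation and spaces."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            out.append(rng.choice(string.ascii_uppercase if ch.isupper() else string.ascii_lowercase))
        else:
            out.append(ch)
    return "".join(out)

def obfuscate_row(row, rng=None):
    """Return a copy of `row` (a dict) with every sensitive text field scrambled."""
    if rng is None:
        rng = random.SystemRandom()  # OS randomness, so the output cannot be replayed from a seed
    return {
        column: (scramble(value, rng) if column in SENSITIVE_FIELDS and isinstance(value, str) else value)
        for column, value in row.items()
    }
```

For example, `obfuscate_row({"person_id": 7, "ssn": "123-45-6789"})` keeps `person_id` as-is and returns an SSN-shaped string of random digits, so screens and reports still behave as they would with real data.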

Every night as you create a backup you could also create an obfuscated copy of your production database. Now you have a database that looks like production but reveals no sensitive data.

Clearly not all databases can be replicated and obfuscated on a nightly basis. But I am sure you get the idea.
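The nightly obfuscated copy can be sketched with SQLite standing in for the production DBMS. The sketch assumes a hypothetical `TBL_PERSON` table with `person_id`, `first_name`, `last_name`, `ssn`, `telephone`, and `address` columns. It copies the database with `sqlite3.Connection.backup` (Python 3.7+), overwrites the sensitive columns with values derived from `person_id` in one transaction, and then runs `VACUUM` so the overwritten values are not left behind in the file's free pages. Because the copy holds real data until the scrub finishes, a job like this belongs inside the production environment, and only the scrubbed copy should ever leave it.

```python
import sqlite3

# Overwrite the (hypothetical) sensitive columns with values derived from the surrogate key.
SCRUB_SQL = """
UPDATE TBL_PERSON SET
    first_name = 'First' || person_id,
    last_name  = 'Last' || person_id,
    ssn        = printf('900-00-%04d', person_id % 10000),  -- area 900 is never issued as a real SSN
    telephone  = printf('555-01%02d', person_id % 100),     -- 555-0100..0199 are fictional numbers
    address    = person_id || ' Example Street'
"""

def make_obfuscated_copy(src: sqlite3.Connection, dst: sqlite3.Connection) -> None:
    """Copy src into dst, then scrub dst. Intended to run inside the production environment."""
    src.backup(dst)        # full copy of the source database into dst
    with dst:              # commits the scrub as one transaction, rolls back on error
        dst.execute(SCRUB_SQL)
    dst.execute("VACUUM")  # rebuild the file so overwritten values do not linger in free pages
```

A nightly job would open `src` on the production file and `dst` on a new file (both paths are deployment-specific), call `make_obfuscated_copy`, and close both connections. Deriving the fake values from `person_id` keeps them stable from night to night, which makes bug reports against the copy easier to reproduce.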

3. Database Design

Not all issues can be handled via process and policies alone. The design of your database is extremely important when it comes to protecting data. Good database design will also make obfuscation a lot easier.

Here are some design guidelines:

  • Do not use natural data as primary keys. For example, do not use the SSN as the primary key for TBL_PERSON. This will also give you better performance on your queries.

  • Do not use natural data as foreign keys. For example, do not store the SSN in two tables and use it as the foreign key between them. This will also give you better performance on your queries.

  • Do not keep data that you do not need or that does not belong to you. A lot of systems need to communicate with other systems within an organization (sometimes outside too) to access some data. If your database is not the system of record for this data, then hold as little of it as possible. Don't become responsible for something you should not have in the first place.

  • Based on requirements and use cases, consider moving sensitive data to another schema with very limited access. Some systems I have worked on very rarely need a person's name, address, SSN, telephone number, etc. Most of the code is happy using personID (a sequential numeric field). However, keeping the person's personal data private and safe is of utmost importance, so a good strategy is to separate this data from the main schema.
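These guidelines can be sketched as a schema, again using SQLite for illustration. Everything in the main schema is keyed by a surrogate `person_id` (the text's personID), and the personal fields live in a separate attached database called `pii`, standing in for a restricted schema. `TBL_ORDER`, `PERSON_PII`, and their columns are hypothetical. SQLite cannot enforce a foreign key across attached databases, and the access restrictions themselves would come from the DBMS's grants or file permissions, not from this code.

```python
import sqlite3

def create_schema(conn: sqlite3.Connection, pii_path: str = ":memory:") -> None:
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
    # Main schema: keyed by surrogate ids only; no natural data such as the SSN.
    conn.executescript("""
        CREATE TABLE TBL_PERSON (
            person_id INTEGER PRIMARY KEY,  -- sequential surrogate key
            status    TEXT NOT NULL
        );
        CREATE TABLE TBL_ORDER (
            order_id  INTEGER PRIMARY KEY,
            person_id INTEGER NOT NULL REFERENCES TBL_PERSON(person_id),
            total     REAL NOT NULL
        );
    """)
    # Personal data lives in a separate database attached as schema "pii".
    # In a server DBMS this would be a schema with far narrower grants.
    conn.execute("ATTACH DATABASE ? AS pii", (pii_path,))
    conn.execute("""
        CREATE TABLE pii.PERSON_PII (
            person_id  INTEGER PRIMARY KEY,  -- same surrogate id; FK not enforceable across databases
            ssn        TEXT UNIQUE,
            first_name TEXT,
            last_name  TEXT,
            telephone  TEXT,
            address    TEXT
        )
    """)
```

Day-to-day queries, such as order totals per person, join only `TBL_ORDER` and `TBL_PERSON`. Only the few features that genuinely need a name or SSN touch `pii.PERSON_PII`, and an obfuscation job has a single, well-defined table to scrub.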


A combination of these three strategies will hopefully protect you from accidentally losing sensitive data and save you a lot of grief, while still providing the flexibility required for software development and the day-to-day operation of your system. To recap, make sure you have:

  • A policy in place regarding copying and distribution of your production database.

  • Routine audits and reports of all users and all copies of the database.

  • Obfuscation of all sensitive data before any copy of the production database is made.

  • An obfuscated copy of the production database made on a routine basis, so that a "near" production-quality database is always available.

  • A database design that never uses real information or natural data as primary keys. This will make obfuscation and transfer of data much easier.
