Protecting Sensitive Data from Embarrassing Leaks

Monday, 29 May 2006 13:58 by RanjanBanerji

Everyone must have heard the news about how data on 26 million veterans was lost when an analyst's laptop and external drive were stolen from his house. As surprising as this may sound, it is not new. Having worked for many years in both the government and the private/commercial sector, I have seen plenty of situations where data was taken home and was therefore susceptible to theft. In fact I have heard of many cases where the theft has occurred, but discussing those is not the purpose of this article. This article will focus on how to prevent such losses by instituting good processes and practices.

The issue at hand is what can be done to prevent such incidents of loss/theft of data? There are basically two ways data can be stolen: 1) Someone hacks into your system and steals the data, and 2) You take the data outside of its domain and you lose it or someone steals it. The purpose of this article is to focus on how to prevent the latter.

Why Does Data Leave its Resting Place?

Let's start with identifying reasons as to why data leaves the secure boundaries in which it is kept.

1. Not all software development occurs within the same domain as the production data. Developers often state that they need production quality data to build software. Hence copies of the production database are made and shipped to locations where the software development takes place.

2. Data is needed in various locations for various reasons such as analysis, reports, etc. Copies of the database are made and shipped.

3. People (developers, analysts, etc.) sometimes work from home or other locations and hence carry a copy of the database with them.

4. Developers may be working with a local copy of the database on their laptops, which they then end up carrying all around.

5. Some so-called crisis occurs and people stop thinking and make stupid decisions (see box below).

The list above is not exhaustive. However, various strategies and, more importantly, policies and processes can be established to prevent the loss of real data, which could expose sensitive, confidential, or secret information.

The problem with most current approaches to preventing such loss of data is that they are very reactive. The current incident, the loss of personal data on 26 million veterans, is going to be yet another one that creates a whole bunch of reactionary policies, most of which will be counterproductive and in the end quite ineffective.

The most likely outcome will be a strict lockdown of access to data, with lots of permissions required to see the data, and with those permissions requiring extensive and lengthy processes. The end result: gross inefficiency.

As I write this, someone walks into my office with a CD with customer data on it, asking where they can find a box running SQL Server to set up a database and transfer this data. Fearing the worst, I ask if the data is obfuscated (I will be talking about this) and of course the answer I get is a big "no". I request the immediate destruction of the CD. Management has a different perspective. Apparently the customer is very upset with a certain bug. The developer who did the work is on vacation. It's Friday evening and the customer wants the problem fixed, even if everyone has to work the weekend.

Ah! Crisis management at its best. It appears that the customer is willing to compromise all data security policies. If this story ever leaks, I am guessing the poor developer who carried the CD to our office from the customer's location will be fired and senior management/authorities at the customer will express shock. The same guys who said the bug must be fixed at all costs.

So do not be surprised if you hear of more data being leaked or lost from banks, government agencies, etc.

So we know we need to protect our data and we know we do not want some asinine rigid rules limiting productivity. So what can we do?

Protecting your Data

First look at the different types of data we have. As a gross generalization I am going to say there are three types:

1. Public data. Needs no protection. Contains information anyone can get.

2. Secret Data. This may be confidential business data or confidential government/military data etc. Needs a tremendous amount of protection.

3. Sensitive/Personal Data. This is the data with which most problems occur. Requires protection but also requires intelligent distribution.

In this article I will discuss strategies and processes around the handling of "Sensitive/Personal Data". Public data requires no strict policies, and secret data had better be guarded. It's the sensitive but not secret data that we often hear about being leaked.

So how do you protect sensitive data? It's actually not that difficult, but it requires a wide range of strategies, from management to technology. Here are some steps that can be taken:

1. Have a policy in place.

Make sure you have a policy in place and all employees are well aware of the policy. Some aspects of the policy should be:

1. The production database or copies of it never ever leave the customer's site.

2. The DBA must at all times maintain a list of all databases, categorized by the security level each requires. This list must be reviewed and audited. The number of times I have seen databases that follow the pattern of copy_of_db, copy_of_copy_of_db is amazing. No copies of production should ever exist.

Review and practice your policies frequently. Everyone has rules and policies, but as time goes by we all ignore them, and that's when the problems begin. Have the DBA produce a monthly report on all people who have access to the database, who have requested access, and who have requested data dumps.
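
As a rough illustration of the inventory audit described above, here is a minimal Python sketch. Everything in it is an assumption made for the example: the database names, the REGISTERED inventory, and the rule that any name containing "copy" is suspicious. A real audit would pull the list of databases from the server itself.

```python
# Hypothetical inventory the DBA maintains: database name -> security level.
REGISTERED = {
    "orders_prod": "sensitive",
    "orders_obfuscated": "public",
}


def audit_databases(found_names, registered=REGISTERED):
    """Compare the databases found on a server against the DBA's inventory."""
    # Anything the DBA has not classified is a finding in its own right.
    unregistered = sorted(n for n in found_names if n not in registered)
    # Names such as copy_of_db or copy_of_copy_of_db hint at ad hoc copies.
    suspicious = sorted(n for n in found_names if "copy" in n.lower())
    return {"unregistered": unregistered, "suspicious_copies": suspicious}


if __name__ == "__main__":
    found = ["orders_prod", "copy_of_orders_prod", "copy_of_copy_of_orders_prod"]
    for finding, names in audit_databases(found).items():
        print(finding, names)
```

The point is less the code than the habit: the report is cheap to produce, so it can be produced every month.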

2. Always Obfuscate Data

Clearly rules and policies get broken. We all know that happens. So what if the basic data policy is backed by some basic processes? What if data can be moved around? What if we do need to make a copy of the production data for testing or development? Clearly we should not be developing software against a production database, and developer-generated data is never very reliable.

The answer to this is data obfuscation. Identify all fields in your database that contain data that can be classified as sensitive or personal data. Examples of this are:

  • Social Security Number (Though I see no reason for most systems to ever have this)

  • Name, first and last.

  • Telephone

  • Address etc.

Any time a copy of the production data is made, change all this data into some gibberish. Ah! Such a time-consuming practice, you say. I have a crisis to deal with and I need the data right now. I know, I know, but there is a process for that too. Clearly you are backing up your database every night? No? You should be fired. Stop reading this, go pack your belongings, and leave.

Every night as you create a backup you could also create an obfuscated copy of your production database. Now you have a database that looks like production but reveals no sensitive data.

Clearly not all databases can be replicated and obfuscated on a nightly basis. But I am sure you get the idea.
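
To make the idea concrete, here is a minimal sketch in Python using SQLite. It is only an illustration under stated assumptions: the person table, its column names, and the helper names scramble, obfuscate_person_table, and make_demo_copy are all hypothetical, and the in-memory database stands in for a restored copy of production. Each sensitive value is replaced with gibberish derived from a keyed hash (HMAC), and the key is meant to be thrown away afterwards so the values cannot be recomputed.

```python
import hashlib
import hmac
import secrets
import sqlite3

# Hypothetical sensitive columns of a hypothetical "person" table.
SENSITIVE_COLUMNS = ("ssn", "first_name", "last_name", "telephone", "address")


def scramble(value, key):
    """Turn one value into 12 hex characters of keyed-hash gibberish."""
    if value is None:
        return None
    mac = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()[:12]


def obfuscate_person_table(conn, key=None):
    """Overwrite every sensitive column of person in place."""
    # A fresh random key per run; discard it so values cannot be recomputed.
    key = key or secrets.token_bytes(32)
    conn.create_function("scramble", 1, lambda v: scramble(v, key))
    assignments = ", ".join(f"{col} = scramble({col})" for col in SENSITIVE_COLUMNS)
    conn.execute(f"UPDATE person SET {assignments}")
    conn.commit()


def make_demo_copy():
    """Build a tiny in-memory stand-in for a copy of production."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person (person_id INTEGER PRIMARY KEY, ssn TEXT, "
        "first_name TEXT, last_name TEXT, telephone TEXT, address TEXT)"
    )
    conn.execute(
        "INSERT INTO person VALUES (1, '123-45-6789', 'Jane', 'Doe', "
        "'555-0100', '1 Main St')"
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    copy = make_demo_copy()
    obfuscate_person_table(copy)
    print(copy.execute("SELECT * FROM person").fetchall())
```

Note that the surrogate key person_id is left untouched, so joins keep working on the obfuscated copy. The sketch does not preserve formats (a scrambled telephone number no longer looks like one), which may or may not matter for your testing.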

3. Database Design

Not all issues can be handled just via process and policies. The design of your database is of extreme importance when it comes to protecting data. Also, good database design will make obfuscation a lot easier.

Here are some design guidelines:

  • Do not use natural data as primary keys. For example, do not use the SSN as the primary key for TBL_PERSON. This will also give you better performance on your queries.

  • Do not use natural data as foreign keys. For example do not have SSN in two tables and do not use SSN as the foreign key. This will also give you better performance on your queries.

  • Do not keep data that you do not need or does not belong to you. A lot of systems need to communicate with other systems within an organization (sometimes outside too) to access some data. If your database is not the system of record for this data, then hold as little of it as possible. Don't become responsible for something you should not have in the first place.

  • Based on requirements and use cases, consider moving sensitive data to another schema that has very limited access. Some systems I have worked on very rarely need to use a person's name, address, SSN, telephone etc. Most of the code and system is happy using personID (a numeric sequential field). However, keeping the person's personal data private and safe is of utmost importance. A good strategy is to separate this data from the main schema.
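
The guidelines above can be sketched as follows, again in Python with SQLite. This is an illustration, not a recommended production layout: the table names, the columns, and the helper functions are hypothetical. SQLite has no schemas with separate permissions, so an attached second database named restricted stands in for the limited-access schema. On a server database you would use a real schema with its own grants.

```python
import sqlite3


def build_schema():
    """Create a main schema keyed by surrogate IDs plus a restricted store."""
    conn = sqlite3.connect(":memory:")
    # The attached database stands in for a schema with very limited access.
    conn.execute("ATTACH DATABASE ':memory:' AS restricted")
    conn.executescript(
        """
        CREATE TABLE main.person (
            person_id INTEGER PRIMARY KEY AUTOINCREMENT
        );
        CREATE TABLE main.purchase (
            purchase_id  INTEGER PRIMARY KEY AUTOINCREMENT,
            person_id    INTEGER NOT NULL REFERENCES person(person_id),
            amount_cents INTEGER NOT NULL
        );
        CREATE TABLE restricted.person_private (
            person_id INTEGER PRIMARY KEY,
            ssn       TEXT,
            full_name TEXT,
            telephone TEXT,
            address   TEXT
        );
        """
    )
    return conn


def add_person(conn, ssn, full_name):
    """Insert a person; only the surrogate key reaches the main schema."""
    person_id = conn.execute("INSERT INTO main.person DEFAULT VALUES").lastrowid
    conn.execute(
        "INSERT INTO restricted.person_private (person_id, ssn, full_name) "
        "VALUES (?, ?, ?)",
        (person_id, ssn, full_name),
    )
    return person_id


if __name__ == "__main__":
    db = build_schema()
    pid = add_person(db, "123-45-6789", "Jane Doe")
    db.execute(
        "INSERT INTO main.purchase (person_id, amount_cents) VALUES (?, ?)",
        (pid, 1999),
    )
    print(db.execute("SELECT * FROM main.purchase").fetchall())
```

With this layout, a copy of the main schema can be shipped to developers without the personal data ever leaving, and the obfuscation step only has to deal with one table.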


A combination of these three strategies will hopefully protect you from accidentally losing sensitive data and save you from a lot of grief. Yet these strategies provide the flexibility required for software development and day to day operations of your system. To recap, make sure you have:

  • A policy in place regarding copying and distribution of your production database.

  • Routine audits and reports of all users and all copies of the database.

  • Obfuscate all sensitive data prior to making a copy of the production database.

  • Make an obfuscated database copy on a routine basis so that a "near" production quality copy of the database is always available.

  • Design your database to never use real information as primary or foreign keys. This will make obfuscation and transfer of data much easier.

Categories:   IT

Retaining EXIF properties after Editing and Saving a JPEG

Saturday, 6 May 2006 11:34 by RanjanBanerji

EXIF (Exchangeable Image File Format) is a format used by most digital cameras to save images and information about those images. The standard was created by JEITA (Japan Electronics and Information Technology Industries Association). The EXIF format turns out to be very beneficial because the image file now carries all sorts of valuable information, such as the date and time the photograph was taken, the make and model of the camera, shutter speed, aperture, ISO, and lots of other such properties.

Now here is the problem. Most software that allows you to edit photographs will lose the EXIF information when you save the edited image. I too experienced this while trying to resize and save a new image. The solution to the problem is simple in .Net and its implementation of GDI+, yet it's difficult to seek out and very rarely used, as evidenced by a lot of commercial software that I see out there.

Let's say you wish to resize an image and then save it. You could write code that would be something like:


//Requires: using System.Drawing; using System.Drawing.Drawing2D; using System.Drawing.Imaging;

//This method will resize the image to half its
//original size and then save it as another image
public void ResizeAndSave( Image imageToResize ) {
    int originalWidth = imageToResize.Width;
    int originalHeight = imageToResize.Height;
    int newWidth = (int)( originalWidth * 0.5 );
    int newHeight = (int)( originalHeight * 0.5 );

    Bitmap newImage = new Bitmap( newWidth, newHeight, PixelFormat.Format24bppRgb );
    newImage.SetResolution( imageToResize.HorizontalResolution, imageToResize.VerticalResolution );

    //Redraw the original onto the smaller bitmap
    using( Graphics graphics = Graphics.FromImage( newImage ) ) {
        graphics.InterpolationMode = InterpolationMode.HighQualityBicubic;
        graphics.DrawImage( imageToResize, new Rectangle( 0, 0, newWidth, newHeight ),
            new Rectangle( 0, 0, originalWidth, originalHeight ), GraphicsUnit.Pixel );
    }

    //Will not save EXIF from original image
    newImage.Save( "c:\\somefilename.jpg" );
    newImage.Dispose();
}

The code above will resize and save the image, however, you will have lost all the EXIF properties.  I then used the ImageFormat argument for the Save method.  So instead of:

//Will not save EXIF from original image
newImage.Save( "c:\\somefilename.jpg" );

I used:

//Will not save EXIF from original image
newImage.Save( "c:\\somefilename.jpg", ImageFormat.Exif );

Well, this did not help either. This is when I realized that simply saving an image after modifying it was not enough. Also, if you look at the code used to resize the image, you will notice that it redraws the image onto a new image. Therefore no transfer of non-image properties occurs. The question then is, what needs to be done? The answer is quite simple: copy the properties from the source image to the destination image. To do so, we modify the ResizeAndSave method to query the source image for its PropertyItems array and then set each property on the new image we have created.

But be warned, that alone is not enough. Saving the image using Save( filename, ImageFormat ) will not retain the properties, even though they have been set. This is because these properties are specific to the EXIF format, and setting the ImageFormat to Exif is not enough. You need the appropriate ImageCodecInfo to save the new image so that all the properties you have set are saved. So the code you need to write will be something like:

//Requires: using System; using System.Drawing; using System.Drawing.Drawing2D; using System.Drawing.Imaging;

//This method will resize the image to half its
//original size and then save it as another image
public void ResizeAndSave( Image imageToResize ) {
    int originalWidth = imageToResize.Width;
    int originalHeight = imageToResize.Height;
    int newWidth = (int)( originalWidth * 0.5 );
    int newHeight = (int)( originalHeight * 0.5 );

    using( Bitmap newImage = new Bitmap( newWidth, newHeight, PixelFormat.Format24bppRgb ) ) {
        newImage.SetResolution( imageToResize.HorizontalResolution, imageToResize.VerticalResolution );

        //Redraw the original onto the smaller bitmap
        using( Graphics graphics = Graphics.FromImage( newImage ) ) {
            graphics.InterpolationMode = InterpolationMode.HighQualityBicubic;
            graphics.DrawImage( imageToResize, new Rectangle( 0, 0, newWidth, newHeight ),
                new Rectangle( 0, 0, originalWidth, originalHeight ), GraphicsUnit.Pixel );
        }

        //First transfer properties
        foreach( PropertyItem propertyItem in imageToResize.PropertyItems ) {
            newImage.SetPropertyItem( propertyItem );
        }

        //Find the right encoder
        ImageCodecInfo codecInfo = null;
        ImageCodecInfo[] encoders = ImageCodecInfo.GetImageEncoders();
        for( int i = 0; i < encoders.Length; i++ ) {
            if( encoders[ i ].MimeType == "image/jpeg" ) {
                codecInfo = encoders[ i ];
                break;
            }
        }
        if( codecInfo == null ) {
            throw new InvalidOperationException( "No JPEG encoder was found." );
        }

        //Now save the image with the correct encoder information and your properties should be saved.
        newImage.Save( "c:\\somefilename.jpg", codecInfo, null ); //You don't have to specify EncoderParameters
    }
}

The code above has been pulled out of one of my applications and has been modified for the purpose of this article. You may need some minor adjustments if it does not compile. But in principle it lays out all the steps required to save an image after editing, such that all the EXIF properties of the original are preserved in the new image.

Categories:   .Net