Simple LSB Steganography and LSB Steganalysis of BMP Images
COMP 4230-201 Computer Vision Final Project, UMass Lowell
Joshua Tracy
<Joshua_tracy1@student.uml.edu>

Abstract
This document describes a simple method for hiding and retrieving data in BMP images using the least significant bit (LSB) method of Steganography. It also describes an accurate method for visually detecting LSB-embedded data.

Keywords
Steganography, Steganalysis, LSB, Least Significant Bit, LSB Enhancement

Introduction
Steganography is the art of hiding data inside other, non-secret data. This document describes how to implement one such method so that you can hide secret data. It also shows how to detect data that someone has hidden in an image using a similar method.

If someone wants to prevent others from reading a secret message they are sending, they could encrypt the message and make it hard to crack. However, encrypting a message does not hide the fact that a message is being sent, so anyone who intercepts it will know there is a secret message and may try to decrypt it. What if there were a way to hide the existence of the message itself? This effect can be achieved using Steganography, which is why it is often used alongside cryptography for an added level of security.

This document describes how to hide messages, images, and any other type of file inside a BMP image using Least Significant Bit (LSB) Steganography. This method hides the data in the least significant bits of the color values of each pixel in the image. Figures 1 and 2 show an example of this method. There are many Steganography methods, but LSB Steganography is the easiest to learn. It is good for people just starting out, but can also be extended by more experienced programmers.

Figure 1: Image with a steganographically hidden image.
Figure 2: Image extracted from the image in Figure 1.
While being able to hide messages can be helpful, it is much less welcome when you are the one the message is being hidden from. Being able to detect that a hidden message is being sent can have great benefits. For example, it would be beneficial for the U.S. to be able to detect a war-related message sent by a terrorist group. While this document will not cover how to decrypt the secret message, it will cover how to detect its existence. This process of detecting hidden messages is called Steganalysis. There are many methods for Steganalysis, but the most common, and the one we will be going over, is LSB Enhancement. Although this method only detects messages hidden with LSB embedding, it is the easiest method and
therefore good for beginners. LSB Enhancement takes advantage of the fact that images normally have an even distribution of color values throughout the image. When an image has hidden data, this even distribution is disturbed, which causes a block of color junk to appear in the image when it is run through LSB Enhancement.

Figure 3: Image with hidden data after LSB Enhancement (color junk at top of image).

Background
There are many methods that others have used to hide data in images. One method takes advantage of the BMP image format. A BMP file consists of file information (headers) followed by the pixel data. One field of the file information gives the offset at which the pixel data begins [1]. This offset can be changed manually, leaving a gap between the file information and the pixel data that can be made large enough to hide any data [1]. This allows an unlimited amount of data to be hidden without changing the look of the image, but it increases the file size, which makes the method easy to detect when the file is larger than expected for an image of that size and format.

Another method is based on JPEG compression. With this method, the image is first converted from RGB to YUV format. This separates the image into Y (brightness) and U, V (color) components, which allows the color components to be downsampled (halved) to decrease the file size while the brightness is kept at full resolution. This works because the human eye is more sensitive to changes in the brightness of a pixel than to changes in its color [3]. The image then goes through a Discrete Cosine Transform (DCT), which transforms a signal from an image representation into a frequency representation by grouping the pixels into 8×8 pixel blocks and transforming each block into 64 DCT coefficients [3]. Once the image has been compressed, the data is embedded using LSB embedding (in the least significant bits of the DCT coefficients), and finally Huffman coding is used to further reduce the size.

In 2000, Andreas Westfeld and Andreas Pfitzmann developed Chi-Square analysis, which can be used to detect hidden data in images [2].
The way this works is to compare the observed frequencies of Pairs of Values with their expected frequencies, which gives the probability that data is hidden in the image [2]. Here the pairs are the image's pixel color values (values that differ only in their least significant bit).

Approach
The method I used, and will describe, is least significant bit Steganography on BMP images. Before we get into the details of how it is done, we need to understand the layout of a BMP image. When you look at an image, you are actually looking at many small colored squares, called pixels, that are arranged in a specific order to make the image (see Figure 4). Each pixel is composed of three colors (red, green, and blue), each ranging in value from 0 to 255. This means there are 256³, or 16,777,216, possible colors for each pixel. Each color value is an 8-bit binary value; for example, the value 255 is 1111 1111 in binary. It is in these values that we hide the secret message.

Figure 4: This image is a close-up of a leaf so you can see the image pixels.

If we want to hide the message "HI", we first need the binary value that corresponds to each letter. The character H has an ASCII value of 72 and a binary value of 0100 1000, and I has an ASCII value of 73 and a binary value of 0100 1001. This method of LSB Steganography requires two pixels to hide one character, so in this case we will need 4 pixels.
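The two-pixels-per-character idea can be sketched in C++, the language of the project's steganography.cpp. This is a minimal sketch, not the author's code: the names Pixel, setLowBits, embedChar, and extractChar are hypothetical, and it assumes the bit layout walked through in the next paragraph, where red and green each carry one bit, blue carries two, and the first pixel holds the character's high four bits.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical pixel type: three 8-bit color values.
struct Pixel {
    std::uint8_t r, g, b;
};

// Replace the lowest `count` bits of `value` with the lowest `count` bits of `bits`.
std::uint8_t setLowBits(std::uint8_t value, std::uint8_t bits, int count) {
    assert(count >= 1 && count <= 7);
    const unsigned mask = (1u << count) - 1u;
    return static_cast<std::uint8_t>((value & ~mask) | (bits & mask));
}

// Hypothetical helper: hide one 8-bit character in two pixels.
// Assumed layout: in each pixel, red takes one bit, green the next bit,
// and blue the remaining two; the first pixel gets the high four bits.
std::array<Pixel, 2> embedChar(std::array<Pixel, 2> px, unsigned char c) {
    for (int i = 0; i < 2; ++i) {
        // High nibble (bits 7..4) for the first pixel, low nibble for the second.
        const unsigned nibble = (i == 0) ? ((c >> 4) & 0x0Fu) : (c & 0x0Fu);
        px[i].r = setLowBits(px[i].r, static_cast<std::uint8_t>(nibble >> 3), 1);
        px[i].g = setLowBits(px[i].g, static_cast<std::uint8_t>(nibble >> 2), 1);
        px[i].b = setLowBits(px[i].b, static_cast<std::uint8_t>(nibble), 2);
    }
    return px;
}

// Hypothetical helper: read the hidden bits back in the same order.
unsigned char extractChar(const std::array<Pixel, 2>& px) {
    unsigned value = 0;
    for (const Pixel& p : px) {
        const unsigned nibble = ((p.r & 1u) << 3) | ((p.g & 1u) << 2) | (p.b & 0x03u);
        value = (value << 4) | nibble;
    }
    return static_cast<unsigned char>(value);
}
```

With a white pixel (255, 255, 255) followed by a black pixel (0, 0, 0) as the carrier, embedding H (0100 1000) gives (254, 255, 252) and (1, 0, 0), and extractChar returns H again. Each color value changes by at most 1, or at most 3 for the two-bit blue value.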
To start hiding the message, we begin by reading in the color values of the first two pixels. Let's say the first pixel we read in is white and the second is black, so we get the values (255, 255, 255) and (0, 0, 0) for the two pixels, respectively. We change the rightmost bit (LSB) of the first pixel's first color value (red) to the value of the leftmost bit of the character we are trying to hide. We do the same with the second color value (green) of the first pixel, changing its LSB to the value of the second bit of the character. Because we are hiding eight bits in six color values, we hide two bits in the third color value (blue). We then do the same to the second pixel with the last four bits of the character. This process of reading in two pixels' values and hiding a character in them is repeated until the entire message is hidden.

Figure 5: Shows the hiding of the binary value 10001001 into pixels of values (255,255,255) and (0,0,0)

To be able to hide and retrieve files, we need to hide not only the characters that make up the file but also some information about the file. First, we hide a tag in the image, using the method described above, so that when we go to retrieve the hidden data, detecting this tag tells us how the data was hidden. Next, we need the file name and the file name length (number of characters). We need to save the file name because without it, when we read the data back, we have no idea which file extension to save it with. We first hide the number of characters in 4 pixels and then the file name itself. This way, when we read the data, we can read those 4 pixels to get the file name length, so that we know how many pixels to read to get the file name without reading too far.

To make sure the length fits in 4 pixels (two 8-bit characters), we divide the length by 256, keep track of the quotient and the remainder, and store those two values in the 4 pixels. When we decode, we simply read in these two values, multiply the quotient by 256, and add the remainder, giving back the length (see Figure 6). For example, a length of 2570 gives a quotient of 10 and a remainder of 10, and 10 × 256 + 10 = 2570.

Figure 6: Example of condensing and retrieving a file name length of 2570 characters

The final step we take before hiding the actual data is to hide the length of the data. Just like with the file name, we need to know how many characters to read when we decode. To do this, we convert the size to four characters that we can hide; Figure 7 shows this process.

Figure 7: Process to condense the file size 380065 to 4 chars

By hiding information about the secret data, and not just the data itself, we make extraction easy for anyone who knows how the data was hidden in the first place. And since we only change each color value of the carrier image by at most 1 (or at most 3 for the blue value, which holds two bits), the resulting image with hidden data looks identical to the human eye.

If someone has hidden data in an image, we can use Steganalysis to detect it. The most common form of Steganalysis is Least Significant Bit Enhancement. With this method, you iterate through every color value of each pixel and set each of its bits to the value of its least significant bit. For example, if we read in a pixel with the binary color values 1101 1100, 1000 0101, and 1110 1111, setting every bit to match its least significant bit results in 0000 0000, 1111 1111, and 1111 1111. When data is hidden in
an image, it distorts the image's least significant bits, so when we enhance those bits we reveal whether there is hidden data. The two images below have been enhanced using this process. Notice how the image with hidden data has a distinct patch of color junk at the top of the image; this is the distortion.

Figure 8: Image without hidden data
Figure 9: Image with hidden data

TABLE 1: SUMMARY OF SUBMITTED CODE
Filename            Description          Author
steganography.cpp   LSB Steganography    Joshua Tracy
steganalysis.cpp    LSB Enhancement      Joshua Tracy

Dataset
For my tests, I took a collection of random images from the web and hid data of different sizes inside them. This gave me a wide variety of images that could be compared to the originals (images with no hidden data). It also gave me images I could run through the LSB Enhancement to determine its success.

Evaluation
The first step I took was to hide data of different sizes in the images collected from the internet. Below are examples of images before and after the data was hidden in them. To determine the success of the hiding process using LSB Steganography, a test was set up in which subjects looked at images side by side, like the ones above, to see whether they could tell which image had the hidden data.

Steganography was just one part of this project; the other part was Steganalysis. I ran the above images, with and without hidden data, through the LSB Enhancement. The results are shown below.
Once again a test was set up. The first part showed two images side by side, and subjects were asked to determine which image had hidden data. To really test the success of the Steganalysis method, the test later switched to showing just one image. This image may or may not have had hidden data in it, and it was up to the subjects to determine whether it did. After all the data was collected, the accuracy of each subject on each test was calculated. This is shown in Figure 10.

Before any of the tests were done, it was expected that subjects would have a 50/50 chance of guessing which image had the hidden data when the images had not gone through the Steganalysis process, and that their accuracy would climb to 90% after the Steganalysis process had been applied. It was decided that if these expectations held, the LSB Steganography and LSB Enhancement implementations would be considered a success. After calculating the results, the subjects had an average accuracy of 50% on the first test and 92% on the second, making my methods a success.

Figure 10: Data collected from the two tests (subject's accuracy, 0% to 100%, for subjects 1 through 12, on the Steganography and Steganalysis tests)

Conclusion
There are many methods for hiding data inside images. These methods are not meant to be uncrackable but are meant to hide the existence of the hidden data. The easiest Steganography method to implement is Least Significant Bit Steganography. While Steganography is meant to hide the existence of hidden data, Steganalysis is meant to reveal it. The easiest Steganalysis method to implement is LSB Enhancement.

Team roles
As the sole member of the team, I was responsible for every part of this project. I collected all the images used in the tests and hid the data inside them. I ran all the images through the detection process and then conducted the data collection for evaluating both the Steganography and Steganalysis processes. I also wrote both programs in their entirety, totaling over 500 lines of code.

References
[1] Macklin, Paul. EasyBMP Code Sample: Steganography. 20 February 2011.
[2] Steganalysis: Chi-Square Attack & LSB Enhancement. 6 December 2011.
[3] Morkel, T., Eloff, J.H.P., and Olivier, M.S. An Overview of Image Steganography. July 2005.