PROTOTYPE DEVELOPMENT FOR EMBEDDING LARGE AMOUNT OF INFORMATION USING SECURE LSB AND NEURAL BASED STEGANOGRAPHY BASAM N. SALEH


A project report submitted in partial fulfillment of the requirements for the award of the degree of Master of Computer Science (Information Security)

Centre for Advanced Software Engineering (CASE)
Faculty of Computer Science and Information Systems
Universiti Teknologi Malaysia

APRIL 2009

To my mother, to my father's soul, to my brother who raised me, and to the family

ACKNOWLEDGEMENT

Having the chance to enroll in this honorable institute is a real gift that made me realize that nothing is more important than knowledge. My entire study at UTM-CASE was an everyday opportunity to acquire fine knowledge. Many thanks to each lecturer in CASE; they guided me toward my goals, gave me all the support I needed, and were always kind. Many thanks to my supervisor, Prof. Dr. Azizah Bt. Abd Manaf, who was always kind, understanding, and supportive. She was always ready to help me with my project and gave me great ideas that helped me greatly in completing my work.

ABSTRACT

The security of information has become a very important issue. Steganography is an effective way to hide secret information in seemingly innocent cover files, which are mostly multimedia files. Using multimedia files as hosts for the hidden information avoids the need to secure the communication channel when sending secret messages. The challenge for Steganography is the amount of information that can be embedded in the host file without affecting the properties of that file and without distorting the image, video, or sound host file, so that the existence of hidden information is not detected. New methods, techniques, and algorithms are needed to increase the amount of hidden information, preserve the quality and size of the host file, and keep it robust against steganalysis. To achieve these goals, the embedding must take place in suitable locations in the multimedia file, choosing the most suitable bytes in which to place the bits. A recent approach uses artificial intelligence that teaches the machine to give the best candidate bits in which to hide the information. This approach is efficient in theory, and it is the basis of this project, which implements a prototype that uses it. For embedding, this project uses a neural network with adaptive smoothing error back propagation that keeps trying to refine the Stego file until it reaches the best embedding results, alongside another adaptive Steganography method based on concepts called main cases and sub cases. Four layers of security are used to secure the hidden information and to add complexity for steganalysis. Another point of focus is embedding the maximum amount of information that can be embedded without affecting the other objectives.

ABSTRAK

Information security is a most important issue, especially for authorities in their daily administrative affairs. Steganography is an effective way to hide confidential information in ordinary-looking multimedia files. Once the information has been embedded in the host file, any method used to send it becomes safe and sound, because the host file is no longer the main point of attention. The challenge for Steganography is the amount of information that can be loaded into the host file without affecting the file's properties and without distorting the image, video, or sound host file, so that any detection of the presence of hidden information is avoided. The continuing search for new methods, techniques, and algorithms that increase the amount of hidden information, preserve the quality and size of the host file, and remain robust against steganalysis is very important. To achieve these goals, the suitability of the embedding location within the multimedia file is important, choosing the most suitable bytes in which to place the bits. This is also a very great challenge for steganographers. A recent approach uses artificial intelligence that teaches the machine to detect and supply the best candidate bits in which to hide the information. This approach is not only efficient in theory but is also the basis of this project, which implements a prototype that uses it. In this project, embedding uses a neural network with adaptive smoothing error back propagation that keeps trying to refine the stego file until it reaches the best embedding result, in addition to another adaptive Steganography method that uses what are known as main cases and sub cases. Four layers of security are used to strengthen the hidden information and to add complexity for steganalysis. Another point of focus in this project is embedding the maximum amount of information without affecting the other objectives.

TABLE OF CONTENTS

DECLARATION
DEDICATION
ACKNOWLEDGEMENT
ABSTRACT
ABSTRAK
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS

1 INTRODUCTION
    Overview
    Background of the Problem
    Problem Statement
    Project Aim
    Project Objectives
    Project Scope
    Summary

2 LITERATURE REVIEW
    Introduction
    Image File Formats
    Image Parameters
    Most Common Used Image Format
    Red Green Blue (RGB) Images
    Steganography
    MSE and PSNR Formulas
    Different Forms of Steganography
    Steganographic Methods
    Least Significant Bit (LSB) Insertion Steganography
    LSB in BMP
    LSB in PNG
    Image Steganography in GIF image
    Steganalysis
    Steganalysis Methods
    Steganalysis Against LSB
    Choosing the Best Location in the Cover Image to Hide Information
    Coding Framework
    Intelligent Data Embedding Method for LSB Steganography
    Neural Networks Learning System
    Summary

3 RESEARCH METHODOLOGY
    Introduction
    Requirements Specifications
    Prototype Architectural Design
    Prototype Development
    Testing the Results

4 PROTOTYPE DESIGN
    Introduction
    Design Challenges
    The Prototype Architecture
    First Security Layer (AES Encryption)
    Second Security Layer (Adaptive Segmentation)
    Third Security Layer (Main Cases and Sub Cases)
    Fourth Security Layer (Neural network)
    Extraction and Decryption Layer
    Operational Phases
    Summary

5 PROTOTYPE IMPLEMENTATION
    Introduction
    Implementation Phases
    Prototype Code Structure and UML Diagrams
    Summary

6 TESTING THE RESULTS AND CONCLUSION
    Introduction
    The Benchmark
    Prototype Usage Limitations
    Testing Approaches And Methods
    Program Performance
    Results Listing and Analyzing
    Experiment to Hide Very Small Amount of Information (459 bytes)
    Experiment to Hide (9 K.B) of Information
    Embedding Experiment Using This Prototype and Maximum Embedding Ability of S-Tools
    Embedding Experiment Using Maximum Embedding Ability of Both This Prototype and S-Tools
    Meeting The Objectives
    Summary And Conclusion

REFERENCES

LIST OF TABLES

4.1 Narrowing down sub cases selection into 3 groups of sub cases
Choosing the particular suitable sub case for the current pixel
Results of the experiment to hide very small amount of information (459 bytes)
Results of the experiment to hide (9 K.B) of information using this prototype and S-Tools
Results of embedding experiment using this prototype and maximum embedding ability of S-Tools
Results of embedding experiment using maximum embedding

LIST OF FIGURES

2.1 CT Image Example
Framework for Secret Key Passive Warden Steganography
Least significant bit Steganography insertion
Steganography process to embed secret message in an image
The embedding method and its detection
A Multi-Layered Perceptron (n-p-n) Neural Networks
Neural based Steganography training system architecture
Workflow sequence chart
The overall flowchart of how the proposed prototype works
Prototype layers
First security layer (AES encryption) flowchart
Second security layer (adaptive segmentation)
Third security layer (main cases and sub cases) which is the first Steganography layer
The Neural Network layer
The extraction and Decryption Layer
Main cases and sub cases layer operation during the extraction process
4.9 Neural Network operation during extraction and decryption layer
Main components (functions) of the prototype code
Text file handling diagram
Finding the Cover Image Properties
Segmentation and Data Hiding
The Neural Network
Computations of Visual and Statistical Measures
Information Extraction handling
Information Extraction
The User Interface for the prototype (the embedding and measures)
User Interface for the prototype (Information Extraction)
Pop up message informing the user about using the neural network
Pop up message informing the user to select another text file
Visual and statistical measures comparison for Stego images from this prototype and Stego images from S-Tools in the experiment to hide very small amount of information (459 bytes)
Visual and statistical measures comparison for Stego images from this prototype and Stego images from S-Tools in the experiment to hide (9 K.B) of information
Visual and statistical measures comparison for Stego images from this prototype and maximum capacity Stego images from S-Tools for the embedding experiment using this prototype and maximum embedding ability of S-Tools
6.6 Visual and statistical measures comparison for Stego images from maximum embedding capacity of both this prototype and S-Tools

LIST OF ABBREVIATIONS

AES - Advanced Encryption Standard
ASE - Adaptive Smoothing Error
BMP - Bitmap
BP - Back Propagation
DLL - Dynamic Link Library
DOS - Disk Operating System
FAT - File Allocation Table
GIF - Graphics Interchange Format
GUI - Graphical User Interface
HVS - Human Visual System
JPEG - Joint Photographic Experts Group
KB - Kilo Byte
LSB - Least Significant Bit
MB - Mega Byte
MC and SC - Main Cases and Sub Cases
MSE - Mean Squared Error
PDF - Probability Density Function
PNG - Portable Network Graphics
PRNG - Pseudo Random Number Generator
PSNR - Peak Signal-to-Noise Ratio
QIM - Quantization Index Modulation
RA - Repeat-Accumulate
RGB - Red Green Blue
TIFF - Tagged Image File Format
UML - Unified Modeling Language
VB - Visual Basic

CHAPTER 1

INTRODUCTION

1.1 Overview

Steganography is the art of passing information in such a manner that the very existence of the message is unknown. The goal of Steganography is to avoid drawing suspicion to the transmission of a hidden message. If suspicion is raised, then this goal is defeated. [21]

Steganography is also defined as "the art and science of communicating in a way which hides the existence of the communication. In contrast to Cryptography, where the enemy is allowed to detect, intercept and modify messages without being able to violate certain security premises guaranteed by a cryptosystem, the goal of Steganography is to hide messages inside other harmless messages in a way that does not allow any enemy to even detect that there is a second message present". [21]

Steganography applications conceal information in other, seemingly innocent media. Steganographic results may masquerade as other file or data types, be concealed within various media, or even be hidden in network traffic or disk space. We are only limited by our imagination in the many ways information and data can be exploited to conceal additional information. [46] Redundant or noisy data can be removed from the original image and replaced with a hidden message.

Steganographic technologies are a very important part of the future of security and privacy on open systems such as the Internet. Steganographic research is primarily driven by the lack of strength in the

cryptographic systems on their own, and the desire to have complete secrecy in an open-systems environment. [47]

There are a number of uses for Steganography besides the mere novelty. One of the most widely used applications is so-called digital watermarking. A watermark, historically, is the replication of an image, logo, or text on paper stock so that the source of the document can be at least partially authenticated. A digital watermark can accomplish the same function; a graphic artist, for example, might post sample images on her Web site complete with an embedded signature so that she can later prove her ownership in case others attempt to portray her work as their own. [17]

Steganography can also be used to allow communication within an underground community. There are several reports, for example, of persecuted religious minorities using Steganography to embed messages for the group within images that are posted to known Web sites. [17]

Hiding information in an image is known as the embedding process. It can be done using various Steganography techniques, taking into consideration lossless recovery of the information and the image quality. It is also very important to keep the original file size, so that detecting the hidden information is harder and the image does not look suspicious.

1.2 Background of the Problem

There are several techniques for Steganography, some of which are very complicated to understand. One simple method is Least Significant Bit (LSB) Steganography. The concept of LSB Embedding is simple. It exploits the fact that the level of precision in many image formats is far greater than that perceivable by average human vision. Therefore, an altered image with slight variations in its colors will be indistinguishable from the original by a human being, just by looking at it.
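As a minimal sketch of the LSB idea described above, and not the embedding method of this prototype, the following Python fragment overwrites the least significant bit of each cover byte with one message bit and reads it back. The function names `embed_lsb` and `extract_lsb` and the sample cover bytes are assumptions made for illustration only.

```python
def embed_lsb(cover: bytes, message: bytes) -> bytes:
    """Write each message bit into the least significant bit of one cover byte."""
    # Unpack the message into bits, most significant bit of each byte first.
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("message is too large for this cover")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        # Clear the last bit of the cover byte, then set it to the message bit.
        stego[i] = (stego[i] & 0xFE) | bit
    return bytes(stego)


def extract_lsb(stego: bytes, n_bytes: int) -> bytes:
    """Rebuild n_bytes of message from the least significant bits of stego."""
    out = bytearray()
    for i in range(n_bytes):
        value = 0
        for carrier in stego[8 * i: 8 * i + 8]:
            value = (value << 1) | (carrier & 1)
        out.append(value)
    return bytes(out)


# Hypothetical cover data: 32 color bytes, enough for a 4-byte message.
cover = bytes(range(100, 132))
stego = embed_lsb(cover, b"Hi")
recovered = extract_lsb(stego, 2)
largest_change = max(abs(a - b) for a, b in zip(cover, stego))
```

Because only the last bit of each byte is replaced, no cover byte changes by more than one intensity level, and the stego data has exactly the same length as the cover data.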

By using the least significant bits of the pixels' color data to store the hidden message, the image itself will seem unaltered. An image is nothing more than long strings of bytes, each byte representing a different color. The last few bits in a color byte, however, do not hold as much significance as the first few. This is to say that two bytes that differ only in the last few bits can represent two colors that are virtually indistinguishable to the human eye. For example, two byte values can be two different shades of red, but when only the last bit differs between them, it is impossible to see the color difference. LSB Steganography, then, alters these last bits by hiding a message within them. [38]

As important as the Steganographic technique is, equally important is the choice of the cover image. In LSB Embedding, a poor choice of cover image can lead to a Stego-image that is easily differentiable from the original. Current image formats can be divided into two broad categories, lossy and lossless. Lossy formats are those that lose some of the image's data when stored. An example is JPEG. The plus side of lossy formats, in particular JPEG, is that they achieve extremely high compression while maintaining fairly good quality. However, due to their very nature, lossy formats are not suitable for LSB Embedding. [34] Since LSB Embedding spreads the hidden message throughout the image's data, the loss of image data through compression would lead to the loss of parts of the hidden message. On the other hand, lossless images are suitable for LSB Embedding, since the integrity of the image data is preserved. However, they do not have the high compression ratio that lossy formats do. Not all lossless images are good candidates as a cover image. 24-bit bitmaps, as well as grayscale images and other color images with small variations in their palettes, are good candidates as cover images.
[34]

The main advantage of the LSB coding method is a very high watermark channel bit rate and a low computational complexity of the algorithm, while the main disadvantage is considerably low robustness against signal processing modifications (Increasing Robustness of LSB Audio Steganography by Reduced Distortion LSB Coding). Furthermore, LSB Embedding has the advantage that it is simple to

implement. This is especially true in the 24-bit bitmap case. It also allows for a relatively high payload, carrying one bit of the secret message per byte of pixel data. In addition, it is seemingly undetectable by the average human if done right. However, this rests on the assumption that the Stego-image only has to be indistinguishable from the original cover image to the human eye; many statistical techniques have been developed to determine whether an image has been subjected to LSB Embedding. [34]

Almost all of the current LSB algorithms for the RGB (Red Green Blue) color scheme do not use an intelligent method or artificial intelligence to choose the best candidate pixels in which to embed the data. Even the existing automated implementations do not use a trained machine to embed the data. The need for such systems is growing because of new techniques for detecting LSB hidden information, so LSB Steganography needs a fast, reliable method to embed the information in the host image.

1.3 Problem Statement

How can we address the lack of an intelligent method for choosing the best candidate pixels in the cover image file in which to embed the information? What method can we suggest for LSB Steganography to solve the problem of choosing the best location in the image to hide a large amount of information in a fast, accurate, and reliable way? How can the suggested method avoid detection of the existence of hidden information and so avoid the failure of secrecy of the desired communication?
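The payload figure quoted in the background above, one secret bit per byte of pixel data, gives a rough upper bound on capacity. The sketch below assumes plain one-bit LSB substitution on a 24-bit image and ignores file headers, row padding, and any overhead added by the prototype's own layers; the function name `lsb_capacity_bytes` is hypothetical.

```python
def lsb_capacity_bytes(width: int, height: int, bits_per_pixel: int = 24) -> int:
    """Upper bound on payload when one bit is hidden in every byte of pixel data."""
    pixel_bytes = width * height * (bits_per_pixel // 8)  # 3 bytes per 24-bit pixel
    return pixel_bytes // 8                               # 8 carrier bytes per payload byte


# Illustrative image size, not one taken from the experiments in this report.
capacity = lsb_capacity_bytes(512, 512)
print(capacity)  # 98304 bytes, i.e. 96 KiB
```

Under these assumptions the payload is one eighth of the raw pixel data, so hiding more than this in the same image means using more than one bit per byte.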

1.4 Project Aim

The aim of this project is to develop a prototype of an intelligent method that chooses the best candidate pixel locations in any RGB bitmap image file in which to hide a large amount of information.

1.5 Project Objectives

The objectives of this project are:

i. Developing and implementing a prototype that trains the machine to give the candidate pixels in an RGB image file that are the best locations to hide information using LSB Steganography.
ii. Comparing the Stego images produced by this prototype with the Stego images produced by S-Tools (the benchmark).
iii. Preserving the size of the cover image in the Stego image that this prototype produces from that cover image.
iv. Preserving the quality of the cover image in the Stego image that this prototype produces from that cover image.
v. Producing Stego images that are robust against specific visual and statistical measures by increasing the complexity of statistical and visual steganalysis.

1.6 Project Scope

The scope of this project is limited to true-color 24-bit RGB bitmap images, and the embedding algorithm is the LSB algorithm. In the testing phase of this project, only specific basic types of visual and statistical measures are considered regarding the complexity of the steganalysis that might be performed on the Stego images produced by this prototype. Those measures compare the

brightness difference, the neighbor pixels difference, and the Euclidean norm. The file to be hidden in the cover image is a Microsoft Windows Notepad text file (.txt). The project considers the most important success factors for Steganography: preserving the file size (keeping any size change so small that it can hardly be noticed), lossless information, maximum extraction of the hidden information, and preservation of the image quality.

1.7 Summary

In this chapter we discussed the aim and objectives of this project and the background of the problem that motivated the choice of this topic. The scope of our work was identified and the problem statement was declared.

CHAPTER 2

LITERATURE REVIEW

2.1 Introduction

To achieve the objectives of the proposed project, this chapter reviews the literature that provides the necessary background on the purpose of the project and on related work. This should lead to a good understanding, which is a primary factor in achieving the project objectives in compliance with the scope and the problem background given in Chapter 1.

The first section of the literature review (2.2) is an overview of image types and image file formats. The different file formats are covered first because they clarify what changes when different types of images are used as cover images for Steganography. The image parameters of the different file formats are discussed as well, along with the compression types, because the compression type has a major effect on the Steganography process applied to the image. Digital images have different color schemes; the scheme discussed in this chapter is the Red-Green-Blue (RGB) scheme, because the scope of the proposed project is limited to RGB cover images. Image palettes are also discussed.

Section (2.3) is a brief explanation of Steganography in general: what is meant by it, what its requirements are, different forms of Steganography, methods of Steganography, and

then, after this general idea about Steganography and its methods, we discuss in more detail the Least Significant Bit (LSB) Steganography method, which is the method used in the proposed project. We describe how Steganography is implemented with the LSB method for different image formats and types, and show how the implementation differs in each of those formats.

Section (2.4) is about steganalysis. The purpose of covering steganalysis is to understand the commonly used attacks against Steganography, so that we can propose a Steganography method that is robust against these attacks. We briefly show two examples of steganalysis against LSB Steganography.

Section (2.5) is about how to choose the best location in the cover image in which to embed the secret message. We list the different embedding approaches and show the best location to hide the secret message. The last section (2.6) is about an intelligent data embedding method for LSB Steganography and how to use a neural network system to achieve it.

2.2 Image file formats

Internet storage, usage, and speed have grown day by day since the Internet was founded, yet the available image types (formats) are still relatively limited. An image format is a standardized specification that distinguishes each image type by how it encodes information about the image into bits of data for storage. By using different methods of encoding, each image type identifies itself as different from the others. This encoding provides information such as the matrix size and bit depth, which eases interaction with the file [1].

There are two basic image format types: raster images and vector images (some formats are a mixture of the two).

Raster images: A raster image, also known as a bitmap, is the more commonly used form of representation. It represents an image in a

matrix or grid of pixels; in this matrix the spatial location and color of each pixel are defined. A raster image is analogous to the way the light-sensitive cells in the retina of the human eye see an image. Each cell has a specific spatial location and measures the frequency (color) and intensity (brightness) of the light at that spot, just like the pixel concept. The best-known raster image types are BMP, GIF, JPEG, and TIFF. [2]

Vector images: A vector image, also known as a geometric image, represents an image mathematically by using geometrical concepts such as points, lines, curves, and polygons. A vector image can also be considered a form of storing information about the shapes in an image rather than the raw image itself. The brain appears to represent images in the same way; it recognizes an image by identifying the pattern and shape of the individual objects in the image. A good example of a vector image is a textual font such as Arial. Fonts show one of the biggest advantages of vector images, scale independence, because fonts can be scaled to any size without loss of sharpness or detail. Each character in the font is described by a series of geometric curves and lines, so fonts fulfill all the criteria of a vector image. Examples of vector formats are WMF, AI, EPS, and SVG. [2]

Image Parameters

The image file stores information about the image, namely the raw image pixel data and the metadata. The metadata, and how it is stored, depends on the image format used. Most formats store information about the matrix size, color space, and bit depth. Other important metadata include:

i. Compression type: Most image formats allow the image data to be compressed. The two essential types of data compression are lossless compression and lossy compression.[1]


a) Lossless compression: its main aim is to reduce mathematical redundancy. In this type of compression, the algorithm searches for repeating patterns or sequences in the image and reduces them to a compact form. For example, in a typical passport image the surrounding background is usually a single color (assume it is black), and this background color would be represented by a long sequence of zeros (0000000000). The lossless algorithm would detect the sequence and replace it with an encoded form of 'repeat zero ten times'. Figure 2.1 shows another good example, a CT image.[3]

Figure 2.1: CT Image Example. Lossless compression works by taking advantage of patterns and repetitions in data; the long stream of zeros (black) in line profile A-A will easily yield huge compression savings, but the seemingly random profile of B-B will see little benefit.

When the compressed image is decompressed for viewing, the resulting image is exactly identical to the original source image, and no information is lost in the compression process. Lossless compression algorithms typically achieve approximately 1:2 space savings, and they perform better with relatively simple images such as diagrams, line art, or images with wide areas of flat color [3].

b) Lossy compression: the main aim of lossy compression methods is to reduce perceptual redundancy. This means the algorithm considers the limitations of the human eye and discards data that is deemed nonessential to

the perceptual quality of the overall image, because the human eye is not capable of distinguishing very small details. The lossy compression algorithm might then reduce the spatial resolution of the color channels and smooth the parts of the image that are very bright or very dark. The final decoded lossy image is not identical to the original source, unlike the case of lossless compression. As the compression ratio is increased, more information is discarded and more details are eliminated, so the difference between the compressed image and the original becomes more noticeable. For this reason lossy compression usually gives better results in terms of storage size, at around 1:10 space savings. Lossy compression works best with photographic images, or images comprised of gradients and tones with few sharp edges. In Steganography, it is strongly recommended to use lossless compression methods for the cover image, because we want to keep the hidden information safe and do not want to lose any of it at decompression time.[2]

ii. Dimensions: in some image formats it is possible for multiple images to be stored in the same file. This is what happens in animated images, where the sequential images are shown rapidly one after the other.

iii. Layers: layers are similar to dimensions in that they allow multiple images to be stored in the same file. The difference is that layers are merged and viewed as a single image, while individual layers can still be hidden. This is useful for applications such as overlays, where the textual information is saved as a separate layer from the actual image of interest.

iv. Others: there are other types of metadata, such as the date and time of creation, copyright information, and comments.

The importance of the metadata cannot be overstated, as it is vital for proper reconstruction of the image.
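The lossless example given earlier in this section, where a run of zeros is stored as 'repeat zero ten times', can be sketched with a simple run-length encoder. This is only an illustration of the principle under the assumption of byte-valued pixels; it is not the compression scheme of any particular image format, and the names `rle_encode` and `rle_decode` are hypothetical.

```python
from itertools import groupby


def rle_encode(data: bytes) -> list:
    """Collapse each run of equal bytes into a (value, run_length) pair."""
    return [(value, len(list(run))) for value, run in groupby(data)]


def rle_decode(pairs: list) -> bytes:
    """Expand (value, run_length) pairs back into the original bytes."""
    return bytes(value for value, count in pairs for _ in range(count))


flat_background = bytes(10)           # ten zero bytes, like the black border
noisy_profile = bytes([3, 7, 1, 9])   # no repetition to exploit

encoded_flat = rle_encode(flat_background)   # [(0, 10)]: "repeat zero ten times"
encoded_noisy = rle_encode(noisy_profile)    # one pair per byte, no saving
restored = rle_decode(encoded_flat)
```

Decoding returns exactly the bytes that were encoded, which is the property that makes lossless formats safe carriers for LSB data, while the noisy profile shows why such schemes gain little on random-looking content.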

Most common used image formats

The most common and widely used image formats are:

i. PNG images: The Portable Network Graphics (PNG) format is widely used, and it has many advantages over the older Internet standard image file formats that make it an attractive option for digital teaching files. Files can be repeatedly opened, edited, and saved with lossless compression, along with gamma and chromaticity correction. Metadata can be incorporated into files. The PNG format provides a network-friendly, patent-free, lossless compression that is useful for multimedia and Web-based radiologic teaching.

ii. GIF images (Graphics Interchange Format): GIF is CompuServe's standard for defining generalized color raster images. The 'Graphics Interchange Format' (tm) allows high-quality, high-resolution graphics to be displayed on a variety of graphics hardware and is intended as an exchange and display mechanism for graphics images.[2] The GIF file format uses a relatively basic form of file compression (Lempel-Ziv-Welch, or LZW) that squeezes out inefficiencies in the data storage without losing data or distorting the image. A GIF graphic cannot have more than 256 colors, but it can have fewer, down to the minimum of two (black and white). Images with fewer colors compress more efficiently under LZW compression [3]. The format uses a palette of up to 256 distinct colors from the 24-bit RGB color space. It also supports animations and allows a separate palette of 256 colors for each frame. The color limitation makes the GIF format unsuitable for reproducing color photographs and other images with continuous color, but it is well suited to simpler images such as graphics or logos with solid areas of color [1].

iii. JPEG images: JPEG is a compression algorithm developed by the people the format is named after, the Joint Photographic Experts Group [4].

The JPG file is wonderfully small, often compressed to perhaps only 1/10 of the size of the original data, which is very useful when the Internet and modems are involved. A JPEG bit-stream is a sequence of data chunks, and each chunk starts with a marker value. A marker is a 16-bit integer value, stored in big-endian byte order, with the most significant byte set to 0xff. The lower byte of the marker value determines its type. A marker is followed by a 16-bit integer value for the size [4]. JPEGs can store image data in 24-bit color. Each pixel in an image is represented by 3 bytes holding the red, green, and blue values used to generate the final pixel color. Consequently JPEGs are really good for images containing lots of color data [11]. The difference in quality between 1% and 50% compression is not too bad, but the drop in bytes is impressive [5]. However, this fantastic compression efficiency comes at a high price. JPG uses lossy compression (lossy meaning "with losses to quality"): some image quality is lost when the JPG data is compressed and saved, and this quality can never be recovered [6].

iv. Tagged Image File Format (TIFF): TIFF was developed by Microsoft and Aldus. TIFF is a trademark that was originally registered to Aldus, which subsequently merged with Adobe Systems (San Jose, Calif). Adobe now controls the copyright of the TIFF specification. TIFF was created primarily by imaging developers of input and output devices such as printers, monitors, and scanners; as a result, it is specifically designed to be compatible with different image processing devices. The word Tagged in TIFF refers to this format's complicated file structure. The initial header of the file data is followed by chunks of data called tags, which convey the image information to the program displaying the file [7]. TIFF 5.0 was released in 1988 and incorporated support for the LZW compression technique. 
Although the LZW technique is one of the most popular compression algorithms, its use may be restricted due to proprietary limitations as discussed earlier. Another useful feature of TIFF files is that each file can contain more

than one image. The primary weakness of TIFF is the large file size that results from the use of lossless compression techniques and tags for conveying image data [8]. A TIFF file is defined to be a sequence of 8-bit bytes, where the bytes are numbered from 0 to N. It begins with an 8-byte image file header that points to an image file directory (IFD). An image file directory contains information about the image, as well as pointers to the actual image data [9].

Red Green Blue (RGB) Images

An RGB image, sometimes referred to as a true color image, is stored in MATLAB as an m-by-n-by-3 data array that defines red, green, and blue color components for each individual pixel. RGB images do not use a palette. The color of each pixel is determined by the combination of the red, green, and blue intensities stored in each color plane at the pixel's location. Graphics file formats store RGB images as 24-bit images, where the red, green, and blue components are 8 bits each. This yields a potential of 16 million colors. The precision with which a real-life image can be replicated has led to the commonly used term true color image [10].

An RGB array can be of class double, uint8, or uint16. In an RGB array of class double, each color component is a value between 0 and 1. A pixel whose color components are (0,0,0) is displayed as black, and a pixel whose color components are (1,1,1) is displayed as white. The three color components for each pixel are stored along the third dimension of the data array. For example, the red, green, and blue color components of the pixel (10,5) are stored in RGB(10,5,1), RGB(10,5,2), and RGB(10,5,3), respectively [10]. To determine the color of the pixel at (2,3), you would look at the RGB triplet stored in (2,3,1:3). The three values stored there are the red, green, and blue intensities that together give the color of the pixel at (2,3) [10].
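The m-by-n-by-3 layout described above can be illustrated with a short sketch. It uses Python with NumPy rather than MATLAB, so indices are 0-based; the helper names `pixel_rgb` and `double_to_uint8` and the 12-by-8 image are hypothetical and chosen only for illustration.

```python
import numpy as np

def pixel_rgb(image, row, col):
    """Return the (R, G, B) triplet at 1-based (row, col), like MATLAB's RGB(row, col, 1:3)."""
    return tuple(image[row - 1, col - 1, :].tolist())

def double_to_uint8(image):
    """Map class-double components in [0, 1] to 8-bit components in [0, 255]."""
    return np.round(np.clip(image, 0.0, 1.0) * 255).astype(np.uint8)

# Hypothetical 12-by-8 image of class double, initially all black (0, 0, 0).
rgb = np.zeros((12, 8, 3), dtype=np.float64)

# Pixel (10, 5) in 1-based terms is rgb[9, 4, :] in NumPy; set it to white.
rgb[9, 4, :] = (1.0, 1.0, 1.0)

print(pixel_rgb(rgb, 10, 5))        # the three components of pixel (10, 5)
print(double_to_uint8(rgb)[9, 4])   # the same pixel as 8-bit components
```

The third axis plays the role of the three color planes, so each 8-bit RGB pixel occupies 3 bytes, which is where the 24-bit figure comes from.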

The number of channels of a pixel is how many numbers are used to specify its color. In RGB as described above, an image has three numbers for each pixel that correspond directly to the three R, G and B elements in the computer display. Such RGB images have three channels [11]. An image that is 1000 pixels wide by 1000 pixels high contains a million pixels overall; with three numbers for each pixel (one each to control the R, G and B dots), this adds up to a large amount of data. If each number is just one byte, then a one-million-pixel image will take three megabytes of space. Not surprisingly, many clever software schemes have been invented to reduce the amount of space required for an image [12]. Using one number per pixel in a million-pixel image reduces the size to only one megabyte, but at the price of seeing the image in shades of gray as a monochrome ("black and white") or grayscale image. Ordinary RGB images can have invisible pixels, but this is a simple ON/OFF effect for each pixel. RGB images with an alpha channel can have a different percentage of transparency for each individual pixel in the image. This is called pixel transparency [13].

Palette images are also known as Indexed Color mode images, or sometimes as color maps. The reason to use palette images is to reduce the size of images that will be published on the Internet. Even with fast connections, the Internet is slow enough that it is very important to reduce the size of images used on web sites. Reducing the number of colors in an image gives web graphics formats such as .gif and .jpg the best chance of compressing the image to a small size [14]. Palette images save space in the image file by using one number per pixel (one channel) to specify the color of each pixel and by reducing the number of colors used in the image to only 256. Each color number corresponds to a color in a palette of 256 colors. The palette colors are true color RGB colors out of a possible range of millions of colors [15].
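The storage figures and the palette lookup described above can be checked with a small sketch. The function names `raw_size_bytes` and `palette_to_rgb`, the gray-ramp palette, and the 2-by-2 index image are assumptions made for illustration; compression and file headers are ignored.

```python
import numpy as np

def raw_size_bytes(width, height, channels, bytes_per_channel=1):
    """Uncompressed pixel data size: one number per channel per pixel."""
    return width * height * channels * bytes_per_channel

def palette_to_rgb(indices, palette):
    """Expand a one-channel indexed image into RGB by looking each index up in the palette."""
    return palette[indices]

# A 1000 x 1000 image: three channels (RGB) versus one channel (grayscale or indexed).
print(raw_size_bytes(1000, 1000, 3))   # bytes for RGB
print(raw_size_bytes(1000, 1000, 1))   # bytes for one number per pixel

# Hypothetical 256-entry palette (a gray ramp) and a tiny 2 x 2 indexed image.
palette = np.stack([np.arange(256, dtype=np.uint8)] * 3, axis=1)   # shape (256, 3)
indices = np.array([[0, 255], [128, 64]], dtype=np.uint8)
rgb = palette_to_rgb(indices, palette)                              # shape (2, 2, 3)
```

An indexed file also has to store the palette itself (256 entries of 3 bytes each in this sketch), which is negligible next to the pixel data of a large image.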

Steganography

Steganography refers to the science of invisible communication. Unlike cryptography, where the goal is to secure communications from an eavesdropper, Steganographic techniques strive to hide the very presence of the message itself from an observer [16]. Suppose Alice wishes to send a secret message m to Bob. To do so, she embeds m into a cover object c, using a Stego key k, to obtain the Stego object s. The Stego object s is then sent through the public channel, where it may be examined by a warden, Wendy. The general form of a Steganography implementation is: cover object + secret message + Stego key = Stego object, that is, c + m + k = s.

In a pure Steganography framework, the technique for embedding the message is unknown to Wendy and shared as a secret between Alice and Bob. However, it is generally not considered good practice to rely on the secrecy of the algorithm itself. In private key Steganography, Alice and Bob share a secret key which is used to embed the message. The secret key can be, for example, a password used to seed a pseudo-random number generator that selects pixel locations in an image cover object for embedding the (possibly encrypted) secret message. Wendy has no knowledge of the secret key that Alice and Bob share, although she is aware of the algorithm that they could be employing for embedding messages. In public key Steganography, Alice and Bob have private-public key pairs and know each other's public key. We restrict our attention to private key Steganography [17].

Wendy, the warden, who is free to examine all messages exchanged between Alice and Bob, can be passive or active. A passive warden simply examines the message and tries to determine whether it potentially contains a hidden message. If it appears that it does, she suppresses the message and/or takes appropriate action; otherwise she lets the message through without any action. An active warden, on the other hand, can alter messages deliberately, even though she does not see any trace of a

hidden message, in order to foil any secret communication that may nevertheless be occurring between Alice and Bob. The amount of change the warden is allowed to make depends on the model being used and the cover objects being employed. For example, with images, it would make sense that the warden is allowed to make changes as long as she does not significantly alter the subjective visual quality of a suspected Stego image (see Figure 2.2).

Figure 2.2: Framework for Secret Key Passive Warden Steganography

MSE and PSNR formulas

Two terms are important in Steganography. The first is the mean-squared error (MSE) between two images I_1(m,n) and I_2(m,n), which is:

\[ \mathrm{MSE} = \frac{\sum_{m=1}^{M} \sum_{n=1}^{N} \left[ I_1(m,n) - I_2(m,n) \right]^2}{M \cdot N} \]

where M and N are the number of rows and columns in the input images, respectively. Mean-squared error depends strongly on the image intensity scaling: an MSE value that looks dreadful for an 8-bit image (with pixel values in the range 0-255) is barely noticeable for a 10-bit image (pixel values in [0, 1023]). The second term is the Peak Signal-to-Noise Ratio (PSNR), which avoids this problem by scaling the MSE according to the image range:

\[ \mathrm{PSNR} = 10 \log_{10} \left( \frac{R^2}{\mathrm{MSE}} \right) \]

where R is the maximum possible pixel value of the image (for example, 255 for an 8-bit image). PSNR is measured in decibels (dB). PSNR is a good measure for comparing restoration results for the same image, but between-image comparisons of PSNR are meaningless [18].

Different forms of Steganography

It is important to know the various forms of Steganography in order to be aware of the useful methods and the weaknesses of each form, so that the method used in this project is chosen for clear, logical reasons. With the continued growth of graphics power in computers and the research being put into image-based Steganography, this field will continue to grow at a very rapid pace. Coding secret messages in digital images is by far the most widely used of all methods in the digital world of today. This is because it can take advantage of the limited power of the human visual system (HVS). Almost any plain text, cipher text, image or other media that can be encoded into a bit stream can be hidden in a digital image [19]. As Duncan Sellars [20] explains, "To a computer, an image is an array of numbers that represent light intensities at various points, or pixels. These pixels make up the images raster data." When dealing with digital images for use with Steganography, 8-bit and 24-bit per pixel image files are typical. Both have advantages and disadvantages. 8-bit images are a good format to use because of their relatively small size. The drawback is that only 256 possible colors can be used, which can be a problem during encoding. Usually a gray scale color palette is used when dealing with 8-bit images such as (.GIF), because its gradual change in color makes modifications harder to detect after the image has been encoded with the secret message. 24-bit images offer much more flexibility when used for Steganography.
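The MSE and PSNR definitions given earlier in this section can be evaluated with a few lines of code. This is a minimal sketch under stated assumptions: the images are grayscale arrays of equal shape, and the parameter `peak` stands for the range value R (255 is assumed for 8-bit images). The function names are illustrative.

```python
import numpy as np

def mse(image_1, image_2):
    """Mean-squared error between two images of identical shape."""
    a = np.asarray(image_1, dtype=np.float64)
    b = np.asarray(image_2, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    return float(np.mean((a - b) ** 2))

def psnr(image_1, image_2, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    error = mse(image_1, image_2)
    if error == 0.0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / error))

# Hypothetical cover and Stego images: every pixel differs by exactly one level.
cover = np.full((8, 8), 100, dtype=np.uint8)
stego = cover + 1
print(mse(cover, stego), psnr(cover, stego))
```

Because a change in a least significant bit alters a pixel by at most one level, the MSE of such an embedding is at most 1, so for 8-bit images the PSNR cannot fall below 20 log10(255) dB.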

The large number of colors (over 16 million) that can be used goes well beyond what the human visual system (HVS) can distinguish, which makes a secret message very hard to detect once it has been encoded. The other benefit is that a much larger amount of hidden data can be encoded into a 24-bit digital image than into an 8-bit digital image. The one major drawback of 24-bit digital images is that their large size (usually in MB) makes them more suspect than the much smaller 8-bit digital images (usually in KB) when sent over an open system such as the Internet. Digital image compression is a good solution for large digital images such as the 24-bit images mentioned earlier. There are two types of compression used in digital images, lossy and lossless. Lossy compression such as (.JPEG) greatly reduces the size of a digital image by removing excess image data and calculating a close approximation of the original image. Lossy compression is usually used with 24-bit digital images to reduce their size, but it carries one major drawback: because it removes what it sees as excess image data, it increases the possibility that the embedded secret message will lose parts of its contents. Lossless compression techniques, as the name suggests, keep the original digital image intact without the chance of loss. For this reason, lossless compression is the technique of choice for Steganographic uses. Examples of lossless formats are (.GIF and .BMP). The only drawback of lossless image compression is that it does not reduce the size of the image data very much [21].

Steganographic Methods

Embedding a large amount of data into a picture can modify its visible properties, so it is important that the embedded data size be minimized. There are different requirements depending on the purpose of Steganography:

Capacity: it is an important factor in captioning applications, when a lot of information should be embedded into a cover image, which is usually related

to the current picture. For example, when transmitting medical images, the personal data and the diagnosis could be embedded into the same picture.

Imperceptibility: it is important when a secret communication occurs between two parties and the fact that a secret communication is taking place must itself be kept secret.

Robustness: watermarking, fingerprinting and all copyright protection applications demand a robust Steganographic method, i.e. one where the embedded information cannot be removed without serious degradation of the image [22].

The main Steganography method is Public-Key Steganography. A public-key Steganography protocol allows two parties, who have never met or exchanged a secret, to send hidden messages over a public channel so that an adversary cannot even detect that these hidden messages are being sent. Alternatively, the algorithm may require the pre-existence of a shared secret key to designate the pixels which should be tweaked. In this case both the sender and the receiver must have this secret. Suppose that the communicating parties do not have the opportunity to agree on a secret key, but one of them (e.g. Bob) has a private/public key pair, and his partner knows the public key. In the case of a passive warden, Alice, knowing Bob's public key, encrypts her message with this key, embeds it in a known channel (a known position in the cover media), and sends it to Bob. Bob cannot be sure whether the channel contains a hidden message, but he can try to decrypt the random-looking string with his private key and check whether it is a message or not [23].

A public-key stegosystem is a triple of probabilistic algorithms $S = (SG, SE, SD)$. $SG(1^k)$ generates a key pair $(PK, SK) \in \mathcal{PK} \times \mathcal{SK}$. $SE$ takes a (public) key $PK \in \mathcal{PK}$, a string $m \in \{0,1\}^*$ (the hidden text), and a message history $h$. $SE$ also has access to a channel oracle for some channel $C$, which can sample from $C_h$ for any $h$. $SE(PK, m, h)$ returns a sequence of documents $s_1, s_2, \ldots, s_i$ (the Stego text) from the support of $C_h^i$. $SD$ takes a (secret) key $SK \in \mathcal{SK}$, a sequence of documents $s_1, s_2, \ldots, s_i$, and a message history $h$, and returns a hidden text $m$. Additionally, for every polynomial $p$ there must exist a negligible $\mu$ such that:

\[ \forall m \in \{0,1\}^{p(k)} : \quad \Pr_{(PK,SK) \leftarrow SG(1^k)} \left[ SD(SK, SE(PK, m, h), h) = m \right] \geq 1 - \mu(k) \]

where the randomization is also over any coin tosses of $SE$, $SD$, $SG$ and the oracle to $C_h$. The secret message that Alice wants to send to Bob is called the hidden text; documents from the channel are called cover texts, and documents that are output by $SE$ are called Stego texts. It must also be stressed that $SE$ need not know the exact probabilities of documents in $C_h$. This is important to mention, as it is unreasonable to assume that the probabilities in $C_h$ are known, whereas anybody communicating can be thought of as an oracle for the channel distribution $C_h$ [23].

The embedding and extraction algorithms can be divided into two groups: spatial/time domain and transform domain techniques. In the former case information is embedded in the spatial domain in the case of images, and in the time domain in the case of audio materials. The transform domain methods operate in the Discrete Cosine Transform, Fourier or wavelet transform domains of the host signal. The Patchwork algorithm (developed at MIT) selects random pairs of pixels, and increases the brightness of the brighter pixel and decreases the brightness of the other. This algorithm shows a high resistance to most non-geometric image modifications. In the case of a transform domain operation, the embedding process can cause visible changes if the embedded data size is too big, and the limit below which a given embedded data size does not change the visual properties of the image is image dependent [24].
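The pair-based idea behind Patchwork, mentioned above, can be sketched in the spatial domain. This is a hedged illustration and not the published algorithm in full: it assumes the commonly described variant in which the first pixel of each key-selected pair is brightened and the second darkened by a fixed amount, which differs slightly from the brighter/darker wording above. The names `select_pairs`, `patchwork_embed`, `patchwork_statistic`, the key value, and the step `delta` are all hypothetical.

```python
import numpy as np

def select_pairs(shape, n_pairs, key):
    """Choose n_pairs disjoint pixel pairs from a key-seeded permutation (assumed scheme)."""
    n_pixels = shape[0] * shape[1]
    if 2 * n_pairs > n_pixels:
        raise ValueError("not enough pixels for the requested number of pairs")
    order = np.random.default_rng(key).permutation(n_pixels)
    return order[:n_pairs], order[n_pairs:2 * n_pairs]

def patchwork_embed(gray, n_pairs, key, delta=2):
    """Brighten the first pixel of each pair and darken the second by delta."""
    a_idx, b_idx = select_pairs(gray.shape, n_pairs, key)
    flat = gray.astype(np.int16).ravel()
    flat[a_idx] += delta
    flat[b_idx] -= delta
    return np.clip(flat, 0, 255).astype(np.uint8).reshape(gray.shape)

def patchwork_statistic(gray, n_pairs, key):
    """Sum of (first - second) over the key-selected pairs."""
    a_idx, b_idx = select_pairs(gray.shape, n_pairs, key)
    flat = gray.astype(np.int64).ravel()
    return int(np.sum(flat[a_idx] - flat[b_idx]))

# Hypothetical flat gray cover image; a natural image would give a statistic near,
# but not exactly, zero before marking.
cover = np.full((32, 32), 128, dtype=np.uint8)
marked = patchwork_embed(cover, n_pairs=100, key=7, delta=2)
print(patchwork_statistic(cover, 100, key=7), patchwork_statistic(marked, 100, key=7))
```

Without clipping, marking shifts the statistic by 2 x delta x n_pairs, so a holder of the key can test for the mark while anyone without the key does not know which pairs to examine.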

Least Significant Bit (LSB) Insertion Steganography

Usually 24-bit or 8-bit files are used to store digital images. The former provide more space for information hiding; however, they can be quite large. The color of each pixel is derived from three primary colors: red, green and blue. 24-bit images use 3 bytes for each pixel, where each primary color is represented by 1 byte. Using 24-bit images, each pixel can represent 16,777,216 color values. If the lower two bits of each of these color channels are used to hide data, the maximum color change in a pixel spans 64 color values, but this causes so little change that it is undetectable by the human visual system. This simple method is known as Least Significant Bit insertion. Using this method it is possible to embed a significant amount of information with no visible degradation of the cover image. Figure 2.3 shows the process [25], [26].

Figure 2.3: Least significant bit Steganography insertion

Several versions of LSB insertion exist. It is possible to use a random number generator initialized with a Stego key, whose output is combined with the input data before the result is embedded in a cover image. For example, in the presence of an active warden it is not enough to embed a message in a known place (or in a known sequence of bits), because the warden is able to modify these bits even if she cannot decide whether there is a secret message or not, or cannot read it because it is

encrypted. The use of a Stego key is important, because the security of a protection system should not be based on the secrecy of the algorithm itself, but on the choice of a secret key.

Figure 2.4: Steganography process to embed secret message in an image [27]

LSB insertion usually operates on bitmap images. Steganos for Windows and Wbstego are LSB insertion software products which are able to embed data (in clear or encrypted format) in a bitmap image. The embedded data cannot be considered a watermark, because even a small change to the picture (cropping, lossy compression, color degradation) destroys the embedded information, although the change introduced by the embedding process itself is invisible [28]. The original bitmap picture used during the test was a standard test picture in image processing, with 16M colors. We made a test using bitmap images [27]. There are various methods to implement LSB Steganography itself; the following are some commonly used methods:

LSB in BMP

LSB Steganography in a bitmap image can take different forms. The most common forms are:

i. Stego one bit: The message can be stored in the LSB of one color of the RGB value or in the parity of the entire RGB value. Changing the LSB changes the integer value of the byte by only one. This will not noticeably alter the visual appearance of a color, and hence of the image itself. Changing a more significant bit would cause a proportionately greater change in the visual appearance of a color. The main objective of Steganography is to pass a message to a receiver without an intruder even knowing that a message is being passed, which means that there should be no discernible change to the carrier; changing only the LSB should have very little effect on the appearance of the image. This process will most likely result in the formation of new colors for the palette. Therefore the image used must have a palette size of 128 colors or less. This allows for a doubling of the colors in the palette (the creation of a new color for every existing color in the palette), which is the maximum number of colors that could be produced by this method. If the palette is ordered by luminance, it may be found that there are pairs of very similar colors. How noticeable that is depends on the color profile of the original image. Practical methods should allow for the use of the full image size, so that the amount of data that can be hidden is proportional to the number of pixels in the image rather than to the number of colors in the palette. The only restriction is then the size of the image. Using the image data for embedding is less restrictive on capacity than storing the data in the palette itself. Using a 128-color palette image should not result in too much distortion of the original image [29].

ii. Stego two bits: In this method the two LSBs of one of the colors in the RGB value of the pixels are used to store message bits in the image. This requires an image whose palette has a maximum of 64 colors, allowing for the production of a possible 192 new colors, i.e., three new colors for each existing color. Fewer colors are available to represent the starting image, and hence it will be more degraded than the image used in the Stego one bit method. The advantage of this method is that twice as much information can be stored as in the previous method. This method could instead have used the LSB of

two colors in the RGB value, which would have resulted in the same amount of storage space. The starting image would still have to have a palette containing 64 colors [30].

iii. Stego three bits: In this method the three LSBs of one of the colors in the RGB value of the pixels are used to store message bits. This requires an image whose palette has a maximum of only 32 colors, allowing for the production of a possible 224 new colors, i.e., seven new colors for every existing color in the image. The data hiding capacity is three times that of Stego one bit, but the image will be even more distorted than if a 128-color palette were used [31].

iv. Stego four bits: In this method the four LSBs of one of the colors in the RGB value of the pixels are used to store message bits. This requires an image whose palette has a maximum of only 16 colors, allowing for the production of a possible 240 new colors. The colors are now very restricted, but an area of one particular color in the image may have 16 variations distributed through it, which could produce a certain amount of texture that mitigates the effects of such a restricted palette [31].

v. Stego color cycle: To make detection of the hidden data more difficult, this method cycles through the color values of each pixel when storing the data, so that the same color is not constantly being changed. For example, the first data bit could be stored in the LSB of the blue value of the pixel, the second data bit in the red value and the third data bit in the green value; the alpha value is skipped and the next color used is blue again. The alpha value is skipped because changing it, when it is generally 255, would look too suspicious unless the image contained different transparency levels [31].

vi. Stego1bitprng: A pseudorandom number generator (PRNG) can be used to choose random pixels in which to embed the message. This makes the message bits more difficult to find and hopefully reduces the existence of patterns in the image. Most importantly, it means that if an attacker removed the LSBs from one of the colors and tried to read them, it would

make no sense, as they would not be in order. A pseudo random number generator (PRNG) is created and used to select the pixels in which to hide the data. Data is then hidden in the LSB of the blue value. If the message is much smaller than the capacity of the image, a problem may occur whereby the information is packed into one part of the image, for example the top half. This is solved by using a PRNG, which spreads the message all over the image. Hide and Seek arranges it so that the message bits are not beside one another but are instead randomly dispersed throughout the image. Hence the noise is also randomly distributed. A user-chosen key can be fed into a pseudo random number generator to determine a sequence of random numbers. These numbers indicate the pixels in the image whose least significant bit is to be changed. This makes the system more secure, because the reader of the message must know the key in order to determine in which bytes the message bits are hidden. The key must remain unknown to the attacker. If the cover image were known to the attacker, embedding the message in a random way would improve its security [30].

LSB in PNG Image

When images are used as the carrier in Steganography, they are generally manipulated by changing one or more bits of the byte or bytes that make up the pixels of an image. The message can be stored in the LSB of one color of the RGB value or in the parity bit of the entire RGB value. A PNG is capable of hiding quite a large message. LSB in PNG is most suitable for applications where the focus is on the amount of information to be transmitted and not on the secrecy of that information. If more bits are altered, there is a greater possibility that the altered bits can be seen with the human eye. With the LSB, however, the main objective of Steganography, which is to pass a message to a receiver without an intruder even knowing that a message is being passed, is achieved [31].
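The key-driven variant described under Stego1bitprng above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any named tool: the cover is an 8-bit RGB array, one bit is stored in the LSB of the blue value of each selected pixel, the pixel order comes from a permutation seeded with an integer key, and the receiver is assumed to know the message length. The function names are hypothetical.

```python
import numpy as np

def _pixel_order(n_pixels, n_bits, key):
    """Key-seeded pseudo-random choice of distinct pixel positions (assumed scheme)."""
    return np.random.default_rng(key).permutation(n_pixels)[:n_bits]

def embed_lsb(cover, message, key):
    """Hide message bytes in the blue-channel LSBs of key-selected pixels."""
    bits = np.unpackbits(np.frombuffer(message, dtype=np.uint8))
    height, width, _ = cover.shape
    if bits.size > height * width:
        raise ValueError("message too large for this cover image")
    stego = cover.copy()                     # C-contiguous copy, so reshape is a view
    pixels = stego.reshape(-1, 3)
    positions = _pixel_order(height * width, bits.size, key)
    pixels[positions, 2] = (pixels[positions, 2] & 0xFE) | bits
    return stego

def extract_lsb(stego, n_bytes, key):
    """Read back n_bytes from the blue-channel LSBs, using the same key."""
    height, width, _ = stego.shape
    positions = _pixel_order(height * width, n_bytes * 8, key)
    bits = stego.reshape(-1, 3)[positions, 2] & 1
    return np.packbits(bits).tobytes()

# Hypothetical flat-colored cover image and a two-byte message.
cover = np.full((16, 16, 3), 200, dtype=np.uint8)
stego = embed_lsb(cover, b"hi", key=42)
print(extract_lsb(stego, 2, key=42))
```

Reading the same positions with a different key returns unrelated bits, which is the property the text relies on: the bits are present in the image but their order is known only to holders of the key.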

Steganography in GIF images

Since GIF images only have a bit depth of 8, the amount of information that can be hidden is less than with BMP. Embedding information in GIF images using LSB gives almost the same results as using LSB with BMP. LSB in GIF is a very efficient algorithm to use when embedding a reasonable amount of data in a grayscale image. GIF images are indexed images in which the colors used in the image are stored in a palette, sometimes referred to as a color lookup table. Each pixel is represented as a single byte, and the pixel data is an index into the color palette. The colors of the palette are typically ordered from the most used color to the least used color to reduce lookup time. Some extra care must be taken if GIF images are to be used for Steganography, because of a problem with the palette approach: if the LSB of a pixel in a GIF image is changed, the index into the color palette changes, which may result in a completely different color. The change in the resulting image is noticeable if the adjacent palette entries are not similar, but it is not noticeable if they are similar. Most applications that use LSB methods on GIF images have low security, because it is possible to detect even a moderate change in the image. Possible solutions to these problems are to sort the palette so that the color difference between consecutive colors is minimized, to add new colors which are visually similar to the existing colors in the palette, or to use gray scale images. In an 8-bit gray scale GIF image there are 256 shades of gray, so the colors change gradually and modifications are hard to detect [31].

2.4 Steganalysis

In addition to embedding and extraction, Steganography has another topic, "Steganalysis". Steganalysis is the art and science of detecting messages hidden using Steganography; this is comparable to cryptanalysis applied to cryptography. Steganalysis analyzes multimedia files (e.g., image/sound files) to determine whether they are Stego files (files embedded with some secret data) [32].

The objective of steganalysis is to identify suspected packages, determine whether or not they have a payload encoded into them, and, if possible, recover that payload. In simple terms, the aim is to detect malicious Internet files (which could carry terrorists' communication messages) and to prevent serious crimes from happening. Steganalysis sessions at recent international conferences on Information Technology are quite popular. Some software companies have already released steganalysis software on a commercial basis [33]. It is practically impossible to develop Steganography detection software whose result guarantees that there is a hidden message in the tested Stego image. The program only outputs a message indicating that the image file may possibly be embedded with some secret data, and the program developer does not guarantee the detection result. The detection accuracy the user demands may change from case to case; there is no definite accuracy standard for all purposes. From the practical point of view, we believe that the detection accuracy needs to be very high (e.g., more than 99%). However, we would say that there is no steganalysis software on the market which is worth buying at the moment [32].

Steganalysis is complicated primarily by four things:

i. The suspect files may or may not have any data encoded into them in the first place.
ii. The payloads, if any, may have been encrypted before being encoded into the carriers.
iii. Some of the suspect files may have had noise or irrelevant data encoded into them (which reduces stealth but can make analysis very time-consuming).
iv. Unless you can completely recover, decrypt, and inspect the payload, you often cannot be sure whether you really have a file used for transport or not; all you have is a probability [35].

Unlike cryptanalysis, where it is obvious that intercepted data contains a message (though that message is encrypted), steganalysis generally starts with a pile of suspect data files, but little information about which of the files, if any, contain a

payload. The steganalyst is usually something of a forensic statistician, and must start by reducing this set of data files (which is often quite large; in many cases, it may be the entire set of files on a computer) to the subset most likely to have been altered [35]. One case where detection of suspect files is straightforward is when the original, unmodified carrier is available for comparison. Comparing the package against the original file will yield the differences caused by encoding the payload, and thus the payload can be extracted. However, this is only part of the problem, as the payload has often been encrypted first. Encrypting the payload is not always done solely to make recovery of the payload more difficult. Many encryption techniques have the desirable property of making the payload appear much more like well-distributed noise, which can make detection efforts more difficult and save the Steganographic encoding technique the trouble of having to distribute the signal energy evenly [35].

Steganalysis Methods

There are two main types of steganalysis: visual analysis and statistical (algorithmic) analysis. Visual analysis tries to reveal the presence of hidden information through inspection with the naked eye or with the assistance of a computer, which can separate the image into bit planes for further analysis. Statistical analysis is more powerful and successful, because it reveals the smallest alterations in an image's statistical behavior. Another type is called histogram analysis [36]. One additional type which has to be mentioned here is structural detection, which examines file properties and contents such as size difference, date/time difference, content modifications and checksum [37]. There are several statistical tests which can be run on an image: average bytes, variations of the bytes, skew, kurtosis, average deviation and differential

values [36]. There are various methods of analysis depending on what information is available:

i. Stego-only attack: Only the Stego-object is available for analysis.
ii. Known cover attack: The Stego-object as well as the original medium is available. The Stego-object is compared with the original cover object to detect any hidden information.
iii. Known message attack: The hidden message and the corresponding Stego-image are known. The analysis of patterns that correspond to the hidden information could help decipher such messages in future.
iv. Known Stego attack: The Steganography algorithm is known and both the original and Stego-object are available.
v. Chosen Stego attack: The Steganography algorithm and Stego-object are known.
vi. Chosen message attack: The steganalyst generates a Stego-object from a chosen message using some Steganography tool or algorithm. The goal in this attack is to determine patterns in the Stego-object that may point to the use of specific Steganography tools or algorithms [39].

Steganalysis against LSB

Many attacks are performed against LSB Steganography; the most common are Pairs Analysis and RS Steganalysis. In LSB embedding, after converting the hidden message into a stream of bits, one simply goes through the image replacing the least significant bits of pixel values with the bits of the hidden message. When the hidden message contains fewer bits than the cover image has pixels, it is best to spread the modifications randomly around the cover image, either by scanning through the image and leaving random gaps, or (better) by generating a random permutation of the image and using the permutation to decide the order of pixels to modify. In either case a key for generating the random gaps or permutation is presumed to be shared with the intended recipient of the Stego image. The methods of

Pairs Analysis and RS Analysis, both due to Fridrich, are the two which we examine here. They have some features in common: in both cases there is a function of images which can be shown to be quadratic in the amount of embedded data when LSB replacement is used, and by making one assumption it is possible to obtain sufficient information to solve for that parameter.

Pairs Analysis first splits an image into color cuts, scanning through and selecting only pixels which fall into each pair of values (0,1), (2,3), and so on. Concatenating the color cuts into a single stream, one measures the homogeneity of the LSBs. Repeating with the alternative pairs of values (255,0), (1,2), (3,4), etc., one can show that the function defined by the difference between the two homogeneity measures is quadratic in the amount of embedded data. Under the assumption that natural images have no difference in homogeneity, one can obtain enough information to deduce the amount of embedded data in an image, and this estimate forms the statistic used to distinguish the cases of hidden data present and absent. However, the method is not reliable for images for which the assumption of equal homogeneity does not hold [40]. Pairs Analysis was designed with palette images in mind, but there is no theoretical reason why it should not work for grayscale images, and it has been shown that it can be made to work well in this case also [41].

In RS Analysis the image is partitioned into groups of a fixed shape. Each group is classified as regular or singular depending on whether the pixel noise within the group (as measured by the mean absolute value of the differences between adjacent pixels) is increased or decreased after flipping the LSBs of a fixed set of pixels within each group (the pattern of pixels to flip is called the mask). The classification is repeated for a dual type of flipping.
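The group classification step of RS Analysis described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the image is treated as a flat list of grayscale values cut into consecutive groups, the function names are hypothetical, and the label "unusable" for groups whose noise does not change is taken from the RS literature rather than from this text. The sketch only counts groups; it does not perform the quadratic estimation of the message length.

```python
def flip_lsb(x):
    """Standard flipping: swaps 0<->1, 2<->3, ..., 254<->255."""
    return x ^ 1


def flip_dual(x):
    """Dual flipping: swaps -1<->0, 1<->2, ..., 255<->256."""
    return ((x + 1) ^ 1) - 1


def noise(group):
    """Mean absolute difference between adjacent pixels in a group."""
    diffs = [abs(a - b) for a, b in zip(group, group[1:])]
    return sum(diffs) / len(diffs)


def classify(group, mask, flip):
    """Label a group by how flipping the masked pixels changes its noise."""
    flipped = [flip(p) if m else p for p, m in zip(group, mask)]
    before, after = noise(group), noise(flipped)
    if after > before:
        return "regular"
    if after < before:
        return "singular"
    return "unusable"  # assumed label for groups with unchanged noise


def rs_counts(pixels, mask, flip):
    """Count regular/singular groups over consecutive groups of len(mask)."""
    n = len(mask)
    counts = {"regular": 0, "singular": 0, "unusable": 0}
    for start in range(0, len(pixels) - n + 1, n):
        counts[classify(pixels[start:start + n], mask, flip)] += 1
    return counts
```

Calling `rs_counts` once with `flip_lsb` and once with `flip_dual` gives the two sets of proportions that the estimate is built from.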
Some theoretical analysis and some experimentation show that the proportions of regular and singular groups form curves quadratic in the amount of message embedded by the LSB method. Under an assumption similar to the one above, this time about the proportions of regular and singular groups with respect to the standard and dual flipping, sufficient information can be gained to estimate the proportion of an image in which data is hidden. The

estimate can be extremely accurate (often within 1%), but fails when this assumption does not hold [42].

Other methods for the detection of LSB Steganography exist (a notable early method was the Chi-square statistic due to Pfitzmann and Westfeld, which can be shown to be much less reliable than the above), but it is fair to say that RS and Pairs are the leading methods at the present time. There are other methods of Steganographic embedding too, often much more sophisticated than simple LSB replacement.

2.5 Choosing the best location in the cover image to hide information

To achieve a successful Steganography which is robust against the popular methods of steganalysis, many criteria have to be considered in judging the robustness of the Stego method. The embedding of data in the Stego image should be as unnoticeable as possible. The method has to preserve the hidden data so that the extraction process recovers it in the same form without any loss of information. If the cover image uses compression (like JPEG), the Stego image should not be affected and the hidden message should not be harmed by the compression-decompression process after embedding. In addition, the security of the information is a high priority for Steganography.

There are many approaches to choosing the best location in the image to hide the information. The first example is the following approach. In the conflict between the steganographer and the steganalyst, the advantage of the steganographer is that he or she is informed of the cover signal statistics. Thus, he or she can be assured of perfectly secure communication simply by sending a composite signal whose statistics resemble those of the original cover. A natural way to accomplish this is to spend a part of the allocated distortion budget to restore the statistics.
In the statistical restoration framework, the host symbols are divided into two streams: an embedding stream, and a compensation stream. The goal is to match the continuous probability density function (pdf) of the cover signal.

Quantization Index Modulation (QIM) is used with dithering to embed the data into host symbols in the embedding stream, thus making sure that no gaps are left in the Stego pdf. Next, the host symbols in the compensation stream are modified to match the original, while incurring minimum mean-squared error. This design ensures that the robustness properties of the employed embedding algorithm remain intact. In real-world systems, the steganalyst does not have perfect knowledge of the cover signals (i.e., the continuous pdfs). Moreover, only a finite number of host samples are available for analysis. From the available host samples, the steganalyst must calculate a histogram approximation of the cover distribution, using a bin size w. The data hiding is secure if it is possible to match the Stego histogram to the cover histogram with the bin size w [43].

The next approach is YASS (Yet Another Steganographic Scheme), a scheme that resists blind steganalysis [44]. In order to enable secure communication in the presence of blind steganalysis, the steganographer must embed information into host signals in such a way that no image features are significantly perturbed during the embedding process. However, it must not be forgotten that the steganalyst must depend on the Stego image to derive the approximate cover image statistics via some sort of self-calibration process. The steganographer can, instead of (or along with) trying to preserve the feature vectors, embed data in such a way that it distorts the steganalyst's estimate of the cover image statistics. This can practically be achieved using the following approaches:
i. Hiding with high embedding strength: By embedding data with high strength, the cover image is distorted so much that the cover image statistics can no longer be derived reliably from the available Stego image. This is indeed found to be true and reported in recent works.
ii.
Randomized hiding: By randomizing the embedding approach, the algorithm to estimate the cover statistics can be effectively disabled. Things that can be randomized include the spatial location of hiding, the transform coefficients used for hiding, the choice of transform domain, or even the embedding method. In this manner, the steganalyst cannot make any consistent assumptions about the hiding process even if the embedding algorithm is known to everyone, as per Kerckhoffs's principle.

There are some obvious disadvantages of using the first approach of hiding with high strength. First, the likelihood of perceptual distortion is high. Second, the data can possibly be detected by a steganalyst evaluating the Stego image against a universal image model, even if that model is not very precise. The second approach of hiding in a randomized manner is quite appealing: embedding data in randomized locations within an image. One issue with hiding data in random locations is the possibility of encountering errors in the hidden bits due to the fact that the Stego image must be shipped or advertised in a standard format such as JPEG. This is dealt with by the use of an erasures and error correction coding framework.

YASS is a JPEG Steganography scheme that embeds data in 8×8 blocks whose locations are chosen randomly so that they do not coincide with the 8×8 grid used during JPEG compression. Let the host image be denoted by an M×N matrix of pixel values. For simplicity, assume that the image is grayscale (single channel); if it is not, we extract its luminance. The main steps involved in this randomized block hiding method are described below.

Divide the image into blocks of size B×B, where B, which we call the big block size, is always greater than 8, the size of a JPEG block. Thus we have $M_B \times N_B$ big blocks in the image, where $M_B = \lfloor M/B \rfloor$ and $N_B = \lfloor N/B \rfloor$. For each block $(i, j)$ with $0 \le i < M_B$ and $0 \le j < N_B$, we pseudorandomly select an 8×8 sub-block in which to hide data. The key for the random number generator is shared between the encoder and the decoder. The pseudorandom number generator determines the location of the smaller 8×8 block within the big block. This process is illustrated in Figure 2.5(a), where four example blocks are shown, whose top-left corner (sx, sy) is randomly chosen from the set $\{0, 1, \ldots, B - 8\}$.
Figure 2.5(b) shows the blocks as seen by the steganalyst, who falls out of sync with the embedding blocks and cannot resynchronize even if the embedding mechanism is known.
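The keyed block selection described above can be sketched as follows. This is a minimal illustration, not the scheme's reference implementation: the function name and arguments are hypothetical, and Python's `random.Random` stands in for whatever keyed pseudorandom generator the encoder and decoder actually share (it is not cryptographically secure).

```python
import random


def select_blocks(height, width, big, key):
    """Return the top-left corner of one 8x8 sub-block per BxB big block.

    `big` is the big block size B; `key` seeds the shared generator.
    """
    if big <= 8:
        raise ValueError("big block size B must be greater than 8")
    rng = random.Random(key)  # same key -> same sequence on both sides
    rows, cols = height // big, width // big  # number of big blocks per axis
    corners = []
    for i in range(rows):
        for j in range(cols):
            sx = rng.randint(0, big - 8)  # offset inside the big block
            sy = rng.randint(0, big - 8)
            corners.append((i * big + sx, j * big + sy))
    return corners
```

Because the decoder seeds the generator with the same key, it regenerates the same corners, while an observer without the key only sees blocks that do not line up with the JPEG grid.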


Figure 2.5: The embedding method and its detection [44]

For every 8×8 block thus chosen, we compute its 2D DCT and divide it by a JPEG quantization matrix at a design quality factor QFh. Data is hidden in a predetermined band of low frequency AC coefficients using quantization index modulation. To maintain perceptual transparency, there is no hiding in coefficients that quantize to zero under the JPEG quantizer. Note that this approach effectively desynchronizes the steganalyst, so that the features he or she computes do not directly capture the modifications made to the image for data hiding (see Figure 2.5).

It should be noted that with this embedding procedure, the embedding rate is reduced in two ways. First, some real estate of the image is wasted by choosing bigger blocks from which an 8×8 block is chosen to hide data. The framework can be generalized to reduce this wastage by using larger big blocks and putting more 8×8 blocks into them; for example, sixteen 8×8 blocks can be embedded within a single larger big block. The second cause of the decrease in rate is that, since the embedding grid does not coincide with the JPEG grid, there are errors in the received data which must be corrected by adding redundancy.
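The quantization index modulation step mentioned above can be illustrated with a simple scalar, parity-based variant. This is a sketch under assumptions: the coefficient is taken to be already divided by its quantization step, the function names are hypothetical, and the exact quantizer used in [44] may differ. The sketch also does not handle a coefficient that is moved to zero by embedding; such cases are among the errors the coding framework has to absorb.

```python
def qim_embed(coeff, bit):
    """Round a scaled coefficient to the nearest integer whose parity is `bit`."""
    return 2 * round((coeff - bit) / 2) + bit


def qim_extract(coeff):
    """Read the hidden bit back as the parity of the rounded coefficient."""
    return round(coeff) % 2


def embed_band(coeffs, bits):
    """Embed bits into a band, skipping coefficients that quantize to zero."""
    out, remaining = [], iter(bits)
    for c in coeffs:
        if round(c) == 0:
            out.append(c)  # left untouched to preserve transparency
            continue
        bit = next(remaining, None)
        out.append(c if bit is None else qim_embed(c, bit))
    return out
```

For instance, a scaled coefficient of 3.4 becomes 4 when carrying a 0 and 3 when carrying a 1, so the change never exceeds one quantization step.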

Coding Framework

In order to deal with the errors caused in the image by JPEG compression, a coding framework using repeat-accumulate (RA) codes is used. This framework also allows hiding in an adaptive fashion, avoiding coefficients that quantize to zero so as to control the perceptual distortion of the image. For every block, the first n low frequency coefficients form the candidate embedding band. A data bit is hidden in a coefficient lying in the band if it does not quantize to zero using the JPEG quantizer at QFh. Before the hiding process, the bit stream to be hidden is coded using a low rate code, assuming that all host coefficients that lie in the candidate embedding band will actually be employed for hiding. A code symbol is erased at the encoder if the local adaptive criterion (not quantizing to zero) is not met for the coefficient. A rate 1/q RA encoder is employed, which involves q-fold repetition, pseudorandom interleaving and accumulation of the resultant bit-stream. Decoding is performed iteratively using the sum-product algorithm. The use of this coding framework for YASS provides the following advantages:
i. Protection against initial JPEG compression: Use of the coding framework provides error-free recovery of the hidden data after the initial JPEG compression so that the image can be advertised in the JPEG format.
ii. Flexibility in choosing hiding locations: The coding framework allows us to dynamically select the embedding locations in order to limit the perceptual distortion caused to the host image during hiding. It is well known that embedding in DCT coefficients that quantize to zero can lead to visible artifacts in the Stego image.
iii. Enabling active Steganography: The use of error correcting codes also provides protection against several distortion constrained attacks that an active warden might perform.
The attacks that can be survived include a second JPEG compression, additive noise, limited amount of filtering,

and so on. This provides a significant advantage over most other Stego methods available in the literature.

2.6 Intelligent Data Embedding Method for LSB Steganography

There is a Steganography algorithm [45] based on a learning system that hides a large amount of information in a color BMP image. The algorithm uses adaptive image filtering and adaptive non-uniform image segmentation with bit replacement on the appropriate pixels. These pixels are selected randomly rather than sequentially, using a new concept defined by main cases (MC) with sub cases (SC) for each byte in a pixel. According to the steps of the design, 16 main cases with their sub cases were derived, covering all aspects of embedding the input information into a color bitmap image. In any RGB bitmap image, each pixel consists of 3 bytes, and each byte represents a color: Red, Green, and Blue respectively. A change of color will not be easily noticed if the changes are made to the least significant 4 bits (nibble), while any change in the most significant nibble will make a huge difference to the color value. The main cases are the possible values of the most significant nibble of the color byte in a pixel, and the index of the MC is determined by a formula over the pixel's color bytes (the last being ByteBlue), which gives the MC index a value from 1 to 16.

Security is strengthened through four layers, to make it difficult to break the encryption of the input information and to confuse steganalysis. A learning system is introduced at the fourth layer of security through a neural network. This layer is used to increase the difficulty of statistical attacks. This algorithm can efficiently embed a large amount of information, reaching 75% of the image size (replace 18 bits for each

pixel as a maximum) with high quality of the output. This approach will be the basis of this project, and the full explanation is given in the following chapters. This approach is used by the present work with new modifications.

Another interesting application of Steganography was developed by Miroslav Dobsicek, where the content is encrypted with one key and can be decrypted with several other keys; the relative entropy between the encryption key and one specific decryption key corresponds to the amount of information. Because of the continual changes at the cutting edge of Steganography and the large amount of information involved, steganalysts have suggested using machine learning techniques to characterize images as suspicious or non-suspicious. Other work used an entropy based technique for detecting the suitable areas in a document image where information can be embedded with minimum distortion. A further approach indirectly hides the secured binary bits along with some selected graphical image bits, based on a neural network algorithm, to get cipher bits. This approach is used by the present work.

When using a 24 bit color image, one bit of each of the red, green and blue color components can be used, so a total of 3 bits can be stored in each pixel. Thus, an image can hold 3 bits of secret information for each of its pixels. But using just 3 bits per pixel out of such a large image wastes capacity. So the main objective of the present work is to insert more than one bit in each byte of a pixel of the cover image while giving results like LSB (the message remains imperceptible). This objective is met by building a new Steganography algorithm based on an intelligent system that hides a large amount of any type of information in a bitmap image by using the maximum number of bits per byte at each pixel. The embedding process works well against two types of attacks.
The first is visual attacks, where the aim is to make it hard for a human observer to discern between noise and visual patterns, and the second is statistical attacks, where the aim is to make detection much more difficult to automate.
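The capacity argument above can be made concrete with a short sketch. It compares classic LSB embedding (3 bits per pixel) with the 18 bits per pixel maximum quoted for [45], and shows how more than one low-order bit of a byte can be replaced. The function names are hypothetical and the 100×100 image size in the usage note is an assumed example, not a figure from the text.

```python
def embed_bits(byte, payload, k):
    """Replace the k least significant bits of `byte` with `payload`."""
    if not 0 <= k <= 8:
        raise ValueError("k must be between 0 and 8")
    if not 0 <= payload < (1 << k):
        raise ValueError("payload does not fit in k bits")
    mask = (1 << k) - 1
    return (byte & ~mask & 0xFF) | payload


def extract_bits(byte, k):
    """Read back the k least significant bits of `byte`."""
    return byte & ((1 << k) - 1)


def capacity_bytes(width, height, bits_per_pixel):
    """Whole bytes of payload an image can hold at a given embedding rate."""
    return width * height * bits_per_pixel // 8
```

Under these assumptions, a hypothetical 100×100 image holds `capacity_bytes(100, 100, 3)` = 3750 bytes with one bit per color byte, against `capacity_bytes(100, 100, 18)` = 22500 bytes at 18 bits per pixel, which is 75% of the 30000 bytes of pixel data.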

Neural Networks learning system

This algorithm uses a learning system through a neural network with an (n-p-n) Perceptron layer architecture. That is, it has n neurons in the first (input) layer, p neurons in the second (hidden) layer, and n neurons in the third (output) layer, with full connection (Figure 2.6).

Figure 2.6: A Multi-Layered Perceptron (n-p-n) Neural Network [45]

In Figure 2.6, a solid arrow means a many to one or one to many transition, a dotted arrow refers to a one to one transition, and a dashed arrow shows the send action for the adjustment process. This algorithm uses the back-propagation algorithm with an adaptive neural network to apply training through three stages: the feedforward of the input training pattern, the back-propagation of the associated error, and the adjustment of the weights. In addition, adaptive smoothing error (ASE) is added to speed up the training process. The main objective of the learning system is to add complexity against statistical and visual attacks, as in Figure 2.7.
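The three training stages named above (feedforward, back-propagation of the error, weight adjustment) can be sketched for an n-p-n network in plain Python. This is a generic textbook back-propagation step under assumptions: sigmoid activations, no bias terms, a fixed learning rate, and a hypothetical class name. The adaptive smoothing error (ASE) refinement is not defined in this chapter and is not implemented here.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class MLP:
    """Fully connected n-p-n perceptron with one hidden layer."""

    def __init__(self, n, p, seed=0):
        rng = random.Random(seed)
        # v[i][j]: input i -> hidden j; w[j][k]: hidden j -> output k
        self.v = [[rng.uniform(-0.5, 0.5) for _ in range(p)] for _ in range(n)]
        self.w = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(p)]

    def forward(self, x):
        """Stage 1: feed the input pattern forward through both layers."""
        p, n = len(self.w), len(self.w[0])
        hidden = [sigmoid(sum(x[i] * self.v[i][j] for i in range(len(x))))
                  for j in range(p)]
        output = [sigmoid(sum(hidden[j] * self.w[j][k] for j in range(p)))
                  for k in range(n)]
        return hidden, output

    def train_step(self, x, target, rate=0.5):
        """Stages 2 and 3: back-propagate the error, then adjust weights."""
        hidden, output = self.forward(x)
        d_out = [(t - o) * o * (1 - o) for t, o in zip(target, output)]
        d_hid = [h * (1 - h) * sum(d * self.w[j][k] for k, d in enumerate(d_out))
                 for j, h in enumerate(hidden)]
        for j, h in enumerate(hidden):
            for k, d in enumerate(d_out):
                self.w[j][k] += rate * d * h
        for i, xi in enumerate(x):
            for j, d in enumerate(d_hid):
                self.v[i][j] += rate * d * xi
        # squared error measured before this update
        return sum((t - o) ** 2 for t, o in zip(target, output)) / 2
```

Repeated calls to `train_step` on the same pattern drive the returned error down, which is the behaviour the adjustment loop relies on.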

Figure 2.7: Neural based Steganography training system architecture [45]

2.7 Summary

In this chapter many articles and papers were discussed and summarized to support the literature considered in this project, to strengthen the idea of this project, and to gain more understanding of Steganography and the problem itself. Image types and formats were explained in more detail. The different methods of Steganography for embedding information in the various image formats were discussed, as were steganalysis and its methods and techniques. The literature reviewed on this project's concepts was sufficient to proceed to the next chapters.

CHAPTER 3

RESEARCH METHODOLOGY

3.1 Introduction

In this chapter, the research methodology is discussed. The research phases and the steps of the research are explained, along with the methods used to develop and implement this prototype. This project will be implemented according to the following workflow sequence (Figure 3.1):

Requirement specification
Prototype tool architectural design
Prototype tool development
Testing the results

Figure 3.1: Workflow sequence chart

Requirements Specifications

Defining requirements to establish specifications is an essential step in the project life cycle. The difficulty of establishing good requirements often makes it more of an art than a science. The difficulty arises from the fact that establishing requirements is a tough abstraction problem, and often the implementation gets mixed with the requirements. Requirements analysis is the first step in the system design process, where system requirements should be clarified and documented to generate the corresponding specifications. The requirements should be clearly defined for the intelligent data embedding using LSB Steganography project. Our goal is to provide a Steganography method that uses neural networks in the embedding process to choose the best location in the image to hide the secret information. To get this project done, the following basic requirements are needed:

Steganography embedding software.
Personal computer with a Pentium processor or higher and Microsoft Windows operating system (2000 or higher).
Visual and statistical Steganography analysis tool.
Microsoft Visual Studio (Visual Basic).
AES (Advanced Encryption Standard) tool.

3.3 Prototype Architectural Design

At this step of the project, all necessary and useful information has been collected in the previous steps of the process; this information will be used to design the architecture of the intelligent embedding tool. In the design phase, the requirements are transformed into definitions of components to establish the framework. In this chapter the requirements of an ideal architectural design for the prototype tool are considered. After that the prototype tool will be designed with compliance with the

specified requirements as much as possible. Moreover, whenever a part is chosen to be added to the architecture, adequate reasoning will be provided. In this phase, we survey most of the embedding methods for choosing the best location to hide the data in the cover image and compare them to find the best design. The neural based Steganography training algorithm, which is the design for the prototype tool, is shown in the following steps:

Step 1: Input: cover image and secret message.
Step 2: Apply the present Steganography algorithm to hide the secret message and produce a Stego-image.
Step 3: Compute statistical and visual measures for both the Stego and cover images (Euclidean norm, brightness difference, and difference between neighboring pixels).
Step 4: Extract all bits, with their locations, which are not used by the present Steganography algorithm and save them into a temporary buffer called the free bits buffer.
Step 5: Use a neural network with the back propagation algorithm and adaptive smoothing error (BPASE). The input layer contains the statistical and visual measures for the Stego image and the free bits buffer, while the output layer produces a new free bits buffer and new statistical and visual measures.
Step 6: Check whether the statistical-visual measures of the Stego image match those of the cover image. If they match, build a new Stego image by adding the new free bits buffer to the Stego bits buffer; otherwise adjust (ADJ) the weight values Vij and Wjk (Figure 2.6), then go to Step 5 (back propagation, BP).
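The measures named in Step 3 and the matching check in Step 6 could be computed along the following lines. The text does not define these measures precisely, so the definitions here are assumptions: the Euclidean norm is taken over the pixel-wise difference between cover and Stego image, brightness is the mean pixel value, and the neighbor difference is the mean absolute difference between horizontally adjacent pixels. Images are plain lists of rows of grayscale values, and the single shared tolerance is likewise a hypothetical simplification.

```python
import math


def euclidean_norm(cover, stego):
    """Euclidean norm of the pixel-wise difference between two images."""
    return math.sqrt(sum((c - s) ** 2
                         for crow, srow in zip(cover, stego)
                         for c, s in zip(crow, srow)))


def mean_brightness(image):
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)


def brightness_difference(cover, stego):
    return abs(mean_brightness(cover) - mean_brightness(stego))


def neighbor_difference(image):
    """Mean absolute difference between horizontally adjacent pixels."""
    diffs = [abs(a - b) for row in image for a, b in zip(row, row[1:])]
    return sum(diffs) / len(diffs)


def measures_match(cover, stego, tolerance):
    """Step 6 check: all three measures stay within an assumed tolerance."""
    return (euclidean_norm(cover, stego) <= tolerance
            and brightness_difference(cover, stego) <= tolerance
            and abs(neighbor_difference(cover)
                    - neighbor_difference(stego)) <= tolerance)
```

In the loop of Steps 5 and 6, a failed `measures_match` would trigger another weight adjustment, while a successful one ends the training for that image.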

Prototype Development

A prototype is an original model on which something is patterned. A prototype can range from a crude mock-up developed by the inventor to professionally designed virtual prototypes and fully functioning working programs. The process of taking an idea and turning it into a tangible product is called reducing the invention to practice, and the first step in this process is the development of a prototype. Due to the limited time available to accomplish the project, it is not possible to implement the design in full. Therefore, the prototype will be developed for implementation in the next project phase. However, the prototype will include the most important components of the design.

3.5 Testing the Results

The prototype tool's output will be tested against the most commonly used attacks on LSB Steganography. For that, a well known, trusted steganalysis tool will be used to demonstrate the robustness of the method used in this prototype. Other standard measuring tools and methods will be used to test the quality of the Stego image and to test whether the amount of extractable information is within the acceptable range, in order to demonstrate the efficiency of the prototype tool and that the information is hidden in the best location in the cover image.


CHAPTER 4

PROTOTYPE DESIGN

4.1 Introduction

In the proposed prototype, information is hidden in the cover image using four security layers (Figure 4.1), and the Stego image is sent via an insecure channel to the receiver, who retrieves the hidden information using inverse Steganography.

Figure 4.1: The overall flowchart of how the proposed prototype works

In this proposed prototype design, the maximum effort was made to reach the goal of an effective way to obscure and hide information. The prototype is introduced to do that and to hide a large amount of information in a carrier bitmap image. The four layers of security used in this prototype are a very effective way to add complexity for steganalysis work, and the neural based Steganography using a neural network with back propagation and adaptive smoothing error correction

(ASE) can be taken as a very serious way to hide information compared with other familiar algorithms. Working against statistical and visual attacks is a big challenge for any Steganography algorithm; it needs adaptive algorithms in each step of the embedding process. With the adaptive algorithm of this proposed prototype, the results reached are very efficient. Further complexity is added to this prototype by using two different Steganography algorithms to embed information. These algorithms are used together, which means that the embedding process is done in two phases within the same session.

One of the main issues taken into consideration for any new design or modification of Steganography algorithms is the amount of information to be hidden versus the size of the cover image. Embedding a large amount of information in a relatively small cover image will usually be inefficient because of the unacceptable noise and distortion found in the Stego image, and the more information is hidden, the more likely attacks are to succeed in discovering the existence of the hidden information, which by the very concept of Steganography is a failure. In this prototype, using the neural network was the key to achieving the goal of embedding a large amount of information without fear of steganalysis attacks.

4.2 Design Challenges

Section 4.1 shows how useful and efficient the proposed prototype is intended to be, but it did not show the challenges and issues that appeared in the design phase. Many issues were obstacles to achieving the complete design of the prototype, and the main issues were the following:

In Steganography, the discovery of the existence of hidden information is a failure, but extraction of the hidden information and disclosure of its contents is a disaster. A secure design that integrates Steganography and cryptography is therefore a great advantage. This was the motivation for adding the first layer of security to this prototype's design, which encrypts the data to be hidden with the AES algorithm before the embedding phases start.

The complexity of the design is a main goal, as mentioned in section 4.1, to make it difficult for steganalysts to attack the prototype. In the second security layer, an adaptive segmentation method is used; this method divides the cover image into unequally sized blocks in which to hide the information. Although the unequal segment sizes are already a good way to confuse steganalysis attacks, further complexity is introduced by making the sizes of the segments dynamic. The sizes are determined by the key of the AES encryption used in the first security layer of the prototype; i.e., whenever the key is changed, the sizes of the segments change, and thus the mapping and the sequence in which pixels are chosen for hiding information change as well.

The second security layer (adaptive segmentation) adds complexity to the parsing algorithm over the blocks and pixels of the cover image, but it does not address the need to map the encrypted data to the best locations and the most suitable pixels. Therefore, a recently presented Steganography method is used in this prototype. This method is called main cases and sub cases. It selects the best locations in which to hide information.
The first Steganography layer (main cases and sub cases) is an efficient method to embed information, but it is capable of embedding only a limited amount of information in the cover image. It parses the whole image and performs the embedding process once, and this is the maximum result it can achieve. This makes it efficient but limited, which is unsuitable when a large amount of data must be embedded. At the point when the main

cases and sub cases method finishes its work without having been able to hide the whole amount of information in the cover image, the neural based Steganography (the second phase of Steganography) takes over and performs the embedding process for the rest of the information. By using the neural network, the problem of larger amounts of data is solved. The neural network uses the free bits that were not used in the first Steganography phase, which makes it difficult to choose the best locations of pixels, and bits inside the pixels, in which to embed the rest of the information. Therefore, the network is trained and its weights adjusted using back propagation with adaptive smoothing error correction (ASE), to keep the resulting Stego image robust against attack in spite of the large amount of hidden information.

Another challenge in the design process of this prototype was how to make it reject files that are unsuitable for embedding in the cover image because of the unreasonable size of the information to be hidden, which raises the question of what amount of hidden information is acceptable. This was determined by trying many samples on this prototype, using standard benchmarks to measure the robustness of the produced Stego image, and comparing the results to those produced by other standard Steganography tools over the same sample cover images, as explained in a later chapter.

The Prototype Architecture

This project employed neural networks in Steganography to embed a large amount of information in true color (RGB) bitmap images. Its integrated components (security aspects, adaptive Steganography algorithms, and cryptography) make its performance notable. The success of this prototype in giving better results was due to a design that uses multiple layers of security and Steganography. Choosing the best locations in which to hide information was achieved, and the aim of this project was satisfied.

The architecture of this prototype is multi-layered. It consists of four security layers, of which the third and fourth are the embedding layers, plus a fifth layer for information extraction and decryption (Figure 4.2).

Figure 4.2: Prototype layers

First Security Layer (AES Encryption)

This layer is responsible for encrypting the information to be hidden in the cover image using the AES cryptography algorithm. An encryption key must be entered, along with the file that is going to be hidden in the cover image. The layer encrypts this file using the entered key. It passes the same key to the second layer and passes the resulting encrypted file to the third layer; this is shown in Figure 4.3. This layer adds security to the hidden data: if an attacker gets access to the system, he might be able to extract the information from the Stego image, but only in its encrypted form, and AES is a very strong algorithm that has not been broken by current cryptanalysis.


Figure 4.3: First security layer (AES encryption) flowchart

This layer works according to the following algorithm:
Step 1: Input the key and the file to be embedded.
Step 2: Use the AES encryption algorithm to encrypt the file.
Step 3: Pass the encryption key to security layer 2 (adaptive segmentation).
Step 4: Pass the encrypted file to security layer 3 (main cases and sub cases, the first embedding layer).

Second Security Layer (Adaptive Segmentation)

This layer is responsible for dividing the image into blocks (segments) whose sizes are not equal to each other. This technique greatly increases the complexity of the Steganography mapping. It makes any estimate of how the image's pixels are parsed very weak and unpredictable, which leads to steganalysis failure. Furthermore, the encryption key from the first layer makes the block sizes dynamic: they change in each session depending on the ASCII values of the key, which is another way to complicate the calculations for steganalysis. Another


complication is added by using the length of the key to determine the number of segments to be generated. The operation of this layer is shown in Figure 4.4.

Figure 4.4: Second security layer (adaptive segmentation)

The following is the algorithm used in this layer:

Step 1: Receive the encryption key used in AES and the cover image in which the information will be hidden.
Step 2: Compute the number of segments that this layer will generate, using the length of the key.
Step 3: Compute the sizes of the vertical and horizontal segments in the cover image.
Step 4: Pass the results of step 1 and step 2 to layer 3.

Third Security Layer (Main Cases And Sub Cases)

This layer is the first embedding layer, in which a new Steganography technique is used to embed the information in the cover image. It is responsible for choosing the best locations to hide information in: the right pixels, and the right bits in those pixels, are used to embed the data in the

cover image using the main cases and sub cases technique. The operation of this layer is shown in Figure 4.5.

The idea behind the main cases and sub cases technique is to choose the best location for embedding: which pixels to hide in, and which bits in those pixels to use for hiding the desired information. The main cases are the 4 most significant bits of each byte in each pixel, so the value of any main case lies between 0000 and 1111, which means that there are 16 possible main cases. The parser goes through the 16 possible main cases, starting from 0000 (main case 1). For each main case, it goes through every segment in the image and then through every pixel in each segment. The index of the MC is determined by the following formula:

The parsing priority inside the segments is horizontal, and vertical for choosing the next segment.

The 4 most significant bits were chosen as main cases because this prototype uses the Least Significant Bit (LSB) embedding algorithm, so the most significant bits are set aside as main cases and are never used for embedding. Embedding in those bits would cause severe distortion in the image and a clear change in color, which would make the Steganography fail. Another reason concerns the later extraction process: if those bits were changed, the parsing during extraction would differ from the parsing during embedding, which would lead to inaccurate results.

According to the steps of the design, we derived 16 main cases with their sub cases that cover all aspects of embedding the input information into a color bitmap image. The process starts with a loop from 1 to 16 over the possible main cases. For each main case, the loop goes through all the segments of the image horizontally, and each segment is scanned column by column, parsing all the pixels in the segment.
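The mapping from a color byte to its main case can be sketched in Python as follows. This is a minimal sketch, assuming that the index is the value of the four most significant bits plus one (so that 0000 is main case 1 and 1111 is main case 16); the function names are hypothetical and are not taken from the prototype's VB code.

```python
def main_case(byte_value):
    """Return the 1-based main case index of one 8-bit color byte.

    Assumption: index = value of the 4 most significant bits + 1,
    so 0000 maps to main case 1 and 1111 maps to main case 16.
    """
    if not 0 <= byte_value <= 255:
        raise ValueError("color byte must be in 0..255")
    return (byte_value >> 4) + 1


def pixel_main_cases(pixel):
    """Main case of each color byte in an (R, G, B) pixel."""
    return tuple(main_case(c) for c in pixel)


def embed_low_bits(byte_value, bits, count):
    """Overwrite the `count` least significant bits of a byte with `bits`."""
    mask = (1 << count) - 1
    return (byte_value & ~mask & 0xFF) | (bits & mask)
```

Because `embed_low_bits` only touches the low bits, `main_case` returns the same value before and after embedding, which is why the extraction pass can repeat the same parsing order.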

For the current pixel, if the current main case in the loop equals the main case of any of the color bytes in this pixel, then the color byte whose main case equals the current main case becomes CurMC, and the color byte whose MC value is the lowest among the rest of the colors in the pixel becomes SelColor. The selection of a suitable sub case for the current main case is then narrowed down to one of 3 groups of sub cases (SCGroup), depending on the main case of SelColor, according to the following lookup table (Table 4.1).

Table 4.1: Narrowing down sub cases selection into 3 groups of sub cases

MC(SelColor)    SCGroup    Sub Cases in SCGroup
1               1          SC 2, SC 3, SC 4
2 to 15         2          SC 1, SC 2, SC 3, SC 4
16              3          SC 1, SC 2

The next step is to choose a sub case (SC) from the selected SCGroup, using another lookup table (Table 4.2).

Table 4.2: Choosing the particular suitable sub case for the current pixel

Sub Case (SC)    Condition
1                CurMC > X,Y
2                CurMC = X,Y
3                (CurMC > X and CurMC = Y) or (CurMC = X and CurMC < Y)
4                CurMC < X,Y

After the suitable sub case is selected, the embedding takes place in the suitable bits of the proper bytes in the pixels. Note that if the current main case of the loop does not match the main case of any color byte in the pixel, the scanning proceeds to the next pixel. After embedding in the current pixel is finished, the next pixel is processed, and so on until all the pixels in the segment are done. The scan then proceeds to the next segment, and the whole process is applied to it, until

the last pixel in the last segment of the image. Once the scanning and embedding of the entire image is finished, the first layer of Steganography is complete. The following pseudo code shows the main cases and sub cases algorithm.

For MC := 1 to 16 do
  For Segment := 1 to KeyLength do
    For PixelCount := 1 to Number of pixels do
      If MC = any ColorMC in CP then
      Begin
        CurMC := ColorMC
        SelColor := argmin(mccolor in CP)
        Case MC(SelColor)
          : 1 then SCGroup := 1
          : 2 to 15 then SCGroup := 2
          : 16 then SCGroup := 3
        End Case
        Case SCGroup
          : (1)
            If CurMC = X,Y then SC := 2
            If (CurMC < X and CurMC = Y) or (CurMC = X and CurMC < Y) then SC := 3
            If (CurMC < X,Y) then SC := 4
          : (2)
            If CurMC > X,Y then SC := 1
            If CurMC = X,Y then SC := 2
            If (CurMC > X and CurMC = Y) or (CurMC = X and CurMC < Y) then SC := 3
            If (CurMC < X,Y) then SC := 4
          : (3)
            If CurMC > X,Y then SC := 1
            If CurMC = X,Y then SC := 2
        End Case
        Case SC
          : (1)
            If CP = NP then
            Begin
              Hide 2 bits in CP.SelColor
              Hide 2 bits in NP.SelColor
            End if
          : (2)
            Hide 1 bit in CP.Red
            Hide 1 bit in CP.Green
            Hide 2 bits in CP.Blue
          : (3)
            Hide 2 bits in CP.SelColor
            If MC(RestColor1) MC(SelColor) then ResCol := RestColor1
            If MC(RestColor2) MC(SelColor) then ResCol := RestColor2
            Hide 2 bits in CP.ResCol
          : (4)
            Hide 2 bits in CP.SelColor
            If MC(RestColor1) MC(SelColor) then ResCol := RestColor1
            If MC(RestColor2) MC(SelColor) then ResCol := RestColor2
            Hide 2 bits in CP.ResCol
        End Case;
    Next PixelCount;
  Next Segment
Next MC;
End.

With this embedding algorithm, the bit selection is random yet efficient. This randomness in embedding, together with the randomness caused by the adaptive segmentation, adds much complexity to steganalysis.
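The two lookups in Table 4.1 and Table 4.2 can be illustrated with a short Python sketch. It rests on assumptions that the text does not state explicitly: X and Y are taken to be the main cases of the other two color bytes in the pixel, SelColor is taken as the minimum main case over all three bytes (as in the pseudo code), and the condition for sub case 3 follows Table 4.2. All function names are hypothetical.

```python
# Table 4.1: sub cases allowed in each sub case group.
SC_GROUPS = {1: (2, 3, 4), 2: (1, 2, 3, 4), 3: (1, 2)}


def sc_group(sel_color_mc):
    """Table 4.1: map the main case of SelColor to a sub case group."""
    if sel_color_mc == 1:
        return 1
    if sel_color_mc == 16:
        return 3
    return 2  # main cases 2 to 15


def sub_case(cur_mc, x, y):
    """Table 4.2: pick a sub case from CurMC and the other two main cases.

    Assumption: x and y are the main cases of the two remaining color bytes.
    Returns None when no condition in the table matches.
    """
    if cur_mc > x and cur_mc > y:
        return 1
    if cur_mc == x and cur_mc == y:
        return 2
    if (cur_mc > x and cur_mc == y) or (cur_mc == x and cur_mc < y):
        return 3
    if cur_mc < x and cur_mc < y:
        return 4
    return None


def select_sub_case(cur_mc, x, y):
    """Combine both tables; None means the pixel is skipped."""
    group = sc_group(min(cur_mc, x, y))
    sc = sub_case(cur_mc, x, y)
    return sc if sc in SC_GROUPS[group] else None
```

For example, `select_sub_case(5, 3, 2)` falls in group 2 and yields sub case 1, while a combination such as `(5, 3, 9)` matches no row of Table 4.2 and is skipped in this sketch.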

Figure 4.5: Third security layer (main cases and sub cases), which is the first Steganography layer

Fourth Security Layer (Neural Network)

This layer is the second Steganography layer in this prototype design. It embeds the remaining information that could not be embedded by the first Steganography layer (the main cases and sub cases layer). It receives the pending Stego image from the previous layer, the locations of the free bits (the bits that the previous layer did not use for embedding), and the remaining information to be hidden. The neural network uses back propagation with adaptive smoothing error correction (ASE). It also receives initial values of probabilities and Euclidean norm measures for the pending Stego image.

The process starts by choosing random bits from the free bits buffer and embedding the data in them. The neural network then calculates the statistical and visual measures for the resulting Stego image and tests the result against a standard benchmark of acceptable tolerance ranges for those measures. If the measures are within the acceptable range, the neural network produces the final Stego image and saves it. Otherwise, it uses back propagation and the ASE to train the network and adjust the weights of the hidden and output layers. After adjusting the weights, it discards the changes made to the free bits in the previous round, chooses a different set of random bits from the free bits buffer, and embeds the information again in the set selected during the current round. The training process and the weight adjustment speed up the embedding operation because they prevent the same free bits from being selected again in different rounds of the neural network's operation.
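The acceptance test applied after each embedding round can be sketched in Python as below. The text does not give the exact definitions of the measures or the tolerance values, so the sketch assumes a Euclidean norm over per-channel differences and a brightness measure equal to the mean channel value; the function names and thresholds are hypothetical, and the weight training itself is not modeled.

```python
import math


def euclidean_norm(cover, stego):
    """Euclidean distance between two equal-length lists of (R, G, B) pixels."""
    if len(cover) != len(stego):
        raise ValueError("images must have the same number of pixels")
    total = 0
    for (r1, g1, b1), (r2, g2, b2) in zip(cover, stego):
        total += (r1 - r2) ** 2 + (g1 - g2) ** 2 + (b1 - b2) ** 2
    return math.sqrt(total)


def mean_brightness(pixels):
    """Assumed brightness measure: mean of all channel values."""
    if not pixels:
        raise ValueError("image has no pixels")
    return sum(r + g + b for r, g, b in pixels) / (3 * len(pixels))


def within_tolerance(cover, stego, max_norm, max_brightness_diff):
    """True when both measures stay inside the caller's tolerance ranges."""
    brightness_diff = abs(mean_brightness(cover) - mean_brightness(stego))
    return (euclidean_norm(cover, stego) <= max_norm
            and brightness_diff <= max_brightness_diff)
```

A round whose Stego image fails `within_tolerance` would be discarded and retried with a different set of free bits, as described above.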
The secret behind the ability to embed a vast amount of information using the neural network is its choice of the best locations to hide the information in. In this layer, because the free bits are chosen randomly, it is possible to embed even in the most significant bits of the pixels without affecting the image. The reason is that the neural network, during its weight adjustment and error correction, chooses some bits that already contain the value of the bit it wants to hide. For example, it would hide the value 0 in a bit location whose content is also 0. This does not affect the image in any way, because no change is made to the contents. The operation of this layer is shown in Figure 4.6.
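The effect of hiding a bit in a location that already holds the same value can be illustrated with a small Python sketch. It is not the neural network itself: it replaces the trained selection with a plain seeded random choice that prefers matching positions, purely to show why such placements cause no distortion. The function name and the seed parameter are hypothetical, and how the extractor later learns the chosen positions is outside this sketch.

```python
import random


def place_bits(free_bits, payload, seed=0):
    """Assign each payload bit to a distinct free-bit position.

    free_bits: current 0/1 values stored at the free positions.
    payload:   0/1 values to hide.
    Positions whose current value already equals the payload bit are
    preferred, so those placements leave the image unchanged.
    Returns (positions, number_of_bits_changed).
    """
    rng = random.Random(seed)
    pools = {0: [], 1: []}
    for pos, bit in enumerate(free_bits):
        pools[bit].append(pos)
    rng.shuffle(pools[0])
    rng.shuffle(pools[1])

    positions = []
    changed = 0
    for bit in payload:
        if pools[bit]:
            positions.append(pools[bit].pop())      # value already matches
        elif pools[1 - bit]:
            positions.append(pools[1 - bit].pop())  # must flip this bit
            changed += 1
        else:
            raise ValueError("not enough free bits for the payload")
    return positions, changed
```

When enough matching positions exist, `changed` is 0 and the Stego image is bit-for-bit identical to the pending image at those locations.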

Figure 4.6: The Neural Network layer

Extraction and Decryption Layer

The fifth layer is the component responsible for extracting the encrypted information hidden in the Stego image by implementing inverse Steganography. Figure 4.7 shows the operation of this layer.

Figure 4.7: The Extraction and Decryption Layer


The process starts by performing the adaptive segmentation with the same encryption key that was used earlier in the encryption process. The key determines how many segments were generated in the hiding session and the sizes of those segments. This is another advantage of using the encryption key in the segmentation process: it ensures that the extraction process segments the image exactly as the embedding process did. The next step uses the third layer (main cases and sub cases), but in this session the layer extracts information instead of embedding it. The concept and the process are similar, but the result is the recovered encrypted file (Figure 4.8).

Figure 4.8: Main cases and sub cases layer operation during the extraction process
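The key-driven segmentation that both the embedding and the extraction sessions rely on can be sketched in Python as follows. The text states only that the key length fixes the number of segments and that the ASCII values of the key drive their sizes; the proportional rule used here is an assumption made for illustration, and the function name is hypothetical. The same function could be applied once to the image width and once to the height.

```python
def segment_sizes(key, total):
    """Split `total` pixels into len(key) non-uniform segment sizes.

    Assumption: each segment gets at least one pixel, and the remaining
    pixels are shared in proportion to the ASCII value of the matching
    key character. The result depends only on (key, total), so the
    extraction session reproduces the embedding session's segments.
    """
    weights = [ord(ch) for ch in key]
    count = len(weights)
    weight_sum = sum(weights)
    if count == 0 or weight_sum == 0:
        raise ValueError("key must contain at least one non-NUL character")
    if total < count:
        raise ValueError("image dimension is smaller than the key length")

    spare = total - count          # pixels left after one per segment
    sizes = []
    running_weight = 0
    previous_bound = 0
    for weight in weights:
        running_weight += weight
        bound = running_weight * spare // weight_sum
        sizes.append(1 + bound - previous_bound)
        previous_bound = bound
    return sizes
```

Calling `segment_sizes` with the same 16-character key always returns the same 16 sizes, while changing the key changes the layout.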


In the embedding process, when the third layer (main cases and sub cases) finishes and the information has not been completely embedded, the neural network embeds the remainder. The main concern arises in the extraction phase: when the third layer completes the extraction of the information that it embedded, how can the prototype determine whether the file has been completely extracted or whether the neural network was also used in the embedding session? To solve this problem, the size of the encrypted information file is saved in one of the few empty entries in the image header, which can be used without affecting the image itself. When the main cases and sub cases layer finishes its work during the extraction session, the extraction layer compares the size of the resulting encrypted information file with the size saved in the Stego image header. If the two values are equal, the information file is completely extracted; otherwise, the neural network operates to retrieve the rest of the file (Figure 4.9). After the encrypted information file is completely extracted, the first security layer (AES layer) decrypts the file and produces the plain text.

Figure 4.9: Neural Network operation during extraction and decryption layer
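The size comparison described above amounts to a small decision step, sketched here in Python. The header field that stores the size is not identified in the text, so the sketch simply takes the stored size as an argument; the function name and the returned dictionary are hypothetical.

```python
def extraction_plan(stored_size, extracted):
    """Decide whether the neural network must retrieve more data.

    stored_size: size of the encrypted file as saved in the image header.
    extracted:   bytes recovered by the main cases and sub cases layer.
    """
    if len(extracted) > stored_size:
        raise ValueError("extracted more data than the header declares")
    remaining = stored_size - len(extracted)
    return {"complete": remaining == 0, "remaining_bytes": remaining}
```

When `complete` is true, decryption can start immediately; otherwise `remaining_bytes` tells the second extraction step how much is still missing.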

Operational Phases

In this prototype there are two main operational phases:

i. Securing phase: the first two layers (AES encryption and adaptive segmentation) secure the information to be embedded. The file is encrypted with the AES cryptographic algorithm, and the adaptive segmentation method produces non-uniform segment sizes and a number of segments that changes whenever the encryption key is changed.
ii. Embedding phase: this phase has two different Steganography layers, the main cases and sub cases layer and the neural network based Steganography layer.

4.5 Summary

In this chapter, we discussed the prototype's components and the overall architectural design. The layers of the prototype and the contents of each were explained in detail. The operational phases were also clarified, along with the challenges faced during the design process and the solutions adopted for them. We discussed why the design takes this shape, why each component is built this way, why the techniques used in this prototype were chosen, what the benefits of each component are, and how each improves the design and, later on, the performance of the prototype.

CHAPTER 5

PROTOTYPE IMPLEMENTATION

5.1 Introduction

A prototype is an easily modified and extensible model (representation, simulation or demonstration) of a planned software system, likely including its interface and input/output functionality. It should be iterative, that is, progressively refined until it becomes the final system [48]. The implementation phases are explained according to the research methodology in Chapter 3. Then the development of the prototype tool is explained in detail. The UML (Unified Modeling Language) diagrams of the code are shown, and each component is discussed, along with the relations between those components. We use neural based Steganography algorithms, which provide reliable results with the ability to hide a large amount of information without sacrificing the image quality, size, or robustness against attacks. The last part of this chapter summarizes what was discussed in it.

5.2 Implementation Phases

The prototype implementation went through phases according to the research methodology in Chapter 3. Those phases are:

a) Requirements specifications

The implementation of this prototype must satisfy a collection of specific requirements:

1. The coding language used in this project is MS Visual Basic 6.0, a programming language and environment developed by Microsoft. Based on the BASIC language, Visual Basic was one of the first products to provide a graphical programming environment and a paint metaphor for developing user interfaces. Instead of worrying about syntax details, the Visual Basic programmer can add a substantial amount of code simply by dragging and dropping controls, such as buttons and dialog boxes, and then defining their appearance and behavior [50]. Visual Basic was designed to be easy to learn and use. The language allows programmers to create simple GUI (Graphical User Interface) applications as well as complex applications. Programming in VB (Visual Basic) is a combination of visually arranging components or controls on a form, specifying attributes and actions of those components, and writing additional lines of code for more functionality. Visual Basic can create executables (EXE files), ActiveX controls, and DLL files. VB has strong integration with the Windows operating system and the Component Object Model. The language is easy to understand and use, and its popularity makes the code easier to change in future work [35]. VB can link to open source code written in other languages such as C; the AES encryption tool used in this project is an example of such linking. Furthermore, the neural network was easy to implement in VB.

2. An AES encryption tool is required to encrypt the text file that will be embedded in the host image. The encryption adds a security feature to the prototype, and the encryption key also specifies the number and sizes of the non-uniform segments in the later adaptive segmentation phase. The chosen tool is an open source tool written in C, and it was integrated with the prototype code. This tool is also used in the decryption process: in the extraction phase, the same key is used again to do the segmentation and extract the encrypted text, and then the AES tool decrypts the text file to plain text.

3. Steganography embedding software is another requirement. This software is used in the testing phase to measure the efficiency and performance of the prototype by comparing the Stego images produced by the prototype with those produced by the embedding software. The software must be well known, popular, and proven to be efficient, and the Stego images it produces must be proven to be robust against basic attacks. The software chosen in this project is S-Tools, an open source standard Steganography program whose Stego images are proven to be robust. More explanation about this software is given in a later chapter.

4. A visual and statistical Steganography analysis tool is another important requirement, to compare the results of this prototype with the software mentioned in the previous point and to test the efficiency of the Stego images produced by the prototype. For the standard measures chosen for this project, no ready-to-use open source tool or software is available; therefore, we designed a tool that performs those measures. Its input is a host image, a Stego image produced from that host image by this project's prototype to embed a text file, and a Stego image produced from the same host image by S-Tools to embed the same text file. The measures are computed on the host image and on both Stego images. The results appear on the screen as graphs and numbers for the statistical and visual measures.

5. To implement and run this prototype, a personal computer with an Intel Pentium processor and the Windows 2000 (or higher) operating system is needed. This requirement arises because the coding language used in this prototype (MS Visual Basic 6.0) needs this kind of system to run, as does the C code of the AES encryption/decryption tool. The dynamic library files (DLL) needed by VB are included in Windows 2000 and above.

b) Prototype architectural design

After the basic requirements for the prototype implementation are specified, the implementation must follow the architectural design that was explained and discussed in Chapter 4. The flow of the work, the components and the structure of the code, the functionality of the prototype in general, and the functions and procedures in particular must all be implemented according to that design.

c) Prototype tool development

The development of the prototype must take into account that the process includes iteration (re-specify, re-design, re-evaluate) until the team, both users and developers, agrees that the completeness of the evolving prototype is sufficiently high [48].

d) Testing the results

In this phase, the following characteristics of the prototype must be satisfied:

Executability: the prototype must be runnable in the very loose sense that it allows a walkthrough to be performed, and runnable in the strict sense that it executes on the computer, responds to user input in real time, and performs the expected computations.
Maturation: the prototype can evolve, given sufficient refinement, improving by stages until it becomes the final product.
Representation: the prototype that results from the development process must have the look and the performance of the planned system.

If those characteristics are not satisfied, the iteration described in point (c) must take place to refine and enhance the prototype. If all these characteristics are satisfied, the resulting prototype is described as a good non-disposable prototype [48].

5.3 Prototype code structure and UML diagrams

In this section, the structure and the components of the code are explained in detail: the functions used in the code and the relationships between those functions and other components. The best way to describe this graphically is to use UML diagrams. The Unified Modeling Language (UML) is a standard language for specifying, visualizing, constructing, and documenting the artifacts of software systems. The UML represents a collection of best engineering practices that have proven successful in the modeling of large and complex systems.

The UML is a very important part of developing object oriented software and of the software development process. The UML uses mostly graphical notations to express the design of software projects. Using the UML helps project teams communicate, explore potential designs, and validate the architectural design of the software [51]. The primary goals in the design of the UML are:

Provide users with a ready-to-use, expressive visual modeling language so they can develop and exchange meaningful models.
Provide extensibility and specialization mechanisms to extend the core concepts.
Be independent of particular programming languages and development processes.
Provide a formal basis for understanding the modeling language.
Integrate best practices.

The UML diagrams for this prototype are the following:

a) The Main Component

This UML diagram shows the structure of the embedding layers, the main components (functions) of the code, and the links among the main functions. Figure 5.1 shows this diagram.


Figure 5.1: Main components (functions) of the prototype code

To understand this diagram (Figure 5.1), we explain each box in it; each box represents a function in the code. Those functions are:

Do_SteganoGraphy is the essential function in this part of the code. It appears as a button in the UI, and clicking this button executes the function and starts the whole prototype. It calls the functions Read_Password, Select_Image, Read_Text_File, Segmentation_And_Hide_Data, Computations, and Find_Properties_of_Image. By calling these functions, the whole embedding is performed; the next diagrams clarify the detailed steps.
Read_Password lets the user input the key for AES (Advanced Encryption Standard). It shows a text box in the UI in which the user types the key; the key must be 16 characters long.
Select_Image lets the user select a cover image as input to the prototype, by showing a Windows file browser.


Read_Text_File reads the text file to be encrypted. It shows a file browser window to select the text file that will later be encrypted and embedded in the cover image.
Find_Properties_of_Image finds the properties of the selected cover image: the Main Cases (MC), the Sub Cases (SC), and the minimum of the three byte values in a pixel, which is used to choose the minimum MC.
Segmentation_And_Hide_Data segments the image into non-uniform segments, hides data using MCs and SCs, and starts the neural network if needed.
Computations computes the statistical and visual measures for the image.

b) Text file Handling

This UML diagram shows how the text file containing the information to be embedded is handled, and the steps involved. Figure 5.2 shows this.

Figure 5.2: Text file handling diagram


The components of Figure 5.2 are the following:

Read_Text_File is the same function explained in (a). It is the starter function for this component of the code, and it calls the Convert_text_to_binary and Encryption functions.
Convert_text_to_binary converts the text characters into binary bits to be encrypted later.
Encryption is the linking function between this VB project and the external C AES tool that encrypts the text file.
aes_en executes the external tool that encrypts the text file using the AES encryption algorithm. This tool is open source C code that was linked to the project to increase security. It is also responsible for decryption later in the extraction layer.

c) Finding the cover Image properties

This UML diagram shows the steps to find the properties of the cover image that the user has already selected via the UI: the Main Cases (MC), the Sub Cases (SC), and the minimum of the three byte values in a pixel, which is used to choose the minimum MC. These properties are useful in the segmentation of the image. This is shown in Figure 5.3.

Figure 5.3: Finding the Cover Image Properties

The functions in Figure 5.3 are:

Find_Properties_of_Image is explained in (a). It is the starter function of this part of the code, and it calls the functions Find_MC, Find_SC, and min.
Find_MC finds the Main Case (MC) of the current pixel.
Find_SC selects the suitable Sub Case (SC) for the current pixel.
Min finds the minimum of the three byte values in the current pixel, to choose the minimum MC.

d) Segmentation and Hiding Data

This UML diagram explains how the cover image is segmented and how the data is hidden via both phases of embedding (MC and SC, then the neural network if needed). Figure 5.4 shows this in detail.

Figure 5.4: Segmentation and Data Hiding

This part of the code contains the following functions:

Segmentation_And_Hide_Data is the starter function in this UML diagram. It is explained in (a), and it calls the functions Hide_Data, Do_Neural_Net, and Display_NewImage.
Hide_Data hides the data of the encrypted file (part of it or all of it, depending on whether the neural network is needed) using Main Cases and Sub Cases. It does so by calling the functions Compute_Size, Cut, and Hidee.
Cut cuts a number of bits from a string and returns the rest of the bits.
Compute_Size computes the number of bits that can be stored in the image in the first layer of Steganography (MC and SC).

Hidee hides the values of bits in a pixel.
Do_Neural_Net starts the neural network to embed the rest of the encrypted text if the first phase of Steganography (MC and SC) fails to embed all of the encrypted text in the pending Stego image.
Display_NewImage displays the Stego image in a picture box in the UI. This function also saves the final Stego image by calling the function Save_Picture.
Save_Picture saves the final Stego image. It is executed in one of two cases: when the first phase of Steganography (MC and SC) finishes its work and has completely embedded the encrypted text file in the image, or, if the neural network was used, when the neural network calls it.

e) The Neural Network

This UML diagram explains the work and the functions of the neural network in this prototype. The neural network is the second layer of Steganography, i.e. the second layer of embedding. It starts only when the first layer of Steganography (MC and SC) finishes its work without being able to hide all of the information of the text file in the image. Figure 5.5 shows the details of this part of the code.

Figure 5.5: The Neural Network

This part of the code contains the following functions:

Do_Neural_Net is explained in (d). It is the starter function of this part of the code, and it calls the functions Extract_All_Unused_Bits, Hide_Last_Bits, Compare_Computations, Save_Adjusted_Weights, and Save_Picture.
Extarct_All_Unused_Bits finds and extracts all the free bits that were not used by the first layer of Steganography (MC and SC), so that the neural network layer can hide the rest of the information in them.
Compare_Computations compares the visual and statistical measures of the Stego image after each iteration of embedding and weight adjustment with the previous status of the image. It returns the stop signal only when the statistical and visual measures reach acceptable values that preserve the image quality and satisfy the robustness against attacks. This function calls the function Computations to compute those measures.
Computations is called by Compare_Computations to compute the statistical and visual measures for the image.
Save_Picture saves the Stego image produced by the neural network, as explained in (d).

f) Computations of Visual and Statistical Measures

This UML diagram shows the Computations part of the code, which is responsible for finding the visual and statistical measures of the image. Those measures are used in two cases. In the first, the measures are shown on screen if the first layer of embedding was enough to embed all the data of the text file. In the second, when the neural network is used, this function is executed to decide whether the Stego image has reached the acceptable level of tolerance regarding the Euclidean norm, the brightness difference, and the difference between neighboring pixels. Figure 5.6 shows the details of this part of the code.

Figure 5.6: Computations of Visual and Statistical Measures

This part of the code contains the following functions:

Computations, as explained in (a), calls the functions Find_Draw_Noise, Euclidian_Norm, Diff_btw_Neighbor_Value, and Brightness_Info.
Find_Draw_Noise computes the noise difference between the cover and Stego images by calling the function Compute_Noise.
Compute_Noise computes the amount of noise in an image.

Euclidian_Norm computes the visual measure of the image (the Euclidean norm).
Diff_btw_Neighbor_Value computes the statistical measure of the image (the difference between neighbors) by calling the function Diff_Value.
Diff_Value computes the difference between a given pixel and its neighboring pixels. This measurement is one of the basic statistical measures.
Brightness_Info performs another visual measure of the image by measuring the difference in brightness between the cover and Stego images. It does so by calling the function Brightness.
Brightness computes the brightness of an image.

g) Information Extraction handling

In this part of the code, the information extraction takes place. The user selects the Stego image that contains the hidden information via the UI; the code then performs the inverse Steganography to extract the encrypted text and decrypts it into plain text to retrieve the hidden information. The functions of this part are shown in Figure 5.7.

Figure 5.7: Information Extraction handling

In the above UML diagram, the following functions are included:
Do_Inverse_Steganography: the starter function for retrieving the information from the Stego image. It lets the user select the Stego image by calling the function Select_Image, then calls the function Read_Password to input the Decryption Key, which is the same Key used in the encryption earlier, and finally calls the function Extract_Data_From_Image to extract the embedded data from the Stego image.
Select_Image: lets the user select a Stego image via the UI by popping up a file browser window on the screen.
Read_Password: lets the user input the Decryption Key via the UI by popping up a text box on the screen.
Extract_Data_From_Image: extracts all the information embedded in the Stego image. It performs the inverse Steganography and extracts the data that was embedded by the two embedding phases (MC and SC, and the neural network). After the extraction is complete, it gives the signal to start the decryption and finally displays the plain text on the screen.

h) Information Extraction
This part of the code is responsible for the actual extraction of the information and its decryption into plain text, which is the final goal. It first extracts the information that was hidden using the first phase of embedding (MC and SC). Then it compares the size of the extracted information to the size of the original text. If the sizes are equal, the extraction is complete; otherwise it starts the inverse function of the neural network to extract the rest of the information, which was embedded using the second phase of embedding (Neural Network). Finally, it starts the decryption process to give the final plain text. Figure 5.8 shows the work of this part and the functions included.
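The control flow described in (h) can be sketched as follows. This is only an illustrative outline in Python, not the prototype's code: the name extract_message, its parameters, and the way the expected size is supplied are assumptions made for the example, and the three callables are hypothetical stand-ins for the first-phase extraction, the inverse neural network, and the AES decryption.

```python
def extract_message(stego, expected_size, extract_phase_one,
                    extract_phase_two, decrypt):
    """Two-phase extraction followed by decryption (illustrative only).

    All parameter names are hypothetical; the callables stand in for the
    MC/SC extraction, the inverse neural network, and the AES decryption.
    """
    # Phase one: data hidden by the main cases and sub cases (MC and SC).
    data = extract_phase_one(stego)

    # Only if phase one did not yield the whole message, ask the second
    # phase (inverse neural network) for the missing bytes.
    missing = expected_size - len(data)
    if missing > 0:
        data += extract_phase_two(stego, missing)

    # The recovered bytes are still encrypted; decrypt them last.
    return decrypt(data)
```

Passing the phases in as callables keeps the sketch independent of how each phase is actually implemented in the prototype.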

Figure 5.8: Information Extraction
The functions of this part of the code are the following:
Extract_Data_From_Image: the starter function of this part. It calls the functions Find_Properties_of_Image, Extarct_from_Stego_Image, Convert_To_Char, Retrieve_Last_Bits, and Decryption.
Find_Properties_of_Image: finds the properties of the selected Stego image, namely the Main Cases (MC), the Sub Cases (SC), and the minimum of the three byte values in a pixel, which is used to choose the minimum MC.
Extarct_from_Stego_Image: extracts from the Stego image the hidden information that was embedded by the first phase of

embedding (MC and SC). This function calls the functions Retrieve and Concate_Bits.
Retrieve: retrieves a number of bits from a pixel.
Concate_Bits: concatenates the extracted bits into bytes.
Convert_To_Char: converts the extracted bits to characters.
Retrieve_Last_Bits: starts the work of the inverse neural network to extract the information that was hidden using the second phase of embedding (Neural Network). This is done by calling the functions Do_Neural_Net_Inverse, Convert_To_Char, and Con_Bits.
Do_Neural_Net_Inv: extracts the information that was embedded by the neural network earlier.
Con_Bin: converts decimal values into binary values.
Decryption: decrypts the encrypted information extracted from the Stego image into plain text by calling the external C code of the AES decryption.
aes_de: passes the encrypted information to the AES tool for decryption.

After the prototype implementation, the executable code has the following User Interface for the information embedding and for the visual and statistical measures of the resulting Stego image (Figure 5.9).
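As a rough illustration of the bit-handling steps attributed to Concate_Bits and Convert_To_Char above, the following Python sketch groups extracted bits into bytes and maps them to characters. The function names concat_bits and convert_to_char, the most-significant-bit-first order, and the one-byte-per-character mapping are assumptions made for the example; the report does not state which conventions the prototype uses.

```python
def concat_bits(bits):
    """Group a flat list of 0/1 values into bytes.

    Assumption: bits arrive most significant bit first, 8 per byte.
    """
    if len(bits) % 8:
        raise ValueError("bit count must be a multiple of 8")
    out = bytearray()
    for i in range(0, len(bits), 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit  # shift in one bit at a time
        out.append(value)
    return bytes(out)


def convert_to_char(data):
    """Map each byte to one character (assumed one-to-one mapping)."""
    return data.decode("latin-1")
```

In the prototype's flow the characters obtained at this stage would still be encrypted, since decryption is a later step.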


Figure 5.9: The User Interface for the prototype (the embedding and measures)
The executable code also has the following User Interface for the information extraction and decryption of the hidden information in a Stego image produced by the same prototype (Figure 5.10).
Figure 5.10: The User Interface for the prototype (Information Extraction)

Summary
In this chapter we discussed the prototype implementation, walking through its phases: the requirements specification, the prototype architectural design, the prototype tool development, and the testing of the prototype's functionality and executability. We also explained in detail the prototype code structure and the UML diagrams of the code, together with all the user-defined functions used in this prototype and how they are linked to each other.

CHAPTER 6

TESTING THE RESULTS AND CONCLUSION

6.1 Introduction
After deciding the objectives, the scope, the research methodology, and the structural design, and after implementing the prototype in the earlier chapters, this chapter tests the results, analyzes them, and draws the final conclusion. We use a benchmark to perform these tests and compare the results to it. We also test the program behavior, the performance, and the functionality of the code. Finally, we have to show that we met the objectives of this project within the scope that we are working on.

6.2 The Benchmark
In this project a benchmark is needed to test the results: to know whether the values of the visual and statistical measures for the resulting Stego images are acceptable, and how efficient they are. The benchmark we use is a standard, widely used Steganography tool named S-Tools (Version 4.0). S-Tools allows users to hide information in BMP, GIF, or WAV files. The basic scheme of the program is straightforward: you drag an image or audio file into the S-Tools active window to act as the cover medium, drag the hidden data file onto the cover medium, and then provide a Stego key for encryption. The result is the Stego medium [52]. The author of this Steganography shareware for the PC is Andy

Brown. S-Tools hides data in the least significant bits of BMP images. S-Tools can optionally encrypt the data with a key before hiding them, thereby providing an envelope that will not rouse suspicion. (The MD5 hash function is used to transform the key into 128 evenly distributed bits.) S-Tools can hide multiple files of secret data in one cover file. The files can optionally be compressed and encrypted before they are hidden. The data bits are hidden in the least significant bits of the pixel values, and the key is used to spread the bits pseudo-randomly [53].
S-Tools employs two techniques for hiding data in a cover image with 24 bits (or three bytes) per pixel. One technique is used when the image can have the maximum number of colors (2^24, or approximately 16.7 million). The program simply embeds three bits of data in each pixel, one bit in each of the three bytes of the pixel. The other technique is used when the number of colors is limited to 256 (even though each pixel is still three bytes). S-Tools applies a palette optimization algorithm to reduce the number of colors to 32. It follows this by embedding the data, again in each of the pixel's three bytes, so changing the least significant bits of those bytes can change the color in at most 2^3 = 8 different ways. Thus, the total number of new colors can be at most 32 x 8 = 256.
S-Tools has an FDD (floppy disk drive) module that can hide data in the free space of floppy disks. To understand how this works, we start with a short discussion of how DOS (Disk Operating System) manages files on a disk. When the disk is formatted, it is divided into concentric circles called tracks, and each track is further divided into several arc segments called sectors. Data are written to the disk (and also read from the disk) in sectors, so each piece of data has a two-part address: its track and sector numbers. The sector is the smallest addressable unit on the disk.
Before writing a file on the disk, DOS computes the number of sectors needed for the file and writes this information, together with the start address of the file, in a special table called the file allocation table (FAT), located in a special area at the start of the disk [54]. The FDD module checks the FAT to determine which sectors are still unused and hides the data by writing them in those sectors. It selects unused sectors pseudo-randomly and starts by writing the size of the hidden data and the seed of the pseudorandom number generator. When all the data have been hidden, the module stores random bits in any of the remaining unused sectors, to confuse attackers. Notice that the sectors used to hide the data are still declared unused in the FAT [54].
S-Tools is well known for the robustness of the Stego images it produces. In 2006 the American military announced a professional security challenge contest, called the digital forensics challenge, about breaking a selection of steganographic and cryptographic products, and S-Tools was second on the list. They specified the challenge as follows: "Examiners must develop and document a methodology used to determine which files in the Steg S-Tools folder contain steg. You will also be expected to identify the carrier file and payload, in addition to recovering the password (where applicable) for each file you identify as containing Steganography. Points will be awarded for each successfully accomplished task." [55].
S-Tools is currently the only reliable open source free Steganography tool available on the World Wide Web. This tool will not embed the information file in the cover image unless the resulting image is assured to be robust against visual and statistical attacks. If the user tries to embed a very large amount of data (more than 10% of the cover image size), S-Tools will refuse to process it, because the resulting Stego image would fail against visual and statistical attacks.
The Stego images produced by this prototype are tested by comparing their visual and statistical measures to those of the Stego images produced by S-Tools. These measures are the Euclidean norm, the difference between neighbor pixels, and the difference of brightness. All of these measures are computed on both the cover and the Stego images, and the results for the cover and Stego images are then compared.
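To make the comparison concrete, the following Python sketch embeds one bit per color byte in the way S-Tools is described above for 24-bit images, and then computes three measures on the cover and Stego pixels. It is a minimal sketch under stated assumptions, not the prototype's or S-Tools' code: all function names are hypothetical, sequential (not key-driven) bit placement is used for simplicity, and the formulas for brightness and neighbor difference are assumed definitions because this section does not give them.

```python
import math


def lsb_embed(pixels, bits):
    """Write one payload bit into the LSB of each color byte, in order.

    pixels: list of (r, g, b) tuples; bits: list of 0/1 values.
    Sequential placement is a simplification of key-driven spreading.
    """
    flat = [b for px in pixels for b in px]
    if len(bits) > len(flat):
        raise ValueError("payload too large for this cover")
    for i, bit in enumerate(bits):
        flat[i] = (flat[i] & 0xFE) | bit  # clear the LSB, then set it
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def euclidean_norm(cover, stego):
    """Euclidean distance between cover and stego byte values."""
    return math.sqrt(sum((c - s) ** 2
                         for pc, ps in zip(cover, stego)
                         for c, s in zip(pc, ps)))


def brightness(pixels):
    """Mean of all channel values (assumed definition of brightness)."""
    return sum(sum(px) for px in pixels) / (3 * len(pixels))


def neighbor_difference(pixels):
    """Mean absolute grey-level difference between consecutive pixels
    (assumed definition of the neighbor-difference measure)."""
    if len(pixels) < 2:
        raise ValueError("need at least two pixels")
    grey = [sum(px) / 3 for px in pixels]
    return sum(abs(a - b) for a, b in zip(grey, grey[1:])) / (len(grey) - 1)
```

Smaller differences between the cover and Stego values of these measures would indicate less visible or statistically detectable distortion, which is the sense in which the two tools are compared.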

6.3 Prototype Usage Limitations
The usage of this prototype has limitations that follow from the scope and the structural design of the project. These limitations are:
The dimensions of the image must be 2^n, where n ≥ 4, or any multiple of 16. This is because of the encryption Key size, which for AES must be 16 or any multiple of 16. The size of the Key decides the number of blocks in each direction (vertical and horizontal) in the image adaptive segmentation layer.
The text file size must not exceed 63% of the cover image size. Otherwise the sample will be rejected.
Some isolated cover images are not suitable samples for the neural network; this depends on the nature of the image, the distribution of its colors, and its pixel values. However, if the text file to be embedded is small enough that it does not need the neural network and can be completely embedded by the main cases and sub cases, then those same cover images will not be rejected.

6.4 Testing Approaches And Methods
For the testing, we selected 25 images as sample cover images and performed the various tests with text files of different sizes. We also used the same cover images and text files to produce Stego images with S-Tools, in order to compare the results. We have 2 approaches to test this prototype:

6.4.1 Program Performance
This is to test the program's behavior and functionality, making sure that the program is executable and that it achieves the tasks given to it. We have to make sure that the user interface buttons are working, that the results appear properly on the screen, and that appropriate messages pop up on the screen for certain events while the program is running. The program interface is shown in Figures 5.9 and 5.10 in Chapter 5.
On the screen after the embedding process, we are able to see the cover image and the resulting Stego image, a graph of the noise of the images, and a graph of the difference between neighbor pixels for both the cover and the Stego image. Furthermore, the value of the Euclidean norm and the brightness values of both the cover and Stego images are displayed. Finally, the contents of the embedded text file are also shown.
After the extraction and the decryption to plain text from the Stego image are completed, we are able to see the Stego image that the information was extracted from and the text that was extracted and decrypted.
If the neural network is needed to embed a large amount of information, it starts automatically, showing a message that the previous phase of Steganography couldn't completely embed the text file into the image and that the neural network will be used (Figure 6.1).
Figure 6.1: Pop up message informing the user about using the neural network
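The acceptance checks implied by the usage limitations listed earlier in this chapter could be expressed roughly as below. This is a hypothetical Python sketch, not the prototype's code: the function name check_sample, its parameters, the returned messages, and the reading of "size" as a byte count are all assumptions made for the example.

```python
def check_sample(width, height, cover_size, text_size,
                 max_ratio=0.63, block=16):
    """Return 'accepted' or a rejection reason for a cover/text pair.

    Assumptions: sizes are byte counts; dimensions must be multiples
    of `block`; the text may use at most `max_ratio` of the cover size.
    """
    if width % block or height % block:
        return "rejected: image dimensions must be multiples of 16"
    if text_size > max_ratio * cover_size:
        return "rejected: text file exceeds 63% of the cover image size"
    return "accepted"
```

A check of this kind only covers the two numeric limits; whether a particular image suits the neural network cannot be decided this way and is only discovered during embedding.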

If the size of the selected text file is larger than 63% of the cover image, the program refuses to start the process and shows a message informing the user to select another text file (Figure 6.2).
Figure 6.2: Pop up message informing the user to select another text file
In some cases, for some selected images, the neural network is not able to perform the embedding. This is due to the nature of the picture: the neural network cannot find additional suitable pixels to hide the information in, even though the size of the text file does not exceed the limit of 63%. In such a case, the neural network keeps trying to select different locations for the embedding and finally stops, showing an error message reporting the failure. This failure does not mean that the neural network has a problem in its functionality; it is caused by the cover image itself, which is not a suitable sample for the program. After the tests were performed on the selected images and text files, 6 images out of 25 were rejected by the neural network when we tried to embed a large amount of information.
From the above, it is assured that the behavior of the program is normal, its functionality is as expected, and its readiness for unexpected events is quite high. The program is executable and it can perform the tasks that it is expected to achieve.

6.4.2 Results listing and analyzing
The second approach of testing, which is the most important, is to check whether the prototype meets the objectives or not. This can be done by testing the
