Detection and Localization of Image and Document Forgery: Survey and Benchmarking


Anurag Ghosh, Dongmian Zou, Maneesh Singh
Verisk Analytics
{anurag.ghosh, dongmian.zou, maneesh.singh}@verisk.com

Abstract

Modern technology makes it easier to manipulate images. It is important to detect image forgery effectively, since forgery can be malicious and have severe consequences. Different algorithms have been designed to detect various image tampering operations. In this paper, we summarize recently developed algorithms and available databases. We construct a dataset from the available raw images and benchmark the algorithms. Moreover, we conduct a preliminary fusion according to the output maps of the algorithms. We further train a convolutional neural network classifier using our dataset. The performance of the CNN classifier is highly promising and state-of-the-art.

1. Introduction

Images can be manipulated in various ways. Different touch-ups and image manipulation techniques are applied to augment or enhance a given image. For instance, mobile applications such as Instagram are very popular; users apply filters to improve the presentation of their images. Furthermore, images are regularly resized and recompressed so that they can be more easily exchanged over the Internet, due to the proliferation of cloud-based photo sharing and editing websites like Flickr and Picasa, spurred by social media applications like WhatsApp, Instagram and Snapchat. These manipulations are typically not recognized as image tampering, as the intent to manipulate the information content of the images is minimal. However, there exist many instances where nefarious intent is the sole purpose of manipulating images, such as altering the dollar amount on receipt images. Figure 1 illustrates an example of a forged image: the left picture is a manipulated picture showing a dent on a car, while the testing result on the right shows that the dent is very likely not authentic.
The proliferation of image processing software, such as Photoshop and GIMP, provides the necessary tools to achieve malicious manipulation of images with ease.

[Figure 1: Manipulated image of a car and testing result. (a) The manipulated image; (b) the likelihood map.]

This makes the reliable detection of tampering instances an important task. The process of detecting such manipulations is termed image forensics. An image forensics algorithm should output information indicating whether the image has been tampered with and, more importantly, identify the portions of the image that have been altered. Image tampering and manipulation is typically detected with two kinds of techniques. First, active techniques, which involve embedding a watermark in an image when it is taken and authenticating the image by checking the watermark; these techniques have not gained much momentum, as they require sophisticated hardware (Birajdar and Mankar [1]). Second, passive techniques, which assess the integrity of the image based on the content and structure of the data representing the image. Blind image forensics techniques do not take image semantics into consideration while identifying image forgery, i.e., there is no attempt made at understanding the content of the image. Although tampering methods may not have visibly manipulated the image, they very commonly leave traces of the manipulation in the form of changes in image statistics, artifacts and inconsistencies. These inconsistencies are leveraged to identify that an image has been tampered with and to localize the tampered region of the image. Blind image forensic algorithms examine various kinds of such traces, as we will see in subsequent sections. The popularity of and high interest in these blind and passive techniques stems from the fact that they require only computational resources, as

opposed to active techniques (Birajdar and Mankar [1]). Most of the widely referenced surveys, including [2, 3, 1], list algorithms proposed for both active and passive image forgery detection. However, they are not focused on nefarious manipulations, and they do not benchmark the performance of the surveyed algorithms. It is important to be able to distinguish image enhancements from image tampering. It has been noted that image enhancements mostly do not involve local manipulations (a notable exception is bilateral filtering for touch-ups), while tampering often involves local manipulations [3]. For instance, contrast and brightness adjustments are operations which are normally not useful per se for image tampering, whereas sharpening and blurring operations may aid image tampering to some extent, and copy-move and splicing operations make up the malicious end of the spectrum. A universal definition of malicious intent in image manipulations is not straightforward; the definition would differ with the use case at hand. For example, a bank which allows photo-based check transactions by accepting customer check images would accept manipulations caused by the process of sending those images, such as quality reduction from compression. However, the manipulation of digits written in the image would be catastrophic and must be detected. In another instance, cosmetics advertisements involving photos of models usually contain various manipulations and touch-ups, which may be considered malicious from a consumer's perspective. Various methods have been developed in recent years to tackle these image manipulations. Along with identifying malicious intent and different forms of manipulations, it is equally important to identify algorithms which are complementary in nature. Such complementary algorithms can be integrated to produce evidence of tampering covering different types of manipulations.
In this paper, we deliver a survey of the different techniques and set up a new benchmark for future research.

2. Manipulation Classes

Malicious image manipulation is typically done by applying local manipulations. Localizing these manipulations provides a forensics expert with a lot of insight and some degree of confidence about the nature and extent of the tampering. We briefly describe the most common ways of tampering with images in such circumstances.

Image splicing is a very common type of manipulation where two or more images are spliced together, normally to create the impression that a foreground object is part of a background taken from another image. This may involve blurring and other kinds of additional touch-ups to make the image look authentic. Splicing can be used by well-trained forgery experts to tamper with documents, where letters or whole words may be removed or manipulated, changing the entire meaning of the document. Another instance would be placing one person beside another in an unlikely image, say a politician beside a terrorist, to create a rumor.

Copy-moving is another kind of alteration where parts of the same image are used to retouch the image or hide objects (such as changing a crime scene photograph). An overview and evaluation of the methods used to detect copy-move forgery has been documented well by Christlein et al. [4]. Other techniques, such as seam carving, are also used to remove objects from images and can be detected; for instance, Sarkar et al. [5] discuss this manipulation.

Rendered imagery, due to advances in computer graphics, is at times photo-realistic and thus indistinguishable to the naked eye. However, because such images are not acquired from a digital device, they can be distinguished from natural images through differences in noise residuals and artifacts that arise from the camera acquisition process.
Steganography is another field of manipulation which attempts to convey hidden messages by hiding information in cover images without affecting the covers' statistical properties, which might otherwise be used for detection.

3. Passive and Blind Localization Techniques

Passive detection techniques normally do not involve the study of the contents of the image and only concentrate on various image statistics that can be used to discern tampered regions from non-tampered regions. Some of the techniques exploit the artifacts and inconsistencies that are created by JPEG compression, which is widely used as an image format. Some techniques exploit the inherent noise present in the image, due to differences in Color Filter Array interpolation between cameras or inconsistencies in the local noise pattern caused by splicing. Yet another class of algorithms looks at lighting inconsistencies, but as these are primarily semi-automatic techniques, we will not be evaluating them. The biggest defining characteristic of the algorithms we review is that they do not involve any prior knowledge about the content of the image and only try to detect tampering through statistical means. For a list of algorithms that work only for copy-move operations, [4] provides a survey and benchmark. In general, copy-move tampering does not need sophisticated techniques; we thus focus more heavily on splicing. For the names of the algorithms, we use abbreviations according to Zampoglou et al. [21] for convenience.

3.1. DCT and Block Level Artifacts

JPEG images are compressed according to 8x8 Discrete Cosine Transform (DCT) blocks. Algorithms use this fact to detect tampering operations under various principles.
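As background for the methods that follow, the 8x8 blockwise DCT that JPEG applies can be sketched in a few lines of numpy. This is an illustrative implementation, not code from the paper:

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix, as used for JPEG's 8x8 blocks."""
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0] /= np.sqrt(2)
    return m * np.sqrt(2 / n)

def blockwise_dct(img, n=8):
    """2-D DCT of each non-overlapping n x n block of a grayscale image."""
    h, w = img.shape
    d = dct_matrix(n)
    blocks = img[: h - h % n, : w - w % n].reshape(h // n, n, w // n, n)
    blocks = blocks.transpose(0, 2, 1, 3)   # shape: (block_y, block_x, n, n)
    return d @ blocks @ d.T                 # DCT along both block axes

img = np.full((16, 16), 128.0)              # flat gray test image
coeffs = blockwise_dct(img)
# A constant block has all of its energy in the DC coefficient (0, 0).
print(coeffs[0, 0, 0, 0])
```

The detectors below all operate on coefficients of this kind, either on their histograms or on the spatial grid the blocks induce.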

Table 1: Tamper localization approaches which exploit JPEG-related artifacts.

- Lin et al. [6] (ADQ1). Basis: double quantization effect; probabilistic model for detecting tampering of a block. Strengths: works for splicing, inpainting, etc.; resulting map has a direct probabilistic interpretation. Weaknesses: resolution of the DCT block size; probability inaccurately defined. Remarks: widely used, first automatic method; the authors train an SVM for classification; can be adapted to the JPEG2000 format.
- Bianchi et al. [7] (ADQ2). Basis: double quantization effect; probabilistic model for detecting tampering of a block. Strengths: works for splicing, inpainting, etc.; separates two conditional probabilities. Weaknesses: resolution of the DCT block size; the probabilistic map looks noisy. Remarks: automatic; the authors claim improvement on [6] when the second quantization factor is small.
- Amerini et al. [8] (ADQ3). Basis: double quantization effect; DCT coefficient first-digit features (Benford's law). Strengths: uses simple but effective features. Weaknesses: doesn't work for heavily compressed images; the first-digit statistics are suitable mainly for natural images. Remarks: automatic; the authors train an SVM for classification.
- Bianchi and Piva [9] (NA). Basis: non-aligned double JPEG compression (grid-shifted tampering). Strengths: works for cropping and shifting; threshold-based detector. Weaknesses: not useful for splicing, copy-move, etc. Remarks: semi-automatic; detailed mathematical model.
- Bianchi and Piva [10] (NADQ). Basis: double JPEG compression (both aligned and non-aligned). Strengths: likelihood map for tampering; can be extended to use other evidence. Weaknesses: doesn't work if the image is resized. Remarks: automatic; detailed mathematical model.
- Krawetz [11] (ELA). Basis: error level analysis (resaving the JPEG to measure compression noise). Strengths: very simple. Weaknesses: not very discriminating. Remarks: semi-automatic.
- Farid [12] (GHO). Basis: error level analysis, then finding ghosts (local minima). Strengths: works for splicing, inpainting, etc.; fast and simple. Weaknesses: only lower-quality regions are detected. Remarks: semi-automatic.
- Wang et al. [13] (PCA). Basis: error level analysis, with PCA to discriminate high-frequency noise. Strengths: works for splicing, inpainting, etc.; fast and simple. Weaknesses: tampered regions should have a high-frequency component. Remarks: automatic.
- Li et al. [14] (BLK). Basis: blocking artifact grid detection, marking abnormal grids. Strengths: works for splicing, inpainting, cropping and copy-move. Weaknesses: the block artifact grid maps need to be made clearer. Remarks: semi-automatic.
- Ye et al. [15] (BLK2). Basis: blocking artifact DCT measure / quantization table estimation. Strengths: very fast (uses MLE for estimation). Weaknesses: JPEG recompression reduces efficacy. Remarks: semi-automatic; one of the earliest approaches.

3.1.1 Double Quantization

Tampered JPEG images suffer from a phenomenon known as double compression, with inconsistencies between the DCT histograms of singly and doubly compressed regions. DCT coefficients of unmodified areas undergo a double JPEG compression, thus exhibiting double quantization (DQ) artifacts, while DCT coefficients of tampered areas result from a single compression and very likely present no such artifacts.
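The DQ effect is easy to reproduce numerically. The sketch below (with arbitrarily chosen quantization steps, not the paper's setup) quantizes simulated DCT coefficients once and twice and compares their histograms; the doubly quantized histogram is periodic, with bins that no doubly compressed value can reach:

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated DCT coefficients of one frequency (Laplacian-like distribution).
coeffs = rng.laplace(0, 20, 100_000)

q1, q2 = 7, 3   # first and second quantization steps (illustrative values)

# Single compression: quantize once with step q2.
single = np.round(coeffs / q2).astype(int)

# Double compression: quantize with q1, dequantize, then quantize with q2.
double = np.round(np.round(coeffs / q1) * q1 / q2).astype(int)

def histogram(values, lo=-30, hi=30):
    return np.array([(values == b).sum() for b in range(lo, hi + 1)])

h_single = histogram(single)
h_double = histogram(double)

# Doubly quantized values can only be round(k * q1 / q2) for integer k,
# so many histogram bins stay empty -- the periodic DQ artifact.
empty_single = int((h_single == 0).sum())
empty_double = int((h_double == 0).sum())
print(empty_single, empty_double)
```

The empty (or depleted) bins in the double-compression histogram are exactly the periodic artifacts that ADQ1-ADQ3 model probabilistically.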

Table 2: Tamper localization approaches which exploit local noise related features.

- Ferrara et al. [16] (CFA1). Basis: tampering alters artifacts due to the demosaicking algorithm. Strengths: gives a tampering probability for each 2x2 block; very fine-grained likelihood map. Weaknesses: strongly affected by JPEG compression and resizing; high false positive rate for very sharp regions. Remarks: same basis as Dirik and Memon [17].
- Dirik and Memon [17] (CFA2). Basis: tampering alters artifacts due to the demosaicking algorithm. Strengths: one feature and a simple thresholding classifier. Weaknesses: strongly affected by JPEG compression and resizing. Remarks: automatic; some sensors don't exhibit CFA artifacts.
- Yerushalmy and Hel-Or [18] (PFA). Basis: inconsistencies in chromatic aberration artifacts (purple fringing). Strengths: works for cropped images. Weaknesses: contrast dependent (sensitive to very low contrast). Remarks: automatic; PFA events may not be found in every image.
- Mahdian and Saic [19] (NOI1). Basis: segmentation based on noise levels obtained from a wavelet transform. Strengths: simple median-based estimator (MAD) and merging algorithm. Weaknesses: the assumption might not hold, as the authentic part may have varying noise variance. Remarks: automatic.
- Lyu et al. [20] (NOI2). Basis: estimating noise characteristics (kurtosis in the band-pass domain). Strengths: efficient; complementary to JPEG artifacts at high quality levels. Weaknesses: the assumption might not hold, as the authentic part may have varying noise variance; heavy compression affects efficacy. Remarks: automatic.

Lin et al. [6] and Bianchi et al. [7] both try to identify tampered blocks by considering DCT coefficients and computing a likelihood map indicating the probability, for each 8x8 DCT block, of being double compressed. In practice, the low-frequency coefficients within each block are used to determine the probability that a block is tampered, assuming that the DCT coefficients within a block are mutually independent. The likelihoods are computed with a Bayesian approach according to the evaluated periodicity of the DCT coefficient histograms. The difference between the two methods lies in the choice of assumed distribution.
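A crude stand-in for this block-level scoring (not the authors' probabilistic model; the Laplacian coefficients and quantization steps are invented for illustration) scores each block by how many of its coefficients land in bins that double quantization cannot produce:

```python
import numpy as np

rng = np.random.default_rng(1)
q1, q2 = 7, 3   # illustrative first and second quantization steps

def quantize(c, q):
    """JPEG-style quantization of coefficients c to integer bin indices."""
    return np.round(c / q).astype(int)

# Background blocks: doubly compressed (q1 then q2).
background = quantize(np.round(rng.laplace(0, 20, (50, 64)) / q1) * q1, q2)
# Spliced blocks: singly compressed with q2 only.
spliced = quantize(rng.laplace(0, 20, (10, 64)), q2)

# Bins reachable after double quantization: round(k * q1 / q2), k integer.
k = np.arange(-200, 201)
reachable = set(np.round(k * q1 / q2).astype(int))

def tamper_score(block):
    """Fraction of coefficients in bins double compression cannot reach.

    High scores suggest single compression, i.e. a tampered block."""
    return float(np.mean([v not in reachable for v in block]))

bg_scores = [tamper_score(b) for b in background]
sp_scores = [tamper_score(b) for b in spliced]
print(max(bg_scores), min(sp_scores))
```

In this toy setting the two populations separate perfectly; the published methods instead assign each block a calibrated posterior probability, which degrades gracefully under noise and recompression.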
Further, Bianchi and Piva [9, 10] improve and extend the aforementioned methods by identifying two kinds of double-JPEG artifacts: A-DJPG and NA-DJPG artifacts, which depend on whether the second JPEG compression adopts a discrete cosine transform (DCT) grid aligned with the one used by the first compression or not. The authors describe two accurate models (and their simplified versions) to characterize A-DJPG and NA-DJPG artifacts, based on the previous work by Lin et al. [6]. Amerini et al. [8] describe a method which uses a specialized feature set to classify blocks, via an SVM, as double compressed or not. The authors observe that DCT coefficients of singly quantized images follow Benford's distribution, while this is not the case for doubly quantized images.

3.1.2 Error Level Analysis

Error level analysis is a method proposed by Krawetz [11]. It works by intentionally resaving the JPEG image at a known error rate and then computing the difference between the images. Any modification to the picture alters the image such that stable areas become unstable. Farid [12] compares differently compressed versions of an image with the possibly tampered one. When the same quality factor as that of the tampered area is adopted, spatial local minima, christened JPEG ghosts by the author, appear and can be used to discern tampered regions. Wang et al. [13] extend the analysis by extracting the high-frequency noise from this noise map using Principal Component Analysis and then characterizing the tampered region based on the high-frequency noise.

3.1.3 Block Artifact Grids

In manipulated images, when the tampered part is pasted into the background image, the DCT blocks do not match and some block artifacts are left behind. Li et al. [14] describe a method that uses the second-order difference of pixel values

to extract the Block Artifact Grids and then automatically identify the regions which are likely to be tampered. Ye et al. [15] propose two methods. The first uses DCT coefficients to estimate the block artifacts. The second first estimates the DCT quantization table and then checks the uniformity of the quantization remainders.

3.2. Camera and Local Noise Residuals

3.2.1 Color Filter Array

Image features like local noise or camera noise, arising from the image acquisition process or from the manufacturing or hardware characteristics of a digital camera, provide sufficient information to determine an image's authenticity, since they are sensitive to image manipulation and difficult to forge synthetically. The methods described in this section are based on the intuition that image regions of different origins may have different noise characteristics, introduced by the sensors or the post-processing steps of their original source. During acquisition, every pixel receives only a single color-channel value (red, green or blue). To produce the final image, the raw data undergoes an interpolation process using the Color Filter Array (CFA) to obtain a color image, with different cameras using slightly different parameters to perform the interpolation. Dirik and Memon [17] and Ferrara et al. [16] exploit the artifacts created by Color Filter Array processing in most digital cameras. Both techniques involve estimating the CFA interpolation pattern, using CFA-based noise analysis as features, and training a classifier on these features. This line of attack is more robust than the algorithms mentioned above, as it can be applied to images other than those saved in JPEG format. A limitation is that the CFA estimation is sensitive to strong JPEG recompression and resizing.
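The CFA trace itself is easy to visualize. In the sketch below (a simplified checkerboard green layout and bilinear demosaicing, both assumptions made for illustration), interpolated pixels are predicted exactly by their neighbors, producing the 2x2-periodic residual pattern these detectors look for:

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulate a sensor capture: green sampled on a checkerboard (simplified
# stand-in for a real Bayer layout).
scene = rng.uniform(0, 255, (64, 64))
sensed = np.zeros(scene.shape, dtype=bool)
sensed[0::2, 0::2] = True
sensed[1::2, 1::2] = True
green = np.where(sensed, scene, 0.0)

# Bilinear demosaicing: missing greens = mean of the 4 sensed neighbors.
g = np.pad(green, 1)
interp = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]) / 4
demosaiced = np.where(sensed, scene, interp)

def cfa_residual(img):
    """Error of predicting each pixel as the mean of its 4 neighbors.

    Bilinearly interpolated sites are predicted (nearly) exactly, so the
    residual is 2x2-periodic in a demosaiced image -- the trace exploited
    by the CFA-based detectors above."""
    p = np.pad(img, 1)
    pred = (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4
    return np.abs(img - pred)

res = cfa_residual(demosaiced)[8:-8, 8:-8]   # ignore image borders
interp_sites = ~sensed[8:-8, 8:-8]
print(res[interp_sites].mean(), res[~interp_sites].mean())
```

Tampering that overwrites a region (splicing, heavy recompression, resizing) destroys this periodic residual locally, which is what CFA1 and CFA2 measure.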
3.2.2 Purple Fringing Aberration

Yerushalmy and Hel-Or's [18] tamper identification method is based on the effects introduced in the acquired image by the optical and sensing systems of the camera; it tries to identify local artifacts arising from chromatic aberration (Purple Fringing Aberration, or PFA) due to the image acquisition procedure of a camera lens. The geometric center of the image can be deduced from the PFA events. For localization, the PFA normal flows are used to detect tampered areas.

3.2.3 Local Noise Analysis

Mahdian and Saic [19] propose a method in which they estimate the noise levels of blocks from a sub-band generated by a one-level wavelet transform and then label and merge blocks by thresholding the difference of the estimated noise levels. The noise levels are estimated by a median-based estimator. The main drawback of this method is that authentic images can also contain various isolated regions with totally different variances, which will be flagged as inconsistent with the rest of the image. Lyu et al. [20] exploit a regular property of the kurtosis of images in band-pass space to estimate noise characteristics. The noise statistics estimation is formulated as an optimization problem with a closed-form solution, which is used to estimate local noise statistics and discern the tampered regions of an image.

4. Related Datasets

In this section, we present related datasets which are used for image forgery classification and detection. The CASIA Image Tampering Detection Evaluation Database [22] is a widely used standard dataset for evaluating forgery detection. It consists of uncompressed images with various resolutions as well as JPEG images with different compression quality factors. The images involve splicing (with arbitrary contours) and also post-processing (blurring and filtering). However, this dataset does not provide ground truth masks for localization of the tampering operations.
Furthermore, in [23], the authors argue that there exist statistical artifacts in the way the dataset was built, which might produce unfair results for many forgery detection algorithms. The MICC Image Databases [24] are aimed at copy-move forgery detection and localization. They can be divided into three datasets, F2000, F600 and F220, which all contain high-resolution images. In each of these datasets, around half of the images are tampered. Only F600 provides ground truth masks for the tampered images. The type of processing on the copy-move forgeries is limited to rotation and scaling. The Dresden Image Database [25] was constructed with the aim of evaluating and developing methods for detecting image tampering as well as identifying the acquisition device of an image. It contains images taken using 73 digital cameras of 25 different models, under various camera settings. The Columbia Uncompressed Image Splicing Detection Evaluation Dataset [26] provides tampered and original images with image splicing but without various post-processing techniques applied. It also provides edge ground truth masks for evaluating the localization of the tampered images. However, the resolution is low and the size of the set is small (363 images, with 180 tampered and 183 authentic). The RAISE Raw Image Dataset [27] consists of 8156 high-resolution uncompressed images. The images cover various categories, including outdoor images, indoor images,

landscape and nature scenes, along with people, objects and buildings. The authors have also provided smaller subsets: RAISE-1k, RAISE-2k, RAISE-4k and RAISE-6k. The Uncompressed Colour Image Database (UCID) [28] was originally a benchmark dataset for image retrieval, with the goal of understanding the effects of compression on content-based image retrieval (CBIR). Table 3 gives a summary of the datasets.

Table 3: An overview of the datasets commonly used in image forgery detection.

- CASIA Tampered Image Detection Evaluation Database (CASIA TIDE). Type: spliced and post-processed. Size: 7491 tampered and 5123 authentic. Remarks: no ground truth masks provided for localization.
- MICC F2000 image database. Type: copy-move forgery. Size: 700 tampered and 1300 authentic. Remarks: no ground truth masks provided for localization.
- MICC F600 image database. Type: copy-move forgery. Size: 160 tampered and 440 authentic. Remarks: ground truth masks provided.
- Columbia Uncompressed Image Splicing Detection Evaluation Dataset. Type: spliced, uncompressed. Size: 180 tampered and 183 authentic. Remarks: edge masks provided, which can be converted to ground truth.
- RAISE raw image dataset. Type: uncompressed. Size: 8156 raw images. Remarks: can be used to create artificial sets with various JPEG qualities and tampering classes.
- Uncompressed Colour Image Database (UCID). Type: uncompressed. Size: 1300 raw images. Remarks: low resolution; otherwise same use as RAISE.
- Dresden Natural Images Database. Type: uncompressed. Size: 1492 raw images. Remarks: aimed at camera identification; otherwise same use as RAISE.
- Christlein et al. [4]'s dataset. Type: uncompressed, copy-move. Size: 48 authentic images. Remarks: constructs a large number of manipulations along with other artifacts.

5. Results

5.1. Our Dataset

From the existing datasets we build our own dataset for benchmarking and learning. The Dresden uncompressed image dataset and the RAISE image dataset (see details in Section 4) have a relatively large number of high-resolution raw images and are thus a good source for carrying out operations such as splicing and copy-moving. We cut a rectangular part of size 720-by-600 as the authentic image.
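Splicing operations of this kind can be simulated directly on pixel arrays. The sketch below is a simplified stand-in (a rectangular region and a Gaussian filter replace the polygonal regions and JPEG re-saving used in practice; all function names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)

def gaussian_blur(img, sigma=1.0):
    """Separable Gaussian filter via two 1-D 'same' convolutions."""
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    out = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 0, img)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 1, out)

def make_spliced_pair(h=64, w=64):
    """Splice a region of a donor image into a host and return the mask."""
    host = rng.uniform(0, 255, (h, w))
    donor = rng.uniform(0, 255, (h, w))
    mask = np.zeros((h, w), dtype=bool)
    mask[20:44, 12:52] = True                      # ground-truth spliced region
    tampered = np.where(mask, donor, host)
    tampered = gaussian_blur(tampered, sigma=0.8)  # post-processing touch-up
    return host, tampered, mask

host, tampered, mask = make_spliced_pair()
print(mask.mean())
```

Generating the mask alongside the tampered image is what makes pixel-level localization benchmarks possible.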
We build an image forgery simulation system which takes raw images as input and automatically generates tampered images, producing ground truth masks as well. Note that in this way it might be easy for a human to distinguish a tampered image from an authentic one, but the difficulty is not reduced for the machine, and the dataset is appropriate for benchmarking. Specifically, the tampered images used in our experiments are spliced ones. For splicing, the system randomly picks one JPEG image and one uncompressed (TIFF) image from the authentic images. A polygonal part is taken from the TIFF image and used to replace a part of the JPEG image. We then filter the image with a Gaussian filter and save it as a compressed JPEG file. The simulation system is capable of handling other types of tampering operations as well. For copy-move, it takes an authentic image from the dataset, copies a polygonal part, pastes it to another region of the same image, and carries out the same post-processing as for splicing. The benchmarking tasks in Section 5.2 are done with 20000 authentic images and 20000 tampered images (thus 40000 in total). The algorithms to be benchmarked all work for splicing, and thus we take the spliced images. The same set of images is used for fusion in Section 5.3: there, the 40000 images are further split into 29980 training images and 9990 testing images (we had to discard 30 maps because the algorithm outputs weren't uniform). The details for training are given in that section.

5.2. Benchmarks

We take thirteen of the algorithms described in Section 3 and benchmark them using authentic and spliced images as described in the last section. We collect results both for classification (i.e., whether an image has been tampered with) and localization / detection (i.e., which part of an image has been tampered with). In [21], the authors perform a Kolmogorov-Smirnov test and compare the
statistics of the tampered region and the untampered region. They then put a threshold on that difference and draw an ROC curve. However, in this way it is not clear how accurately the detected region is localized. Moreover, it does not work if the algorithm only outputs a classification result without a likelihood map of tampering. We propose two benchmark processes, for classification and localization respectively. The expected output of a classification algorithm is a single value. The expected output of a localization algorithm is normally a likelihood map over pixels or blocks. Our performance characterization system handles both cases. In case the result does not contain a classification result, we use a uniform decision rule and compare the performance. From a classification algorithm we can easily compute the true positive (TP) and false positive (FP) rates since the output is binary. In case we have a localization algorithm, we first resize the output likelihood map and the ground truth to the same dimension, say m-by-n. Let p(x, y) denote the (x, y)-pixel value of an image, 1 ≤ x ≤ m, 1 ≤ y ≤ n. The pixel values p(x, y) are normalized to the range [0, 1]. A threshold h is introduced to distinguish tampered from untampered pixels. In particular, we generate a binary decision map P with P(x, y) = 1 if p(x, y) > h and P(x, y) = 0 otherwise, with the understanding that 1 means tampered and 0 means untampered. Also, denote the ground-truth mask by N(x, y). We count the numbers of pixels N1 = |{(x, y) : N(x, y) = P(x, y) = 1}|, N2 = |{(x, y) : N(x, y) = 1, P(x, y) = 0}|, N3 = |{(x, y) : N(x, y) = 0, P(x, y) = 1}|, N4 = |{(x, y) : N(x, y) = 0, P(x, y) = 0}|. Consequently we compute the intersection-over-union (IOU) metric, given as IOU = N1/(N1 + N2 + N3). We use another threshold h′ for the IOU: for an image, the output is 1 if IOU > h′ and 0 otherwise, with the understanding that 1 means detected and 0 means not detected.
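As a minimal sketch (assuming the likelihood map and the ground-truth mask have already been resized to the same dimensions), the per-image IOU decision can be written as:

```python
import numpy as np

def iou_decision(likelihood, truth, h=0.5, h_prime=0.5):
    """Threshold a likelihood map at h, compare with the ground-truth
    mask, and declare a detection when the IOU exceeds h_prime."""
    p = likelihood.astype(float)
    p = (p - p.min()) / (p.max() - p.min() + 1e-12)  # normalize to [0, 1]
    P = (p > h).astype(int)           # binary decision map
    N = truth.astype(int)             # ground-truth mask
    n1 = np.sum((N == 1) & (P == 1))  # tampered pixels, detected
    n2 = np.sum((N == 1) & (P == 0))  # tampered pixels, missed
    n3 = np.sum((N == 0) & (P == 1))  # untampered pixels, flagged
    iou = n1 / (n1 + n2 + n3 + 1e-12)
    return iou, int(iou > h_prime)
```

The small epsilon terms simply guard against division by zero; they are our own defensive choice, not part of the paper's definition.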
Then we can compute the TP and FP rates according to the respective numbers of images. Note that for a fixed h, if we adjust h′, we get a series of TP-FP rate pairs. We plot and connect them to draw an ROC curve. If we take a number of values for h, we thus get a family of ROC curves. Note that each TP-FP point corresponds to a threshold pair, and thus to a decision rule. Therefore, each point on the line segment connecting any two points also corresponds to an available decision rule (obtained by randomizing between the two rules). Hence, from a reasonable user's point of view, it is sufficient to consider the convex hull of an ROC curve. In our case we have both h and h′, so we take the convex hull of all the curves. Although it is not always the case for a particular user, in general a larger area under the curve (AUC) indicates a preferable algorithm. If we are only provided with the likelihood maps, it is important to have an automatic decision rule. An easy approach is to count the number of tampered pixels. In this case, we again use h to distinguish tampered from untampered pixels, as discussed above. We then count the number n of pixels with P(x, y) = 1. The output is tampered if n > h′, and untampered otherwise. Again, fixing h, we draw an ROC curve by varying h′, and we get a number of curves by adjusting h. The ROC curves for the classification and detection of the 13 algorithms are given in the supplementary file. The AUCs are given in Table 4.

Algorithm | Classification | Detection | Detection (Area)
ADQ1 | 0.8877 | 0.8717 | 0.6234
ADQ2 | 0.8129 | 0.9488 | 0.5965
ADQ3 | 0.6667 | 0.7009 | 0.5404
BLK | 0.8420 | 0.9515 | 0.5303
BLK2 | 0.6452 | 0.8521 | 0.6610
CFA1 | 0.6718 | 0.7592 | 0.5000
CFA2 | 0.5882 | 0.7649 | 0.5570
DCT | 0.6882 | 0.8463 | 0.5319
ELA | 0.8362 | 0.9497 | 0.6019
GHO | 0.6841 | 0.9067 | 0.6117
NADQ | 0.6262 | 0.5485 | 0.5111
NOI1 | 0.5499 | 0.7476 | 0.5451
NOI2 | 0.5179 | 0.8645 | 0.5251

Table 4: Area under the curve (AUC) for classification and detection. The algorithm names correspond to the abbreviations in Tables 1 and 2.
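The two-threshold sweep and the convex-hull reduction described above can be sketched as follows. This is a simplified illustration: `decide` stands in for any per-image rule built from the thresholds h and h′ (for instance the IOU rule, or the pixel-count rule), and the helper names are our own:

```python
import numpy as np

def roc_points(decide, labels, h_grid, hp_grid):
    """Sweep both thresholds: decide(i, h, hp) -> 0/1 for image i.
    Returns one (FPR, TPR) pair per (h, hp) combination."""
    labels = np.asarray(labels)
    pts = []
    for h in h_grid:
        for hp in hp_grid:
            pred = np.array([decide(i, h, hp) for i in range(len(labels))])
            tp = np.sum((pred == 1) & (labels == 1))
            fp = np.sum((pred == 1) & (labels == 0))
            tpr = tp / max(labels.sum(), 1)
            fpr = fp / max((labels == 0).sum(), 1)
            pts.append((float(fpr), float(tpr)))
    return pts

def upper_hull(points):
    """Upper convex hull of the ROC points (anchored at (0,0) and (1,1)),
    i.e. the set of operating points a reasonable user would keep."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # Drop hull[-1] if it lies on or below the chord hull[-2] -> p.
            if (x2 - x1) * (p[1] - y1) >= (p[0] - x1) * (y2 - y1):
                hull.pop()
            else:
                break
        hull.append(p)
    return hull
```

The AUC reported in Table 4 would then be the trapezoidal area under the hull.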
While the above benchmarks are standard in the computer vision community, people may use different methods. In some papers (e.g. Lyu et al. [20]), ROC curves are drawn based on the area of the correctly detected region. This idea can be implemented by counting pixels. As before, we use the threshold h to distinguish tampered from untampered pixels, then count pixels over all the images and calculate N1, ..., N4 over the whole test set. Then the average TP rate is N1/(N1 + N2) and the average FP rate is N3/(N3 + N4).

5.3. Fusion

When making a decision on image forgery, it is sometimes useful to combine the algorithms and consider their fused result. According to Fontani et al. [29], there are three basic approaches to fusion: at the feature level, the output level and the abstract level. In our case, the easiest route to fusion is through the decision rules of all the algorithms. Since we have the 13 algorithms and their ROC curves, we can draw the convex hull of all the ROC curves together, which represents a decision rule based on the thresholds of all the algorithms. The resulting curve is drawn in Figure 2. This is a fusion at the abstract level, that is, a fusion
based on the thresholded results. It is the simplest way, but may not give the best classification accuracy.

5.4. Experiments using Fully Connected Networks

To classify images by fusing the outputs of the benchmarked algorithms, we build a simple fusion architecture, and we also benchmark decision making with a neural network for each individual algorithm. We divided our dataset into a training set of 29980 images and a test set of 9990 images, over which we benchmarked our networks (we had to discard 30 maps as the algorithm outputs were not uniform). First, we train a separate 4-layer fully connected network for each algorithm. The input to an individual network is the flattened output of its algorithm (scaled to dimension 75×90 for ELA, GHO and NOI2; the other algorithms' outputs were already of size 75×90 or 38×45). The intermediate layers are activated using the softmax activation function, while the last layer is activated using a sigmoid function. Mean-squared error and the Adam optimizer (Kingma and Ba [30]) are taken as the loss function and optimizer, respectively. Secondly, to build an end-to-end fusion architecture, we extend the networks described above. We connect the penultimate layers of the individual networks to a merge layer which simply concatenates their outputs. We then add a fully connected layer activated by the softmax function, and the last layer is activated using the sigmoid function. Our initial results are very encouraging, as can be seen from Table 5 (we compare with the best accuracy derived from the ROC curves). It should be noted that the training and testing setups differ between the two columns, but we do not anticipate the performance to change significantly.
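A forward-pass sketch of the architecture just described, in plain NumPy. The layer widths and weight initialization are illustrative (the paper does not list them), and training with MSE loss and Adam is omitted; only the softmax hidden layers, the sigmoid output, and the penultimate-layer concatenation for fusion are shown:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class BranchNet:
    """4-layer fully connected branch for one algorithm's flattened map:
    softmax-activated hidden layers, a single sigmoid output unit."""
    def __init__(self, in_dim, hidden=(64, 32, 16), rng=None):
        rng = np.random.default_rng(0) if rng is None else rng
        dims = (in_dim,) + hidden + (1,)
        self.W = [rng.normal(0, 0.1, (a, b)) for a, b in zip(dims, dims[1:])]
        self.b = [np.zeros(b) for b in dims[1:]]

    def penultimate(self, x):
        # Features just before the output unit; these feed the merge layer.
        for W, b in zip(self.W[:-1], self.b[:-1]):
            x = softmax(x @ W + b)
        return x

    def forward(self, x):
        return sigmoid(self.penultimate(x) @ self.W[-1] + self.b[-1])

def fused_forward(branches, inputs, W_merge, b_merge, w_out, b_out):
    """Concatenate the branches' penultimate features, apply a softmax
    FC layer, then a sigmoid output unit (weights are placeholders)."""
    feats = np.concatenate([net.penultimate(x)
                            for net, x in zip(branches, inputs)], axis=-1)
    merged = softmax(feats @ W_merge + b_merge)
    return sigmoid(merged @ w_out + b_out)
```

In the paper's setup the branch inputs would be the flattened 75×90 or 38×45 likelihood maps, one branch per algorithm.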
Algorithm | Accuracy (NN) | Accuracy (ROC)
ADQ1 | 0.7853 | 0.8113
ADQ2 | 0.6532 | 0.7317
ADQ3 | 0.7356 | 0.6574
BLK | 0.8390 | 0.5960
BLK2 | 0.5000 | 0.7985
CFA1 | 0.5000 | 0.6251
CFA2 | 0.6313 | 0.5684
DCT | 0.5000 | 0.6322
ELA | 0.5332 | 0.8018
GHO | 0.5000 | 0.6406
NADQ | 0.8070 | 0.6174
NOI1 | 0.5000 | 0.5498
NOI2 | 0.5000 | 0.5172
Fusion | 0.9402 | 0.8113

Table 5: Accuracy for classification: the second column is from the neural network; the third column is the best result according to the ROC curve.

6. Discussion

In this paper, we summarized state-of-the-art algorithms for image forgery detection. We prepared a dataset for benchmarking algorithms whose outputs take the form of a likelihood map. We performed benchmarking with respect to three different measures and obtained the corresponding ROC curves. We also trained a neural network to obtain a classifier for image forgery from the output maps. For benchmarking, it may appear anomalous that the detection results are better than the classification ones. However, the detection curves are computed only over positive images, i.e. those images where manipulations exist. It is worth noting that our two benchmarks for detection have different meanings. The one based on IOU counts correctly and incorrectly detected images, while the one based on area counts correctly and incorrectly detected pixels. The latter gives much worse results since its requirement is stricter. The fusion result for the neural network is promising. Note that the individual results may not exceed the accuracy of the simple decision rule discussed in Section 5.2. Nevertheless, the combined network yields a classifier that outperforms the individual ones. Our future work includes creating a larger dataset that covers a wider range of manipulations and building networks accordingly. It would also be interesting to extend the architecture to detect the manipulated regions more effectively, beyond mere classification.

Figure 2: Fused ROC curves

References

[1] G. K. Birajdar and V. H. Mankar, "Digital image forgery detection using passive techniques: A survey," Digital Investigation, vol. 10, no. 3, pp. 226-245, 2013.
[2] H. Farid, "Image forgery detection - a survey," 2009.
[3] A. Rocha, W. Scheirer, T. Boult, and S. Goldenstein, "Vision of the unseen: Current trends and challenges in digital image and video forensics," ACM Computing Surveys (CSUR), vol. 43, no. 4, p. 26, 2011.
[4] V. Christlein, C. Riess, J. Jordan, C. Riess, and E. Angelopoulou, "An evaluation of popular copy-move forgery detection approaches," IEEE Transactions on Information Forensics and Security, vol. 7, no. 6, pp. 1841-1854, 2012.
[5] A. Sarkar, L. Nataraj, and B. S. Manjunath, "Detection of seam carving and localization of seam insertions in digital images," in Proceedings of the 11th ACM Workshop on Multimedia and Security. ACM, 2009, pp. 107-116.
[6] Z. Lin, J. He, X. Tang, and C.-K. Tang, "Fast, automatic and fine-grained tampered JPEG image detection via DCT coefficient analysis," Pattern Recognition, vol. 42, no. 11, pp. 2492-2501, 2009.
[7] T. Bianchi, A. De Rosa, and A. Piva, "Improved DCT coefficient analysis for forgery localization in JPEG images," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2011, pp. 2444-2447.
[8] I. Amerini, R. Becarelli, R. Caldelli, and A. Del Mastio, "Splicing forgeries localization through the use of first digit features," in IEEE International Workshop on Information Forensics and Security (WIFS). IEEE, 2014, pp. 143-148.
[9] T. Bianchi and A. Piva, "Detection of nonaligned double JPEG compression based on integer periodicity maps," IEEE Transactions on Information Forensics and Security, vol. 7, no. 2, pp. 842-848, 2012.
[10] T. Bianchi and A. Piva, "Image forgery localization via block-grained analysis of JPEG artifacts," IEEE Transactions on Information Forensics and Security, vol. 7, no. 3, pp. 1003-1017, 2012.
[11] N. Krawetz, "A picture's worth: Digital image analysis and forensics," 2007.
[12] H. Farid, "Exposing digital forgeries from JPEG ghosts," IEEE Transactions on Information Forensics and Security, vol. 4, no. 1, pp. 154-160, 2009.
[13] W. Wang, J. Dong, and T. Tan, "Tampered region localization of digital color images based on JPEG compression noise," in Digital Watermarking. Springer, 2010, pp. 120-133.
[14] W. Li, Y. Yuan, and N. Yu, "Passive detection of doctored JPEG image via block artifact grid extraction," Signal Processing, vol. 89, no. 9, pp. 1821-1829, 2009.
[15] S. Ye, Q. Sun, and E.-C. Chang, "Detecting digital image forgeries by measuring inconsistencies of blocking artifact," in IEEE International Conference on Multimedia and Expo. IEEE, 2007, pp. 12-15.
[16] P. Ferrara, T. Bianchi, A. De Rosa, and A. Piva, "Image forgery localization via fine-grained analysis of CFA artifacts," IEEE Transactions on Information Forensics and Security, vol. 7, no. 5, pp. 1566-1577, 2012.
[17] A. E. Dirik and N. Memon, "Image tamper detection based on demosaicing artifacts," in 16th IEEE International Conference on Image Processing (ICIP). IEEE, 2009, pp. 1497-1500.
[18] I. Yerushalmy and H. Hel-Or, "Digital image forgery detection based on lens and sensor aberration," International Journal of Computer Vision, vol. 92, no. 1, pp. 71-91, 2011.
[19] B. Mahdian and S. Saic, "Using noise inconsistencies for blind image forensics," Image and Vision Computing, vol. 27, no. 10, pp. 1497-1503, 2009.
[20] S. Lyu, X. Pan, and X. Zhang, "Exposing region splicing forgeries with blind local noise estimation," International Journal of Computer Vision, vol. 110, no. 2, pp. 202-221, 2014.
[21] M. Zampoglou, S. Papadopoulos, Y. Kompatsiaris, R. Bouwmeester, and J. Spangenberg, "Web and social media image forensics for news professionals," in Social Media in the NewsRoom (SMNews16@CWSM), workshops of the Tenth International AAAI Conference on Web and Social Media, 2016.
[22] National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Science, Corel Image Database, and photographers, "CASIA tampered image detection evaluation database."
[23] G. Cattaneo and G. Roscigno, "A possible pitfall in the experimental analysis of tampering detection algorithms," in 17th International Conference on Network-Based Information Systems (NBiS). IEEE, 2014, pp. 279-286.
[24] I. Amerini, L. Ballan, R. Caldelli, A. Del Bimbo, and G. Serra, "A SIFT-based forensic method for copy-move attack detection and transformation recovery," IEEE Transactions on Information Forensics and Security, vol. 6, no. 3, pp. 1099-1110, 2011.
[25] T. Gloe and R. Böhme, "The Dresden image database for benchmarking digital image forensics," Journal of Digital Forensic Practice, vol. 3, no. 2-4, pp. 150-159, 2010.
[26] Y.-F. Hsu and S.-F. Chang, "Detecting image splicing using geometry invariants and camera characteristics consistency," in International Conference on Multimedia and Expo, 2006.
[27] D.-T. Dang-Nguyen, C. Pasquini, V. Conotter, and G. Boato, "RAISE: a raw images dataset for digital image forensics," in Proceedings of the 6th ACM Multimedia Systems Conference. ACM, 2015, pp. 219-224.

[28] G. Schaefer and M. Stich, "UCID: an uncompressed colour image database."
[29] M. Fontani, T. Bianchi, A. De Rosa, A. Piva, and M. Barni, "A framework for decision fusion in image forensics based on Dempster-Shafer theory of evidence," IEEE Transactions on Information Forensics and Security, vol. 8, no. 4, pp. 593-607, April 2013.
[30] D. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.