Image Forgery Detection Using Wavelets
Introduction Let's start with a little quiz...
Let's start with a little quiz... Can you spot the forgery in the image below?
Let's start with a little quiz... Can you recognize these identical patches? (Labels: "Existed in the original image", "Pasted here")
Let's start with a little quiz... This is the original image. The following detail was hidden:
Let's start with a little quiz... Can you spot the forgery in the image below?
Let's start with a little quiz... Can you recognize these identical patches? (Labels: "Existed in the original image", "Pasted here", "Pasted here")
Let's start with a little quiz... This is the original image. The following detail was hidden:
Let's start with a little quiz... Can you spot the forgery in the image below?
Let's start with a little quiz... Can you recognize these identical patches? (Labels: "Pasted here", "Existed in the original image")
Let's start with a little quiz... This is the original image. The following detail was hidden:
Introduction The Problem As you can probably guess from the quiz we started with, we want to deal with copy-paste image forgery: copying patches from the original image and pasting them into the same image, which is commonly used for hiding image details... Forge...
Let's go on...
Introduction Our Goal Our goal is to detect the image forgery, i.e. to find and mark 2 or more image patches that are very similar. We rely on the fact that in a natural image, the probability of two patches being that similar tends to 0
Introduction Our Goal What is the probability of having these 2 identical leaves, with their water drops in the same positions, in one natural image? You're right, the probability tends to 0. Some farmer might have wanted to clear an insect from his logo?
Introduction Challenges We also want to support detection when the pasted patch has been rotated (figure: copy-paste with rotation)
Introduction Difficulties 1) We want to compare each patch to every other patch in the image, in various sizes, which means many comparisons 2) As we also want to support comparison under geometrical transformations, the number of comparisons becomes enormous and impractical
Introduction Former Works Earlier works tried to solve the same problem (for example, using PCA). The advantages of the solutions we'll see today: 1) They run faster and are computationally cheaper 2) They detect duplicates after JPEG compression or added Gaussian noise 3) They detect rotated duplications
Let's go on...
Introduction Wavelets Let's start with an example...
Introduction Wavelets Let's have a look at the following frequency-domain filters: a horizontal high-pass filter, a horizontal low-pass filter, a vertical high-pass filter and a vertical low-pass filter. And of course some classic image...
Introduction Wavelets Now, let's take the image and compute the following components: Apply horizontal high-pass filter and vertical high-pass filter
Introduction Wavelets Now, let's take the image and compute the following components: Let's call this component HH
Introduction Wavelets Now, let's take the image and compute the following components: Apply horizontal low-pass filter and vertical high-pass filter
Introduction Wavelets Now, let's take the image and compute the following components: Let's call this component LH
Introduction Wavelets Now, let's take the image and compute the following components: Apply horizontal high-pass filter and vertical low-pass filter
Introduction Wavelets Now, let's take the image and compute the following components: Let's call this component HL
Introduction Wavelets Now, let's apply a low-pass filter in both directions and down-scale the image (like in a Gaussian pyramid): Let's call this component LL
Introduction Wavelets Now, let's continue computing the components recursively, using the down-scaled image as the source image: Continuing recursively...
Introduction Wavelets We decided to stop the pyramid computations after two iterations, but we could continue to compute more levels... The image, again low-pass-filtered and down-scaled
Introduction Wavelets Wavelets features: We name the wavelet components according to their type and hierarchy level. (Figure: pyramid layout with LL2, HL2, LH2 and HH2 at level 2, and HL1, LH1 and HH1 at level 1)
Introduction Wavelets Wavelets features: We can compute any LLi-1 component, including the original image, LL0, by up-scaling the LLi component to the size of LLi-1 and combining it with the HLi, LHi and HHi components. We save only the smallest LL component in the pyramid, but we also save all the HL, LH and HH components, so we can recursively compute any LLi
Introduction Wavelets Wavelet reconstruction of LLi-1 from the level-i components: Taken from Matlab documentation: http://www.mathworks.com/help/wavelet/ref/idwt2.html
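To make the decomposition and reconstruction concrete, here is a minimal sketch of a wavelet pyramid in Python with NumPy. It assumes the Haar wavelet, which is the simplest choice and not necessarily the one used in the slides; the function names `haar_dwt2`, `haar_idwt2`, `build_pyramid` and `reconstruct` are hypothetical. The first letter of each component name refers to the horizontal filter, following the naming used above.

```python
import numpy as np

def haar_dwt2(img):
    """One level of a 2-D Haar transform (assumed wavelet); even height and width required."""
    a = img[0::2, 0::2]  # top-left pixel of every 2x2 cell
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-pass in both directions (down-scaled image)
    hl = (a - b + c - d) / 2.0  # horizontal high-pass, vertical low-pass
    lh = (a + b - c - d) / 2.0  # horizontal low-pass, vertical high-pass
    hh = (a - b - c + d) / 2.0  # high-pass in both directions
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Rebuild the next larger LL image from one level of components."""
    h, w = ll.shape
    out = np.empty((2 * h, 2 * w), dtype=float)
    out[0::2, 0::2] = (ll + hl + lh + hh) / 2.0
    out[0::2, 1::2] = (ll - hl + lh - hh) / 2.0
    out[1::2, 0::2] = (ll + hl - lh - hh) / 2.0
    out[1::2, 1::2] = (ll - hl - lh + hh) / 2.0
    return out

def build_pyramid(img, levels):
    """Return the smallest LL image and the (LH, HL, HH) triple of every level."""
    ll = np.asarray(img, dtype=float)
    details = []
    for _ in range(levels):
        ll, lh, hl, hh = haar_dwt2(ll)
        details.append((lh, hl, hh))
    return ll, details

def reconstruct(ll, details):
    """Walk back down the pyramid, from the smallest LL to LL0."""
    for lh, hl, hh in reversed(details):
        ll = haar_idwt2(ll, lh, hl, hh)
    return ll
```

Only the smallest LL image and the detail components are stored, yet `reconstruct` recovers the original image, which is the property described above.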
Introduction Wavelets Wavelets benefits: While a regular image gives us only very detailed spatial information (we have no frequency index):
Introduction Wavelets Wavelets benefits: ...and while FFT gives us only very detailed frequency information (we cannot see the image):
Introduction Wavelets Wavelets benefits: A wavelet pyramid tries to give us both of them:
Introduction Wavelets Wavelets spatial vs frequency trade-off: 1) We can get high-resolution spatial information from the pyramid's lower levels (those that are closer to the original image) 2) We can get information about different frequency bands by going up and down the pyramid levels, where upper levels show lower frequencies
Introduction Wavelets More about wavelets: The Discrete Wavelet Transform (DWT), also commonly called Wavelets, has a huge number of applications in engineering, computer science and other sciences. A remarkable and well-known usage of DWT is JPEG 2000 image compression... We will now see how it helps with detecting copy-paste image forgeries
Last steps before the real thing...
Introduction Phase Correlation Again, let's start with an example... Take a look at these 2 noisy images:
Introduction Phase Correlation If you look closely, you might be able to see that they are not just translated relative to each other: each of them also has its own, different Gaussian noise...
Introduction Phase Correlation Let's apply phase correlation using the following operation: $PC(a, b) = F^{-1}\left(\frac{F(a) \cdot \mathrm{conj}(F(b))}{\lvert F(a) \cdot \mathrm{conj}(F(b)) \rvert}\right)$, where F is the Fourier Transform and conj is the complex conjugate
Introduction Phase Correlation The location of the bright white dot in the output tells us how the images are translated relative to each other: the shift is the dot's offset from the top-left corner, (x1, y1). (Figure: Input, Phase Correlation, IFFT, Output)
Introduction Phase Correlation Usage for image matching: For 2 similar images, we expect to see a significant peak in the phase correlation
Introduction Phase Correlation Usage for image matching: The most significant advantage of Phase Correlation, in contrast to other correlations, is its resilience to Gaussian noise, JPEG compression and other noise and defects. It lets us efficiently find similarity, while other correlations are mostly good for finding identity
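As a rough illustration of the operation described above, the following NumPy sketch computes phase correlation in its standard form (the normalised cross-power spectrum followed by an inverse FFT). The function names `phase_correlation` and `peak`, and the small `eps` guard against division by zero, are assumptions made for this example.

```python
import numpy as np

def phase_correlation(a, b, eps=1e-12):
    """Phase-correlation surface of two equally sized grayscale images."""
    fa = np.fft.fft2(a)
    fb = np.fft.fft2(b)
    cross = fa * np.conj(fb)        # cross-power spectrum
    cross /= np.abs(cross) + eps    # keep only the phase (eps avoids 0/0)
    return np.real(np.fft.ifft2(cross))

def peak(a, b):
    """Return ((x, y), value) of the strongest phase-correlation peak."""
    pc = phase_correlation(a, b)
    y, x = np.unravel_index(np.argmax(pc), pc.shape)
    return (int(x), int(y)), float(pc[y, x])
```

For a pure (cyclic) translation the surface is close to a single spike whose position, measured from the top-left corner, gives the shift; for unrelated images no clear spike is expected.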
Let's start with the interesting stuff...
Idea We can split the image into overlapping blocks of some size that we choose, and then try to compare them and look for similar ones. The main problem is that on a large image, this process could take a very long time
Idea An optimization: Let's look for similar blocks in the smallest image in the pyramid. Then, at each step, we continue to the next larger level, checking only the candidates found in the previous level. This process can save us a lot of time
Idea An example: Forged Image
Idea An example: Detection in smallest level LLk
Idea An example: Detection in next level LLk-1
Idea An example: Detection in next level LLk-2
Idea An example: The Original Image
Let's go on...
Flow Input Image Convert to Grayscale
Flow Input Image Convert to Grayscale Compute Wavelet Transform
Flow Input Image Convert to Grayscale Compute Wavelet Transform Matrix of overlapping blocks We divide the smallest LL image in the pyramid, into all possible (bxb)-sized overlapping blocks
Flow Input Image Convert to Grayscale Compute Wavelet Transform Matrix of overlapping blocks We turn each block into a b^2-sized vector and store each vector as a row of a matrix with (M-b+1)(N-b+1) rows and b^2 columns (Block 1 through Block (M-b+1)(N-b+1)), where MxN is the size of the smallest LL image. In another matrix, we save each vector's real coordinates in the image
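The block matrix and its companion coordinate matrix could be built along these lines. This is a sketch with NumPy, assuming the smallest LL image is a 2-D array; `block_matrix` is a hypothetical name, and `sliding_window_view` needs NumPy 1.20 or newer.

```python
import numpy as np

def block_matrix(ll, b):
    """All overlapping b x b blocks of `ll`, one flattened block per row.

    Returns (rows, coords): rows has (M-b+1)*(N-b+1) rows and b*b columns,
    coords holds the (row, column) of each block's top-left pixel.
    """
    M, N = ll.shape
    windows = np.lib.stride_tricks.sliding_window_view(ll, (b, b))
    rows = windows.reshape(-1, b * b)             # shape: ((M-b+1)*(N-b+1), b*b)
    ys, xs = np.mgrid[0:M - b + 1, 0:N - b + 1]   # top-left corner of every block
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1)
    return rows, coords
```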
Flow Input Image Convert to Grayscale Compute Wavelet Transform Matrix of overlapping blocks We remove all blocks with low contrast. We say that a block has low contrast if the difference between its maximum-intensity pixel and its minimum-intensity pixel is lower than some predefined threshold T. This helps to prevent noisy results, such as 2 identical blue patches of the sky
Flow Input Image Convert to Grayscale Compute Wavelet Transform Matrix of overlapping blocks Then, we sort the remaining matrix rows lexicographically
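The contrast filter and the lexicographic sort can be sketched together as follows. The function name `filter_and_sort` and its parameter names are assumptions; `rows` and `coords` are the block matrix and the coordinate matrix described above.

```python
import numpy as np

def filter_and_sort(rows, coords, contrast_threshold):
    """Drop low-contrast blocks, then sort the remaining rows lexicographically."""
    contrast = rows.max(axis=1) - rows.min(axis=1)
    keep = contrast >= contrast_threshold   # blocks below the threshold are removed
    rows, coords = rows[keep], coords[keep]
    # np.lexsort treats the LAST key as the primary one, so the columns are
    # passed in reverse order to make the first column the primary key.
    order = np.lexsort(rows.T[::-1])
    return rows[order], coords[order]       # coordinates follow their rows
```

After sorting, similar blocks tend to end up in nearby rows, which is why only a few neighboring rows need to be compared afterwards.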
Flow Input Image Convert to Grayscale Compute Wavelet Transform Matrix of overlapping blocks Compute blocks phase-correlation We compute the phase correlation between each block and its p neighbors above and below the current row of the sorted matrix: $PC(b_1, b_2) = F^{-1}\left(\frac{F(b_1) \cdot \mathrm{conj}(F(b_2))}{\lvert F(b_1) \cdot \mathrm{conj}(F(b_2)) \rvert}\right)$, where F is the Fourier Transform and conj is the complex conjugate
Flow Input Image Convert to Grayscale Compute Wavelet Transform Matrix of overlapping blocks Compute blocks phase-correlation If the maximum value (Max) of the phase correlation of 2 blocks exceeds our predefined phase-correlation threshold T, we mark the coordinates of these blocks as candidates that should be checked more closely in the larger LL images of the wavelet pyramid
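A possible shape for this candidate search is sketched below, again with NumPy. It assumes the rows are already sorted, uses the standard normalised form of phase correlation, and compares each row only with the p rows that follow it, which visits every "above and below" pair exactly once. The names `find_candidates`, `phase_correlation` and `eps` are hypothetical.

```python
import numpy as np

def phase_correlation(a, b, eps=1e-12):
    """Standard phase correlation of two equally sized blocks."""
    cross = np.fft.fft2(a) * np.conj(np.fft.fft2(b))
    cross /= np.abs(cross) + eps
    return np.real(np.fft.ifft2(cross))

def find_candidates(rows, coords, b, p, threshold):
    """Pairs of block coordinates whose phase-correlation peak exceeds `threshold`."""
    pairs = []
    n = len(rows)
    for i in range(n):
        block_i = rows[i].reshape(b, b)
        for j in range(i + 1, min(i + p + 1, n)):   # the p rows below row i
            pc = phase_correlation(block_i, rows[j].reshape(b, b))
            if pc.max() > threshold:
                ci = tuple(int(v) for v in coords[i])
                cj = tuple(int(v) for v in coords[j])
                pairs.append((ci, cj))
    return pairs
```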
Flow Input Image Convert to Grayscale Compute Wavelet Transform Matrix of overlapping blocks Compute blocks phase-correlation Repeat checking candidates down the pyramid (larger images) We will now take the coordinates of the candidate blocks we found similar in the current level, and will check them in the next, more detailed level, LLk-1
Flow Input Image Convert to Grayscale Compute Wavelet Transform Matrix of overlapping blocks Compute blocks phase-correlation Repeat checking candidates down the pyramid (larger images) The process is now much simpler: we already have candidate blocks (the candidates for matching) to compare using phase correlation. We add m pixels on each side of the matching regions, so that our coordinates fit the larger image
Flow Input Image Convert to Grayscale Compute Wavelet Transform Matrix of overlapping blocks Compute blocks phase-correlation Repeat checking candidates down the pyramid (larger images) Again, we compute phase-correlation of the more detailed blocks, and if the largest value exceeds the threshold T, we mark the blocks as candidates for the next level
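The slides do not spell out how candidate coordinates are carried to the next level, so the following is only one plausible sketch. It assumes each pyramid level halves the image in both directions, so a block at (y, x) of size b in LLk covers roughly the region starting at (2y, 2x) of size 2b in LLk-1, widened by the margin m and clipped to the image. The name `project_to_finer_level` is hypothetical.

```python
def project_to_finer_level(y, x, b, m, finer_shape):
    """Map a b x b block at (y, x) to a padded region in the next larger LL image.

    Returns (top, left, bottom, right) with bottom/right exclusive.
    Assumes a down-scaling factor of 2 between consecutive levels.
    """
    height, width = finer_shape
    top = max(2 * y - m, 0)
    left = max(2 * x - m, 0)
    bottom = min(2 * (y + b) + m, height)
    right = min(2 * (x + b) + m, width)
    return top, left, bottom, right
```

The two regions obtained for a candidate pair can then be compared with phase correlation again, exactly as on the coarser level.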
Flow Input Image Convert to Grayscale Compute Wavelet Transform Matrix of overlapping blocks Compute blocks phase-correlation Repeat checking candidates down the pyramid (larger images) We repeat this step down the pyramid (LLk candidates, then LLk-1 candidates, ..., then LL0 candidates), until finally we report the matching blocks in LL0 (the original image) as duplicates
Flow Input Image Convert to Grayscale Compute Wavelet Transform Matrix of overlapping blocks Compute blocks phase-correlation Repeat checking candidates down the pyramid (larger images) Report Duplicates! Finally, the candidates computed in LL0 are reported as the found duplicates
Working Examples Original Image Forged Image Output
Working Examples Original Image Forged Image Output
Working Examples Original Image Forged Image Output
Working Examples Original Image Forged Image Output
Let's see another algorithm...
Idea The following algorithm starts with a flow that is very similar to the algorithm we've just seen. The following steps are repeated in the same way: 1) Image conversion to grayscale 2) Calculation of the wavelet transform 3) Dividing the smallest LLk image into all possible bxb blocks
Idea After we have divided the smallest LLk image into all possible bxb blocks, we continue in a different way: 1) Calculating 9-sized feature vectors for each block 2) Sorting the vectors lexicographically in a matrix 3) Looking for similar adjacent matrix rows 4) Checking similarity of candidates' neighbors. Let's look deeper...
Flow Input Image Convert to Grayscale Compute Wavelet Transform We already have the wavelet pyramid of the grayscale-converted image, like we had in the previous algorithm
Flow Input Image Convert to Grayscale Compute Wavelet Transform Matrix of blocks' features Like we did in the previous algorithm, we divide the smallest LL image in the pyramid into all possible (bxb)-sized overlapping blocks
Flow Input Image Convert to Grayscale Compute Wavelet Transform Matrix of blocks' features For each bxb-sized block B, we calculate the following features. We start by getting 4 sub-blocks of B, which are created by splitting B along both of its diagonals
Flow Input Image Convert to Grayscale Compute Wavelet Transform Matrix of blocks' features The first feature we calculate is the average of the middle of block B. If the size of block B is bxb, then the size of the middle of the block is (b*i)x(b*i), for some constant i that we predefine to be a value between 0.65 and 0.99 (e.g. 2/3)
Flow Input Image Convert to Grayscale Compute Wavelet Transform Matrix of blocks' features The benefit of taking the middle of the block is that when image patches are copied from one area to another, the border pixels are usually smoothed to fit the new position. Taking the middle lets us ignore these border pixels, which helps when the block lies on the border of the copied patch
Flow Input Image Convert to Grayscale Compute Wavelet Transform Matrix of blocks' features The next 4 features are calculated as follows: we compute the average of the whole block B and of each of the sub-blocks B1-B4
Flow Input Image Convert to Grayscale Compute Wavelet Transform Matrix of blocks' features Now we assign the following values to the feature vector: 2 <-- Average(B1) / Average(B) 3 <-- Average(B2) / Average(B) 4 <-- Average(B3) / Average(B) 5 <-- Average(B4) / Average(B)
Flow Input Image Convert to Grayscale Compute Wavelet Transform Matrix of blocks' features These features help us deal with Gaussian noise in the image: if 2 identical blocks differ only because of different Gaussian noise, we still expect, with high probability, to find similar ratios between each sub-block's average and the whole block's average
Flow Input Image Convert to Grayscale Compute Wavelet Transform Matrix of blocks' features The last 4 features we calculate are the following. We assign these values to the feature vector: 6 <-- Average(B1) - Average(B) 7 <-- Average(B2) - Average(B) 8 <-- Average(B3) - Average(B) 9 <-- Average(B4) - Average(B)
Flow Input Image Convert to Grayscale Compute Wavelet Transform Matrix of blocks' features These features help us deal with cases where a constant value was added to the intensity level of all the pixels in at least one of the blocks. In such a case, the differences of the averages remain similar, and so do these 4 features
Flow Input Image Convert to Grayscale Compute Wavelet Transform Matrix of blocks' features We've computed the whole feature vector of B. Now let's normalize all its components to integers in the range [0, 255]
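The 9-element feature vector can be sketched as below. Several details are assumptions, since the slides leave them open: pixels lying exactly on a diagonal are excluded from the four triangular sub-blocks, the order B1..B4 is taken as top, right, bottom, left, and the normalization to [0, 255] is done by per-feature min-max scaling over all blocks. The names `block_features` and `quantize` are hypothetical.

```python
import numpy as np

def block_features(block, center_fraction=2 / 3):
    """9 features of a square block: center mean, 4 ratios, 4 differences."""
    block = np.asarray(block, dtype=float)
    b = block.shape[0]
    r, c = np.indices((b, b))
    masks = [
        (r < c) & (r + c < b - 1),   # B1: top triangle (assumed order)
        (r < c) & (r + c > b - 1),   # B2: right triangle
        (r > c) & (r + c > b - 1),   # B3: bottom triangle
        (r > c) & (r + c < b - 1),   # B4: left triangle
    ]
    size = max(1, int(round(b * center_fraction)))
    start = (b - size) // 2
    center = block[start:start + size, start:start + size]
    mean_b = block.mean()
    sub_means = np.array([block[m].mean() for m in masks])
    ratios = sub_means / mean_b if mean_b != 0 else np.zeros(4)   # features 2-5
    diffs = sub_means - mean_b                                    # features 6-9
    return np.concatenate([[center.mean()], ratios, diffs])

def quantize(features):
    """Scale every feature column to integers in [0, 255] (assumed min-max scaling)."""
    features = np.asarray(features, dtype=float)
    lo = features.min(axis=0)
    span = features.max(axis=0) - lo
    span[span == 0] = 1.0            # constant columns map to 0
    return np.round(255 * (features - lo) / span).astype(np.uint8)
```

Scaling a block's intensities leaves features 2-5 unchanged, and adding a constant to them leaves features 6-9 unchanged, which matches the motivation given for the two groups.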
Flow Input Image Convert to Grayscale Compute Wavelet Transform Matrix of blocks' features We can now assign all these vectors to a feature matrix with (M-b+1)(N-b+1) rows (one row per block) and 9 columns (the features), and sort it lexicographically and efficiently using Radix Sort
Flow Input Image Convert to Grayscale Compute Wavelet Transform Matrix of blocks' features Checking adjacent feature vectors Now that we have the lexicographically sorted matrix of feature vectors, we check each pair of adjacent rows vi, vi+1. For each such pair of vectors, we compute their Euclidean distance. If the computed distance is less than our predefined threshold T1, we continue to another check
Flow Input Image Convert to Grayscale Compute Wavelet Transform Matrix of blocks' features Checking adjacent feature vectors For each pair of rows that passed the Euclidean distance test, i.e. was found similar enough, we compute the distance between their corresponding image coordinates. If the computed distance is greater than our predefined threshold T2, we mark this pair as suspected
Flow Input Image Convert to Grayscale Compute Wavelet Transform Matrix of blocks' features Checking adjacent feature vectors The first threshold, T1, tells us whether two blocks are similar enough. The second threshold, T2, tells us whether the similar blocks are far enough apart in the real image, so that we can ignore nearby blocks which are naturally similar. (Figure labels: passed T1 but failed T2; passed T2 but failed T1; passed both T1 and T2)
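The two-threshold check on adjacent sorted rows might look like this. The sketch uses `np.lexsort` as a stand-in for the Radix Sort mentioned in the slides; the function name `suspected_pairs` and the parameter names `t1` and `t2` are assumptions.

```python
import numpy as np

def suspected_pairs(features, coords, t1, t2):
    """Index pairs of blocks that are similar (T1) yet far apart in the image (T2)."""
    # Cast to float first: subtracting uint8 feature vectors would wrap around.
    features = np.asarray(features, dtype=float)
    coords = np.asarray(coords, dtype=float)
    order = np.lexsort(features.T[::-1])     # lexicographic order of the rows
    pairs = []
    for i, j in zip(order[:-1], order[1:]):  # adjacent rows v_i, v_i+1
        feature_dist = np.linalg.norm(features[i] - features[j])
        if feature_dist >= t1:
            continue                         # not similar enough
        spatial_dist = np.linalg.norm(coords[i] - coords[j])
        if spatial_dist > t2:                # similar AND not naturally close
            pairs.append((int(i), int(j)))
    return pairs
```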
Flow Input Image Convert to Grayscale Compute Wavelet Transform Matrix of blocks' features Checking adjacent feature vectors Comparing neighbors Now that we have a suspected pair of blocks, we check the similarity of their neighbors in an efficient way, named Neighbor Shift. If our suspected blocks are b1 and b2, then for each neighbor of b1 we compute the difference between its feature vector and the feature vector of b1. We do the same for the corresponding neighbor of b2
Flow Input Image Convert to Grayscale Compute Wavelet Transform Matrix of blocks' features Checking adjacent feature vectors Comparing neighbors Finally, we compare the difference we got for b1 with the difference we got for b2. If they are the same, we also mark the corresponding neighbors as duplicates
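Here is a small sketch of the Neighbor Shift comparison, under the assumption that the feature vectors are stored in an array `feat` indexed by block position, so that `feat[y, x]` is the vector of the block whose top-left corner is (y, x). The 8-neighborhood, the tolerance `tol` and the name `neighbor_shift` are choices made for this example.

```python
import numpy as np

# The 8 neighbors around a block position (assumed neighborhood).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def neighbor_shift(feat, p1, p2, tol=0.0):
    """Neighbor pairs of suspected blocks p1, p2 whose feature shifts agree."""
    height, width = feat.shape[:2]
    matches = []
    for dy, dx in OFFSETS:
        n1 = (p1[0] + dy, p1[1] + dx)
        n2 = (p2[0] + dy, p2[1] + dx)
        inside = all(0 <= y < height and 0 <= x < width for y, x in (n1, n2))
        if not inside:
            continue
        shift1 = feat[n1] - feat[p1]   # how the features change next to b1
        shift2 = feat[n2] - feat[p2]   # the same change next to b2
        if np.all(np.abs(shift1 - shift2) <= tol):
            matches.append((n1, n2))
    return matches
```

`p1` and `p2` are expected to be plain `(y, x)` tuples, and `feat` should hold floats (or signed integers) so that the subtractions cannot wrap around.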
Flow Input Image Convert to Grayscale Compute Wavelet Transform Matrix of blocks' features Checking adjacent feature vectors Comparing neighbors Report Duplicates! Finally, we filter out suspected blocks and their neighbors that cover an area smaller than some predefined threshold. This filtering is done in order to prevent noisy results
Working Examples A pretty easy detection... Forged Image Output
Working Examples An example with Gaussian noise... Forged Image Output
Working Examples An example with JPEG Compression... Forged Image Output
Try staying concentrated for just a little more...
Detecting Rotated Duplications Remember this friend? Copy-paste with rotation
Detecting Rotated Duplications Most of the steps are the same as in the algorithm we've just seen. One major difference, though, is that we're not going to compute any features other than the pixels themselves
Detecting Rotated Duplications As we are going to work with the pixels themselves, and not with any computed features, our algorithm will be more limited: - It will not support detection after Gaussian noise - It will not support detection after JPEG compression. To support detection of rotated duplications, we will require the following changes...
Detecting Rotated Duplications Change 1 Block size Here, our overlapping blocks' size is forced to be 3x3. We enumerate the block's pixels according to the following chart:
Detecting Rotated Duplications Change 2 Feature vector size and contents The size of our feature vector is going to be 18 (instead of 9). The first 9 features are the sorted values of all 9 pixels in the block. The next 9 features are the original locations [0-8] of each of the first 9 features (we sorted them, so we want to remember their original locations somehow...)
Detecting Rotated Duplications Change 3 Calculation of feature vectors similarity Here, we first compare the first 9 features of the two vectors, and we expect at least 7 of the 9 values to be similar. Otherwise, we ignore this pair of blocks. (Figure: an example of 2 matching blocks. The circular order is the same, but the rotation is visible)
Detecting Rotated Duplications Change 3 Calculation of feature vectors similarity If we found enough similar features in the first 9 features, we proceed to check whether the next 9 features of the 2 vectors look similar in their circular order. The circular sequence also implies the rotation angle.
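The 18-element vector and the circular-order check can be sketched in plain Python. The enumeration chart is not reproduced in the text, so this example assumes its own numbering: the center pixel is location 0 and the 8 border pixels are locations 1-8, clockwise from the top-left corner. It also assumes that "similar" means within a tolerance `tol`, that at least 7 of the 9 locations must agree, and that a shift of one ring position stands for 45 degrees. All names are hypothetical.

```python
# Border pixels of a 3x3 block, clockwise from the top-left corner (assumed chart).
RING = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]

def rotation_features(block):
    """18 features: the 9 sorted pixel values, then their original locations."""
    labelled = [(block[1][1], 0)]                      # center pixel is location 0
    labelled += [(block[r][c], k + 1) for k, (r, c) in enumerate(RING)]
    labelled.sort()
    return [v for v, _ in labelled] + [p for _, p in labelled]

def _rotate(location, shift):
    """Move a ring location `shift` steps clockwise; the center stays put."""
    return 0 if location == 0 else (location - 1 + shift) % 8 + 1

def match_rotation(f1, f2, tol=0, min_similar=7):
    """Rotation angle (degrees, clockwise) mapping block 1 onto block 2, or None."""
    v1, p1 = f1[:9], f1[9:]
    v2, p2 = f2[:9], f2[9:]
    if sum(abs(a - b) <= tol for a, b in zip(v1, v2)) < min_similar:
        return None                                    # pixel values differ too much
    for shift in range(8):                             # try every circular shift
        agree = sum(_rotate(a, shift) == b for a, b in zip(p1, p2))
        if agree >= min_similar:
            return shift * 45
    return None
```

For example, with `block = [[1, 2, 3], [8, 9, 4], [7, 6, 5]]` and its 90-degree clockwise rotation `[[7, 8, 1], [6, 9, 2], [5, 4, 3]]`, the sorted values are identical and the locations differ by a circular shift of 2, so `match_rotation` returns 90. Blocks with repeated pixel values make the sorted order ambiguous, which this sketch does not handle.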
Detecting Rotated Duplications Next Steps: After we have discovered the similarity of 2 blocks and an angle Θ, we continue with the Neighbor Shift method we used in the regular algorithm, except that here we check the similarity of neighbors under the same angle Θ
Working Examples An example with rotation, no noise or compression... Forged Image Output
Failing Examples A Failing example with Gaussian Noise... Forged Image Output
Failing Examples A Failing example with JPEG Compression... Forged Image Output
Bibliography
1) Er. Saiqa Khan, Er. Arun Kulkarni. An Efficient Method for Detection of Copy-Move Forgery Using Discrete Wavelet Transform. (IJCSE) International Journal on Computer Science and Engineering, Vol. 02, No. 05, 2010, 1801-1806
2) Vivek Kumar Singh and R.C. Tripathi. Fast and Efficient Region Duplication Detection in Digital Images Using Sub-Blocking Method. International Journal of Advanced Science and Technology, Vol. 35, October, 2011
The End.