Introduction to Video Forgery Detection: Part I
Detecting Forgery From Static-Scene Video Based on Inconsistency in Noise Level Functions
IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 5, NO. 4, DECEMBER 2010
I. INTRODUCTION
How can one be assured of the authenticity of a digital image? For example, when digital photographs are used as testimony in courts of law, how can genuine evidence be distinguished from falsified evidence? Given recent progress in digital editing techniques that can synthesize realistic images, it is difficult to guarantee the authenticity of digital photographs.
In the past, digital watermarking was the main technology used to ensure authenticity (e.g., preventing illegal copying of images from the Internet). However, it is impractical to embed digital watermarks in all images, and therefore, digital watermarking is limited in its ability to ensure authenticity.
In response to the limitations of watermarking, a number of forgery detection techniques have been developed that exploit correlations and inconsistencies in forged images. Johnson and Farid used inconsistencies in lighting [1] and in chromatic aberration [2]. Lin et al. estimated a camera response function (CRF) and verified its uniformity across an image [3].
Lukáš et al. extracted fixed-pattern noise from an image and compared it with a reference pattern [4]. Fridrich et al. computed the correlation between segments of an image to detect cloned regions [5]. Ye et al. estimated the JPEG quantization table and evaluated its consistency across the image [6].
[1] M. K. Johnson and H. Farid, "Exposing digital forgeries by detecting inconsistencies in lighting," in Proc. Workshop on Multimedia and Security, 2005, pp. 1–10.
[2] M. Johnson and H. Farid, "Exposing digital forgeries through chromatic aberration," in Proc. Int. Multimedia Conf., 2006, pp. 48–55.
[3] Z. Lin, R. Wang, X. Tang, and H.-Y. Shum, "Detecting doctored images using camera response normality and consistency," in Proc. IEEE Computer Society Conf. Computer Vision and Pattern Recognition, 2005, vol. 1, pp. 1087–1092.
[4] J. Lukáš, J. Fridrich, and M. Goljan, "Detecting digital image forgeries using sensor pattern noise," in Proc. Society of Photo-Optical Instrumentation Engineers Conf., 2006, vol. 6072, pp. 362–372.
[5] J. Fridrich, D. Soukal, and J. Lukáš, "Detection of copy-move forgery in digital images," in Proc. Digital Forensic Research Workshop, Cleveland, OH, 2003.
[6] S. Ye, Q. Sun, and E.-C. Chang, "Detecting digital image forgeries by measuring inconsistencies of blocking artifact," in Proc. IEEE Int. Conf. Multimedia and Expo, 2007, pp. 12–15.
To provide some context, tampering methods for videos of static scenes recorded by a surveillance camera can be classified into two approaches:
1) Intra-video forgery: replacing regions or frames with duplicates from the same video sequence, for example, to hide unwanted objects by overwriting them with background taken from other segments of the same video.
2) Inter-video forgery: clipping objects from other images or video segments and superimposing them on the desired regions of the video.
Wang and Farid studied a method for detecting intra-video forgery [7]. Since duplication yields a high correlation between original and cloned regions, detecting unnaturally high coherence is useful for discovering copy-paste tampering.
[7] W. Wang and H. Farid, "Exposing digital forgeries in video by detecting duplication," in Proc. Workshop on Multimedia & Security, Int. Multimedia Conf., New York, NY, 2007, pp. 35–42.
However, their method has a serious limitation: it can only detect copy-paste tampering within the same video sequence. That is, it cannot detect superimposition of objects inserted from other videos.
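The duplication cue can be illustrated with a toy frame-level check. This is a simplified sketch, not Wang and Farid's algorithm: it assumes a static camera, correlates whole-frame noise residuals (each frame minus the temporal mean frame, so the shared background cancels), and uses an arbitrary threshold of 0.9. Authentic frames carry independent noise and correlate weakly, whereas a copied frame repeats the same noise exactly.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den > 0 else 0.0


def find_duplicated_frames(frames, threshold=0.9):
    """Return index pairs (i, j), i < j, whose noise residuals are nearly identical.

    frames: list of flattened grayscale frames from a static camera.
    """
    n_frames, n_pix = len(frames), len(frames[0])
    mean = [sum(f[p] for f in frames) / n_frames for p in range(n_pix)]
    # Subtracting the temporal mean removes the static scene and leaves mostly noise.
    residuals = [[f[p] - mean[p] for p in range(n_pix)] for f in frames]
    pairs = []
    for i in range(n_frames):
        for j in range(i + 1, n_frames):
            if pearson(residuals[i], residuals[j]) > threshold:
                pairs.append((i, j))
    return pairs


# Synthetic static scene: 8 noisy frames, frame 5 overwritten with a copy of frame 2.
rng = random.Random(0)
scene = [rng.uniform(50, 200) for _ in range(400)]
frames = [[s + rng.gauss(0, 2) for s in scene] for _ in range(8)]
frames[5] = list(frames[2])
print(find_duplicated_frames(frames))  # expected: [(2, 5)]
```

As the text notes, this kind of check only finds content copied from within the same sequence; an object pasted from another video has no twin to correlate with.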
In contrast, this work proposes a method that can detect superimposed regions taken from video outside the original sequence. Specifically, the method detects forgeries from noise inconsistencies between the original video and the superimposed regions. It exploits the nature of photon shot noise mixed into image signals, whose characteristics depend on the camera model and recording parameters.
Photon shot noise results from the quantum nature of light: the number of photons arriving at a camera pixel during an exposure follows a Poisson distribution, whose variance equals its mean. This dependence of the variance on the mean, which is characteristic of photon shot noise, can therefore be used as a powerful clue for detecting inconsistencies in forged videos.
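A quick simulation makes the Poisson variance-mean relation concrete. This sketch draws photon counts with Knuth's multiplication method (the Python standard library has no Poisson sampler) at three hypothetical mean counts and prints the sample mean and variance, which come out nearly equal.

```python
import math
import random


def poisson_sample(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method (fine for moderate lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def sample_mean_var(xs):
    """Sample mean and unbiased sample variance."""
    n = len(xs)
    m = sum(xs) / n
    return m, sum((x - m) ** 2 for x in xs) / (n - 1)


rng = random.Random(42)
results = {}
for lam in (5, 20, 80):  # mean photon counts for dark, medium, and bright pixels
    counts = [poisson_sample(rng, lam) for _ in range(10000)]
    results[lam] = sample_mean_var(counts)
    m, v = results[lam]
    print(f"lambda={lam:3d}  mean={m:6.2f}  variance={v:6.2f}")
```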
A CCD camera converts photons into electrons and finally into digital values (bits); therefore, the relationship between the variance and the mean of the number of photons is carried over into a relationship between the variance and the mean of the observed pixel value.
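As a simplified illustration (an assumption, not the paper's full model: a linear sensor with gain g and signal-independent read noise of variance σ_r², ignoring the CRF and quantization), the Poisson relation survives the conversion as an affine function of the mean pixel value:

```latex
\begin{aligned}
N &\sim \mathrm{Poisson}(\lambda), \qquad I = g\,N + n_r, \qquad n_r \sim \mathcal{N}(0, \sigma_r^2) \text{ independent of } N,\\
\mathbb{E}[I] &= g\,\lambda, \qquad
\operatorname{Var}[I] = g^2 \operatorname{Var}[N] + \sigma_r^2 = g^2 \lambda + \sigma_r^2 = g\,\mathbb{E}[I] + \sigma_r^2 .
\end{aligned}
```

The slope g and the offset σ_r² depend on the camera and its recording settings, so regions from different sources can follow different lines; a nonlinear CRF then bends this line into a curve.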
Liu et al. formulated this relationship as the noise level function (NLF) [8]. The NLF depends on parameters inherent to the camera as well as on recording parameters. Consequently, by comparing the variance-mean relationships of pixel values across a video clip, we can detect forged regions clipped from another video.
[8] C. Liu, W. Freeman, R. Szeliski, and S. B. Kang, "Noise estimation from a single image," in Proc. IEEE Computer Society Conf. Computer Vision and Pattern Recognition (CVPR), 2006, vol. 1, pp. 901–908.
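To see why the NLF acts as a camera signature, the sketch below simulates two hypothetical cameras. It assumes a noise model in the spirit of the one used by Liu et al. (signal-dependent plus constant noise added to irradiance, then a CRF), with a gamma curve standing in for the CRF; all parameter values are made up. Each returned (mean, variance) pair is one point of that camera's NLF.

```python
import random


def simulate_nlf(gamma, sigma_s, sigma_c, levels, n_samples=4000, seed=0):
    """Monte Carlo (mean, variance) of observed pixel values at several irradiance levels.

    Assumed model: observed = 255 * f(L + n), with Var[n] = L * sigma_s**2 + sigma_c**2
    and a gamma-curve CRF f(x) = x ** (1 / gamma).
    """
    rng = random.Random(seed)
    points = []
    for level in levels:
        std = (level * sigma_s ** 2 + sigma_c ** 2) ** 0.5
        vals = []
        for _ in range(n_samples):
            e = min(max(level + rng.gauss(0.0, std), 0.0), 1.0)  # clip to the valid range
            vals.append(255.0 * e ** (1.0 / gamma))
        m = sum(vals) / n_samples
        points.append((m, sum((v - m) ** 2 for v in vals) / (n_samples - 1)))
    return points


levels = [0.1, 0.3, 0.5, 0.7, 0.9]
camera_a = simulate_nlf(gamma=2.2, sigma_s=0.02, sigma_c=0.005, levels=levels)
camera_b = simulate_nlf(gamma=1.8, sigma_s=0.05, sigma_c=0.01, levels=levels)
for (ma, va), (mb, vb) in zip(camera_a, camera_b):
    print(f"A: mean={ma:6.1f} var={va:6.2f} | B: mean={mb:6.1f} var={vb:6.2f}")
```

At comparable brightness, camera B's pixels are far noisier, so a region pasted from B into A's video would sit on a different curve.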
Given an input video that contains some forged regions, we first analyze the noise characteristics at each pixel. Fig. 1 shows a diagram of the noise characteristics for the forged and unforged regions. The solid line (forged region) and the dashed line (unforged region) are the NLFs of the two distributions.
Each dot in the figure represents the noise characteristic of one pixel, i.e., the variance versus the mean of its pixel values over the frames. Once the per-pixel noise characteristics are obtained, the NLFs are fitted to the distribution using maximum a posteriori (MAP) estimation. The likelihood is modeled with a chi-square distribution to account for the fluctuation in the noise characteristics caused by the limited number of samples.
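The per-pixel statistics and the chi-square likelihood can be sketched as follows. The sketch assumes a static scene, so each pixel's temporal samples share one true value, and it assumes Gaussian noise, under which (n − 1)s²/τ follows a chi-square law with n − 1 degrees of freedom, where s² is the sample variance over n frames and τ is the variance predicted by an NLF at that pixel's mean. The function names are illustrative.

```python
import math


def per_pixel_stats(frames):
    """Per-pixel (temporal mean, unbiased temporal variance) of a static-scene clip.

    frames: list of T >= 2 frames, each a flattened list of pixel values.
    """
    t = len(frames)
    stats = []
    for p in range(len(frames[0])):
        xs = [f[p] for f in frames]
        m = sum(xs) / t
        stats.append((m, sum((x - m) ** 2 for x in xs) / (t - 1)))
    return stats


def chi2_log_likelihood(sample_var, nlf_var, n_frames):
    """log p(s^2 | tau) assuming (n - 1) * s^2 / tau follows a chi-square law with n - 1 dof.

    sample_var (s^2) must be positive; nlf_var (tau) is the variance the NLF predicts.
    """
    k = n_frames - 1
    x = k * sample_var / nlf_var
    log_chi2 = (k / 2 - 1) * math.log(x) - x / 2 - (k / 2) * math.log(2) - math.lgamma(k / 2)
    return log_chi2 + math.log(k / nlf_var)  # change of variables from x to s^2


print(per_pixel_stats([[1, 10], [3, 10], [5, 10]]))  # [(3.0, 4.0), (10.0, 0.0)]
# A pixel with s^2 = 4 over 30 frames fits an NLF predicting 4 better than one predicting 16.
print(chi2_log_likelihood(4.0, 4.0, 30) > chi2_log_likelihood(4.0, 16.0, 30))  # True
```

With few frames the sample variance scatters widely around τ, which is exactly the fluctuation the chi-square likelihood is meant to absorb.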
The posterior probability of forgery at every pixel and the parameters of the NLFs are estimated simultaneously using the expectation-maximization (EM) algorithm. Following Liu et al. [8], an NLF is represented as a linear combination of basis functions: a number of NLFs corresponding to various CRFs and noise parameters are synthesized, and a set of linear basis functions is obtained from them via principal component analysis (PCA).
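The EM loop can be sketched in a reduced setting. This is a hypothetical simplification, not the paper's estimator: each NLF is a straight line τ(μ) = aμ + b (a two-term basis {μ, 1} instead of PCA bases), there is no MAP prior, and the initialization splits pixels above and below one global line fit. The E-step computes each pixel's posterior responsibility under the chi-square likelihood. The M-step updates the mixing weights and refits each line with a few iteratively reweighted least-squares (Fisher scoring) steps, which maximize the weighted chi-square likelihood, so this is a generalized EM.

```python
import math
import random


def chi2_loglik(s2, tau, k):
    """log p(s2 | tau) when k * s2 / tau ~ chi-square(k); k = number of frames - 1."""
    x = k * s2 / tau
    return ((k / 2 - 1) * math.log(x) - x / 2 - (k / 2) * math.log(2)
            - math.lgamma(k / 2) + math.log(k / tau))


def weighted_line_fit(mu, y, w):
    """Weighted least squares for y ~ a * mu + b; returns (a, b)."""
    sw = sum(w)
    sx = sum(wi * m for wi, m in zip(w, mu))
    sxx = sum(wi * m * m for wi, m in zip(w, mu))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxy = sum(wi * m * yi for wi, m, yi in zip(w, mu, y))
    det = sw * sxx - sx * sx
    return (sw * sxy - sx * sy) / det, (sxx * sy - sx * sxy) / det


def nlf(params, m):
    """Linear NLF a * m + b, floored to keep the predicted variance positive."""
    return max(params[0] * m + params[1], 1e-6)


def em_two_nlfs(mu, s2, k, n_iter=30, irls_steps=3):
    """Generalized EM for two linear NLFs; returns (params, mixing weights, responsibilities)."""
    glob = weighted_line_fit(mu, s2, [1.0] * len(mu))
    resp = [[0.0, 1.0] if v > nlf(glob, m) else [1.0, 0.0] for m, v in zip(mu, s2)]
    params, weights = [glob, glob], [0.5, 0.5]
    for _ in range(n_iter):
        # M-step: mixing weights, then IRLS with weights r / tau^2 (chi-square/gamma likelihood).
        for c in (0, 1):
            rc = [r[c] for r in resp]
            weights[c] = max(sum(rc) / len(rc), 1e-6)
            for _ in range(irls_steps):
                w = [r / nlf(params[c], m) ** 2 for r, m in zip(rc, mu)]
                params[c] = weighted_line_fit(mu, s2, w)
        # E-step: posterior probability of each component (forged or not) per pixel.
        resp = []
        for m, v in zip(mu, s2):
            logs = [math.log(weights[c]) + chi2_loglik(v, nlf(params[c], m), k) for c in (0, 1)]
            top = max(logs)
            e = [math.exp(lg - top) for lg in logs]
            total = sum(e)
            resp.append([x / total for x in e])
    return params, weights, resp


# Synthetic pixels: 300 authentic (tau = 0.05 mu + 1) and 100 forged (tau = 0.2 mu + 4).
rng = random.Random(1)
k = 29  # 30 frames
mu, s2, forged = [], [], []
for i in range(400):
    m = rng.uniform(20, 220)
    is_forged = i >= 300
    tau = 0.2 * m + 4 if is_forged else 0.05 * m + 1
    mu.append(m)
    s2.append(tau * rng.gammavariate(k / 2, 2) / k)  # scaled chi-square draw
    forged.append(is_forged)

params, weights, resp = em_two_nlfs(mu, s2, k)
hi = 0 if nlf(params[0], 120) > nlf(params[1], 120) else 1  # the noisier component
accuracy = sum((r[hi] > 0.5) == f for r, f in zip(resp, forged)) / len(forged)
print(params, accuracy)
```

Replacing the two-term line with a PCA basis would change only the design matrix in the weighted fit.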
II. RELATED WORK
A. Forgery Detection in Images
Image tampering methods can be classified into two approaches:
1) intra-image forgery: replacing regions with others from the same image;
2) inter-image forgery: superimposing regions clipped from other images.
Fridrich et al. were the first to attempt to detect forgeries in images [5]. Their method targets intra-image forgery, which usually yields an unnaturally high correlation between duplicated regions. They introduced a detection method based on robust block matching, carried out on discrete cosine transform (DCT) coefficients to cope with lossy JPEG compression.
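Block matching on DCT coefficients can be sketched as follows. This is a simplified take on the idea: every overlapping block is transformed with an orthonormal 2-D DCT, its coefficients are quantized with one uniform step q (an assumption; the original work uses JPEG-style quantization), and the feature vectors are sorted lexicographically so that identical blocks become neighbors. Block size, q, and the minimum shift are illustrative values.

```python
import math
import random


def dct2(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of lists."""
    n = len(block)
    cos_t = [[math.cos(math.pi * (2 * x + 1) * u / (2 * n)) for x in range(n)] for u in range(n)]
    scale = [math.sqrt(1 / n)] + [math.sqrt(2 / n)] * (n - 1)
    return [[scale[u] * scale[v] * sum(block[x][y] * cos_t[u][x] * cos_t[v][y]
                                       for x in range(n) for y in range(n))
             for v in range(n)] for u in range(n)]


def detect_copy_move(img, block=8, q=4.0, min_shift=4):
    """Return pairs of block positions whose quantized DCT coefficients are identical."""
    h, w = len(img), len(img[0])
    feats = []
    for i in range(h - block + 1):
        for j in range(w - block + 1):
            coeffs = dct2([row[j:j + block] for row in img[i:i + block]])
            key = tuple(round(c / q) for row in coeffs for c in row)
            feats.append((key, (i, j)))
    feats.sort()  # lexicographic sort makes identical feature vectors adjacent
    matches = []
    for (k1, p1), (k2, p2) in zip(feats, feats[1:]):
        # Ignore matches between nearly overlapping blocks.
        if k1 == k2 and abs(p1[0] - p2[0]) + abs(p1[1] - p2[1]) >= min_shift:
            matches.append((p1, p2))
    return matches


rng = random.Random(0)
img = [[rng.uniform(0, 255) for _ in range(20)] for _ in range(20)]
for r in range(9):  # paste a 9x9 patch from (0, 0) to (10, 10)
    for c in range(9):
        img[r + 10][c + 10] = img[r][c]
matches = detect_copy_move(img)
print(matches)
```

Quantizing before matching is what makes the comparison tolerant of small compression changes; a real detector would also cluster matches by their common shift vector.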
Subsequent approaches have targeted inter-image forgeries by verifying that certain characteristics are uniform across an image. Johnson and Farid developed a method based on lighting [1]. They estimated the distribution of light sources illuminating each object from the observed brightness and the surface normals computed along the object's occluding contours, and then checked the consistency of the estimated illumination distributions.
Johnson and Farid also developed a method for detecting forgeries on the basis of lateral chromatic aberration [2], i.e., a spatial shift between color channels caused by the optical system refracting different wavelengths by different amounts. They estimated global model parameters that determine the displacement due to lateral chromatic aberration at each pixel, and evaluated the degree of tampering by the average angular error between the displacement vectors predicted by the global model and those computed locally.
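The average angular error can be computed as below. The global model here is an assumption for illustration: one color channel is taken to be a slight expansion of another about an optical center (cx, cy) by a factor alpha, so the predicted displacement points radially. Locally measured displacements that agree with the model give an error near 0 degrees; a region whose local displacements point the wrong way, as might happen after pasting, gives a large error.

```python
import math


def model_displacement(x, y, alpha, cx, cy):
    """Displacement of a color channel predicted by an assumed expansion about (cx, cy)."""
    return ((alpha - 1.0) * (x - cx), (alpha - 1.0) * (y - cy))


def angle_deg(u, v):
    """Angle in degrees between 2-D vectors u and v (0 if either is zero)."""
    nu, nv = math.hypot(*u), math.hypot(*v)
    if nu == 0 or nv == 0:
        return 0.0
    cos_a = (u[0] * v[0] + u[1] * v[1]) / (nu * nv)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))


def average_angular_error(points, local_disp, alpha, cx, cy):
    """Mean angle between model-predicted and locally measured displacement vectors."""
    errors = [angle_deg(model_displacement(x, y, alpha, cx, cy), d)
              for (x, y), d in zip(points, local_disp)]
    return sum(errors) / len(errors)


points = [(100, 50), (10, 80), (60, 10)]
alpha, cx, cy = 1.002, 64, 48
consistent = [model_displacement(x, y, alpha, cx, cy) for x, y in points]
reversed_disp = [(-dx, -dy) for dx, dy in consistent]  # e.g., a region pasted from elsewhere
print(average_angular_error(points, consistent, alpha, cx, cy))     # close to 0
print(average_angular_error(points, reversed_disp, alpha, cx, cy))  # close to 180
```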
Lin et al. developed a method to examine the consistency of camera response functions estimated from intensity changes along edges [3]. The brightness of an edge pixel should be a linear combination of the brightness of the surfaces on either side, but a nonlinear camera response skews this linear mixture in the observed values; as a result, the method depends on the image content.
This approach estimates the nonlinear inverse response functions that convert a nonlinear relationship of observed pixel values on the edge into a linear relationship. If the function estimated from an edge does not conform to the rest of the image, the edge is marked as a sign of tampering.
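The edge-linearity cue can be illustrated in RGB space. This sketch assumes a gamma curve as the CRF and assumes its inverse is known (Lin et al. estimate it); the surface colors and the 50/50 mixing ratio are made up. The observed edge color falls off the line joining the two observed surface colors, but after applying the inverse CRF it lies back on the line.

```python
def distance_to_line(p, a, b):
    """Euclidean distance from 3-D point p to the line through a and b."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    t = sum(x * y for x, y in zip(ap, ab)) / sum(x * x for x in ab)
    closest = [ai + t * d for ai, d in zip(a, ab)]
    return sum((pi - ci) ** 2 for pi, ci in zip(p, closest)) ** 0.5


def crf(x, gamma=2.2):  # assumed gamma-curve camera response
    return x ** (1 / gamma)


def inverse_crf(y, gamma=2.2):
    return y ** gamma


r1, r2 = (0.8, 0.2, 0.1), (0.1, 0.3, 0.9)  # irradiance of the two surfaces (RGB)
edge_irr = [0.5 * a + 0.5 * b for a, b in zip(r1, r2)]  # edge pixel mixes both halves
obs1 = [crf(c) for c in r1]
obs2 = [crf(c) for c in r2]
obs_edge = [crf(c) for c in edge_irr]

before = distance_to_line(obs_edge, obs1, obs2)
after = distance_to_line([inverse_crf(c) for c in obs_edge],
                         [inverse_crf(c) for c in obs1],
                         [inverse_crf(c) for c in obs2])
print(f"distance before inverse CRF: {before:.4f}, after: {after:.2e}")
```

An edge whose colors become linear only under a different inverse CRF than the rest of the image is the kind of inconsistency the method flags.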
Ye et al. developed a method to detect inconsistencies in an image on the basis of a blocking artifact measure for image compression [6]. If blocks compressed with different quantization tables are combined in an image, the blocking artifact measure of the forged block is much larger than that of an authentic block. They estimated the quantization table from the histogram of DCT coefficients and evaluated the blocking artifact measure of each block.
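A rough sketch of the idea, operating directly on DCT coefficients. Two simplifications are assumptions: the quantization step for each frequency is estimated by a simple heuristic (the largest step near whose multiples most coefficients lie) rather than by analyzing the histogram as in [6], and the blocking artifact measure is taken as the sum of distances from each coefficient to the nearest multiple of its estimated step. The coefficient data are synthetic, with only four frequencies per block.

```python
import random


def estimate_q(coeffs, q_max=32, tol=0.5, min_fraction=0.9):
    """Largest step q such that most non-negligible coefficients sit near a multiple of q."""
    nonzero = [c for c in coeffs if abs(c) > tol]
    if not nonzero:
        return 1
    best = 1
    for q in range(1, q_max + 1):
        near = sum(abs(c - q * round(c / q)) <= tol for c in nonzero)
        if near >= min_fraction * len(nonzero):
            best = q
    return best


def blocking_artifact_measure(block_coeffs, qtable):
    """Sum over frequencies of the distance from each coefficient to the nearest multiple of q."""
    return sum(abs(c - q * round(c / q)) for c, q in zip(block_coeffs, qtable))


rng = random.Random(3)
true_q = [6, 10, 14, 20]  # hypothetical steps for 4 DCT frequencies of the host image
blocks = [[q * rng.randint(-8, 8) + rng.gauss(0, 0.1) for q in true_q] for _ in range(50)]
blocks.append([15.0, -28.0, 45.0, 26.0])  # pasted block: multiples of other steps (5, 7, 9, 13)
qtable = [estimate_q([b[f] for b in blocks]) for f in range(len(true_q))]
bam = [blocking_artifact_measure(b, qtable) for b in blocks]
print(qtable, round(bam[-1], 2), round(max(bam[:-1]), 2))
```

Because authentic blocks dominate, the estimated table matches the host image, and only the pasted block lands far from the quantization grid.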
Lukáš et al. developed a method to verify the spatial pattern of sensor noise [4]. Because of sensor imperfections introduced during manufacturing, the pixels of a CCD camera differ in their sensitivity to light. This spatial variation in sensitivity is fixed over time and is known as fixed-pattern noise. Since this non-uniformity is inherent to a camera, it can be exploited as a kind of fingerprint.
They determined the reference noise pattern of a camera by averaging the noise extracted from several images. Given an image, they extracted fixed pattern noise from the image using a smoothing filter and identified the camera that took the image. They also developed a method for detecting forgeries in an image using the same approach [4].
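The reference-pattern procedure can be sketched as follows. The assumptions are loud: a 3x3 mean filter stands in for the smoothing/denoising filter, the fixed pattern is modeled as a per-pixel multiplicative gain, the scenes are flat with random brightness, and similarity is a plain correlation coefficient. An image from the same camera correlates strongly with the averaged reference; one from another camera does not.

```python
import math
import random


def box_smooth(img):
    """3x3 mean filter with clamped borders (a stand-in for the denoising filter)."""
    h, w = len(img), len(img[0])
    return [[sum(img[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), w - 1)]
                 for di in (-1, 0, 1) for dj in (-1, 0, 1)) / 9.0
             for j in range(w)] for i in range(h)]


def noise_residual(img):
    """Flattened image minus its smoothed version."""
    sm = box_smooth(img)
    return [img[i][j] - sm[i][j] for i in range(len(img)) for j in range(len(img[0]))]


def correlation(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den


def reference_pattern(images):
    """Average the noise residuals of several images from the same camera."""
    residuals = [noise_residual(im) for im in images]
    return [sum(col) / len(col) for col in zip(*residuals)]


def camera_image(rng, pattern):
    """Flat scene of random brightness seen through a per-pixel gain pattern, plus random noise."""
    c = rng.uniform(80, 180)
    return [[c * (1 + k) + rng.gauss(0, 1) for k in row] for row in pattern]


rng = random.Random(7)
size = 16
pattern_a = [[rng.gauss(0, 0.02) for _ in range(size)] for _ in range(size)]
pattern_b = [[rng.gauss(0, 0.02) for _ in range(size)] for _ in range(size)]
ref_a = reference_pattern([camera_image(rng, pattern_a) for _ in range(20)])
corr_a = correlation(noise_residual(camera_image(rng, pattern_a)), ref_a)
corr_b = correlation(noise_residual(camera_image(rng, pattern_b)), ref_a)
print(f"same camera: {corr_a:.2f}, other camera: {corr_b:.2f}")
```

Averaging many residuals suppresses the random noise while the fixed pattern adds up, which is why the reference is built from several images.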
B. Forgery Detection in Videos
To detect video forgery, one might apply an image forgery detection method to each frame of a video sequence. However, some types of forgery cannot be detected this way because such an approach ignores the relationships between frames. For instance, simple frame duplication is undetectable, since each frame appears authentic when evaluated independently.
Compared to the image forensic techniques mentioned above, only a few techniques have been developed for videos, but this field of research is certainly growing. Similar to those for an image, forgery detection techniques for a video are classified into two types: inter-video and intra-video approaches.
As mentioned earlier, Wang and Farid studied the detection of replacement and duplication in videos [7]. They also developed an inconsistency-based detection method that checks the consistency of the de-interlacing parameters used to convert an interlaced video into a non-interlaced form [9].
Since each field of an interlaced video has only half the vertical resolution of a full frame, de-interlacing algorithms rely on line insertion, duplication, and interpolation to create full-resolution frames.
[9] W. Wang and H. Farid, "Exposing digital forgeries in interlaced and de-interlaced video," IEEE Trans. Inf. Forensics Security, vol. 2, no. 3, pp. 438–449, Sep. 2007.
In their method, the interpolation parameters and the posterior probability of forgery are estimated simultaneously using the EM algorithm. For interlaced video, they also observed that the motion between the two fields of a frame is closely related to the motion between fields of neighboring frames. Tampering disturbs this relationship, and measuring the disturbance allows their system to detect forgeries in interlaced video.
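The de-interlacing consistency check can be illustrated with fixed weights. Wang and Farid estimate the interpolation weights and the forgery posterior jointly with EM; this sketch instead assumes a simple line-averaging de-interlacer, so every interpolated row should equal the average of the rows above and below it. A row edited after de-interlacing breaks that relation and stands out.

```python
import random


def line_average_residuals(frame, parity=1):
    """Mean absolute error when each row of the given parity (0 or 1) is predicted by
    averaging the rows directly above and below it (assumed line-averaging de-interlacer)."""
    errors = {}
    start = parity if parity > 0 else 2  # row 0 has no row above it
    for r in range(start, len(frame) - 1, 2):
        pred = [(a + b) / 2 for a, b in zip(frame[r - 1], frame[r + 1])]
        errors[r] = sum(abs(p - x) for p, x in zip(pred, frame[r])) / len(pred)
    return errors


rng = random.Random(2)
height, width = 12, 16
frame = [[0.0] * width for _ in range(height)]
for r in range(0, height, 2):  # even rows: the field that was actually captured
    frame[r] = [rng.uniform(0, 255) for _ in range(width)]
for r in range(1, height - 1, 2):  # odd rows: filled in by line averaging
    frame[r] = [(a + b) / 2 for a, b in zip(frame[r - 1], frame[r + 1])]
frame[height - 1] = list(frame[height - 2])  # last row: line duplication
frame[5] = [rng.uniform(0, 255) for _ in range(width)]  # tampered interpolated row
errors = line_average_residuals(frame)
suspicious = [r for r, e in errors.items() if e > 1.0]
print(suspicious)  # expected: [5]
```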
Noise correlation in video has also been explored for forgery detection. Hsu et al. developed a method based on noise characteristics extracted by noise reduction [10]. They used the block-level correlation of the noise residual as a characteristic of a video.
[10] C.-C. Hsu, T.-Y. Hung, C.-W. Lin, and C.-T. Hsu, "Video forgery detection using correlation of noise residue," in Proc. IEEE 10th Workshop Multimedia Signal Processing, 2008, pp. 170–174.
If a region is inpainted with another region of the same video, the correlation between the two regions takes an unnaturally high value. In contrast, the noise residual of a textured region synthesized from another video exhibits low coherence with the noise residuals of the other regions.
However, this approach depends heavily on the noise reduction method. When the noise intensities of the original and tampered regions differ significantly, noise reduction becomes inaccurate, and errors in the computed noise residual can cause some forgeries to be missed.
C. Effective Use of Noise in Digital Data
Since the early days of digital cameras, noise has been studied extensively in signal processing, mainly with the aim of removing it from images and videos. Recently, however, some researchers have instead attempted to make effective use of noise rather than remove it.
Matsushita and Lin exploited the distribution of temporal noise at each pixel to estimate camera response functions (CRFs) [11]. They made use of the fact that the noise distribution is symmetric in the irradiance domain but is skewed by a nonlinear CRF. They estimated the inverse CRF that makes the distribution of the noise, computed from the observed pixel values, symmetric in the irradiance domain.
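The symmetry principle can be demonstrated with a one-parameter search. The assumptions: the CRF is a gamma curve, so the candidate inverse CRFs are powers y ** g; the noise around a single irradiance level is Gaussian (hence symmetric); and sample skewness is the asymmetry measure. Matsushita and Lin work with general CRF models and full noise distributions, so this only shows why symmetry pins down the response.

```python
import random


def skewness(xs):
    """Sample skewness (third standardized moment)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


rng = random.Random(11)
true_gamma = 2.2  # assumed CRF: observed = irradiance ** (1 / gamma)
irradiance = [max(0.2 + rng.gauss(0, 0.04), 1e-6) for _ in range(10000)]  # symmetric noise
observed = [x ** (1 / true_gamma) for x in irradiance]

candidates = [1.0 + 0.1 * i for i in range(21)]  # inverse-CRF exponents 1.0 .. 3.0
best = min(candidates, key=lambda g: abs(skewness([y ** g for y in observed])))
print(f"skewness observed: {skewness(observed):.3f}, best exponent: {best:.1f}")
```

The concave CRF gives the observed noise a negative skew, and the exponent that restores symmetry lands near the true gamma.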
Takamatsu et al. also exploited noise characteristics to estimate CRFs [12]. Instead of the shape of the noise distribution, they focused on the non-affine relationship between the observed pixel value and the noise variance.
[11] Y. Matsushita and S. Lin, "Radiometric calibration from noise distributions," in Proc. IEEE Computer Society Conf. Computer Vision and Pattern Recognition (CVPR), 2007, pp. 1–8.
[12] J. Takamatsu, Y. Matsushita, and K. Ikeuchi, "Estimating radiometric response functions from image noise variance," in Proc. Eur. Conf. Computer Vision (ECCV), 2008, pp. 623–637.
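The non-affinity idea can be sketched in the same toy setting. The assumptions: noise variance is affine in irradiance (signal-dependent plus constant term, values made up), the CRF is a gamma curve, and "non-affinity" is measured as the fraction of the variance-versus-mean spread that a least-squares line leaves unexplained. Under the correct inverse exponent the points fall on a line; in the raw observed domain they curve.

```python
import random


def mean_var(xs):
    """Sample mean and unbiased sample variance."""
    n = len(xs)
    m = sum(xs) / n
    return m, sum((x - m) ** 2 for x in xs) / (n - 1)


def non_affinity(points):
    """Fraction of variance-vs-mean spread left unexplained by the best straight line."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    slope = sxy / sxx
    rss = sum((p[1] - my - slope * (p[0] - mx)) ** 2 for p in points)
    tss = sum((p[1] - my) ** 2 for p in points)
    return rss / tss


rng = random.Random(5)
true_gamma = 2.2
samples = []
for i in range(7):
    level = 0.2 + 0.1 * i  # irradiance levels 0.2 .. 0.8
    sd = (0.002 * level + 0.001) ** 0.5  # affine variance in the irradiance domain
    xs = [min(max(level + rng.gauss(0, sd), 1e-6), 1.0) for _ in range(5000)]
    samples.append([x ** (1 / true_gamma) for x in xs])  # seen through the assumed CRF

candidates = [1.0, 1.4, 1.8, 2.2, 2.6, 3.0]
scores = {g: non_affinity([mean_var([y ** g for y in ys]) for ys in samples]) for g in candidates}
for g, s in scores.items():
    print(f"exponent {g:.1f}: non-affinity {s:.4f}")
```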
They also developed another method to estimate CRFs based on probabilistic intensity similarity [13]. Probabilistic intensity similarity is a similarity measure between observed pixel values that represents the likelihood that the two values originated from the same scene radiance [14].
[13] J. Takamatsu, Y. Matsushita, and K. Ikeuchi, "Estimating camera response functions using probabilistic intensity similarity," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Jun. 23–28, 2008, pp. 1–8.
[14] Y. Matsushita and S. Lin, "A probabilistic intensity similarity measure based on noise distributions," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Jun. 17–22, 2007, pp. 1–8.
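Under strong simplifying assumptions, the "same radiance" likelihood has a closed form. If both observations carry independent Gaussian noise with one constant standard deviation sigma and the common underlying value has a flat prior, integrating out that value gives a Gaussian in the difference with variance 2·sigma². Matsushita and Lin build the measure from measured, intensity-dependent noise distributions instead, so this sketch only conveys the shape of the idea; the code also checks the closed form numerically.

```python
import math


def intensity_similarity(i1, i2, sigma):
    """Likelihood that i1 and i2 come from one common value r, assuming i.i.d. Gaussian
    noise N(0, sigma**2) and a flat prior on r:
        integral of N(i1; r, sigma^2) * N(i2; r, sigma^2) dr = N(i1 - i2; 0, 2 sigma^2)."""
    var = 2.0 * sigma ** 2
    return math.exp(-(i1 - i2) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def gauss_pdf(x, mean, sigma):
    """Gaussian probability density."""
    return math.exp(-(x - mean) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


# Numerical check of the closed form for i1 = 100, i2 = 103, sigma = 2.
step = 0.01
numeric = sum(gauss_pdf(100, r, 2.0) * gauss_pdf(103, r, 2.0) * step
              for r in (80 + step * j for j in range(4300)))
print(intensity_similarity(100, 103, 2.0), numeric)
```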