Print Biometrics: Recovering Forensic Signatures from Halftone Images

Stephen Pollard, Steven Simske, Guy Adams
HPL-2013-1

Keyword(s): document forensics; biometrics; Gabor filters; anti-counterfeiting

Abstract: High-resolution imaging is useful for the forensic identification of unique printed regions. In this paper, we adapt an iris biometric approach to provide significant statistical discrimination between any two imaged halftone areas. The method is based on image registration followed by 2D Gabor wavelet encoding. Hamming Distances (HD) are used for matching. The z-scores for the comparison of like and unlike samples indicate improved identification statistics compared to both iris biometrics and previous methods of print authentication.

External Posting Date: January 21, 2013
Internal Posting Date: January 21, 2013
Approved for External Publication
Copyright 2013 Hewlett-Packard Development Company, L.P.

Stephen Pollard (stephen.pollard@hp.com), Steven Simske (steven.simske@hp.com), Guy Adams (guy.adams@hp.com)

1. Introduction

Counterfeiting, warranty fraud, product tampering, smuggling, product diversion and other forms of organized deception are driving the need for improved brand protection. The potential for security printing and imaging to provide an extremely cost-effective forensic level of authentication is well-recognized [1]. In order to perform a forensic authentication of printed material, it is necessary to use an image resolution sufficient to expose unique properties of the print that are extremely difficult to reproduce or copy. For the majority of printing technologies, these properties result naturally from the stochastic nature of the print process itself and its interaction with the underlying structural properties of the substrate material on which it is printed. As such they represent a unique fingerprint that can be used to authenticate individually printed items such as labels, documents, product packaging and monetary notes.

Previously, we have shown that it is possible to derive a model-based print signature capable of forensic levels of authentication from the outline of solid printed material such as character glyphs, company logos or the non-payload indicia of popular 2D barcodes such as the DataMatrix or QR Code [2].
Here, we utilize a methodology borrowed from iris recognition [3] to derive a more general area-based print biometric that can be applied to halftone images and thus greatly extend the utility and applicability of forensic print authentication. We build on similar work [4] that related only to substrate materials and required an external method to register samples for analysis, which inhibits its adoption in practice.

2. Methodology

We follow the methodology first proposed by Daugman in his 1993 paper on iris recognition [3] and expanded on in subsequent publications [5], [6]. This has become the backbone of many government and commercial biometric recognition systems, offering, as it does, the ability to robustly discriminate many billions of iris patterns. There are three essential elements to this form of iris recognition: i) registration of the inner and outer iris boundaries; ii) iris encoding by 2-D Gabor wavelet demodulation over normalized image coordinates; iii) testing of statistical independence between encoded feature sequences.

Iris recognition differs from our print authentication task in three important regards. First, our images are captured using a specialized contact imaging device, DrCID (Dyson relay CMOS Imaging Device) [7], at an almost fixed high resolution (about 7200 dpi), whereas iris images are captured using traditional optics and thus vary in size over a small but significant range. Second, parts of the iris are not properly imaged due to either obscuration (by the eyelids or the eyelashes) or specular reflections of the near-IR light sources. Thus encoded features extracted from these regions must be robustly and accurately excluded from the statistical comparison process. Print images, on the other hand, do not generally suffer such imperfections and the whole of the feature sequence can be used.
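The contrast drawn above between masked iris codes and unmasked print codes can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `fractional_hd`, the use of NumPy boolean arrays to hold phase codes, the 12800-bit code size and the random test codes are all assumptions made for the example.

```python
import numpy as np


def fractional_hd(code_a, code_b, mask_a=None, mask_b=None):
    """Fractional Hamming distance between two binary phase codes.

    Without masks (the print case) every bit is compared. With masks
    (the iris case) only bits marked valid in both codes are compared.
    """
    code_a = np.asarray(code_a, dtype=bool)
    code_b = np.asarray(code_b, dtype=bool)
    valid = np.ones(code_a.shape, dtype=bool)
    if mask_a is not None:
        valid &= np.asarray(mask_a, dtype=bool)
    if mask_b is not None:
        valid &= np.asarray(mask_b, dtype=bool)
    n_valid = int(valid.sum())
    if n_valid == 0:
        raise ValueError("no valid bits to compare")
    # XOR marks disagreeing bits; AND with the mask drops excluded bits.
    disagreements = np.logical_xor(code_a, code_b) & valid
    return float(disagreements.sum()) / n_valid


rng = np.random.default_rng(0)
n_bits = 12800  # hypothetical code length, chosen for illustration only

# Two unrelated random codes: expected fractional HD close to 0.5.
code_a = rng.integers(0, 2, n_bits).astype(bool)
code_b = rng.integers(0, 2, n_bits).astype(bool)
hd_unrelated = fractional_hd(code_a, code_b)

# A copy of code_a with exactly 10% of its bits flipped.
noisy = code_a.copy()
flipped = rng.choice(n_bits, size=n_bits // 10, replace=False)
noisy[flipped] = ~noisy[flipped]
hd_same = fractional_hd(code_a, noisy)

# Iris-style comparison: the first quarter of code_a is marked invalid.
mask = np.ones(n_bits, dtype=bool)
mask[: n_bits // 4] = False
hd_masked = fractional_hd(code_a, code_b, mask_a=mask)

print(hd_unrelated, hd_same, hd_masked)
```

In the unmasked case the denominator is simply the code length, which is the situation described for print images; the masked call shows the extra bookkeeping that iris codes require.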
Finally, unique iris features can be encoded across a wide range of spatial frequencies while the random perturbations associated with printed halftones are more limited. In fact, we find that above a critical frequency range image noise tends to dominate, while below this range the portrayed image content itself is dominant.

2.1 Halftone registration and coding

Figure 1. The image of the Rainbow Bridge, left, is rendered as a 600 dpi halftone, centre, to cover a 4 mm square region which is printed on an HP LaserJet 4345 and captured, right, by our high resolution contact imaging device.

Halftone patterns are registered using multi-scale gradient descent [8]. Figure 1 shows an example image, its halftone representation and a registered and de-warped high resolution capture using DrCID. For the multi-scale registration we normalize band-pass filters (difference of successive Gaussian filtered images) to have unit standard deviation in order to minimize the difference between the stylized scaled (12x from 600 to 7200 dpi) halftone images and their printed and captured equivalents as shown in figure 2. Initial approximation is achieved using image moments [9].

Figure 2. Shows 3 layers of the difference of Gaussian pyramid used for image registration. Scaled (to 7200 dpi) halftone (top) and printed captured image (bottom).

Registration recovers the affine transform that gives the local minimum sum squared distance between the band-pass filtered images. It is used to dewarp the captured image prior to 2-D Gabor wavelet encoding. Examples of detail from registered halftone images (in this case an HP Logo) are shown in figure 3. It is clear that while the carrier signal (the HP Logo) is preserved, high frequency modulation due to the random nature of some print processes is also evident.

Following Daugman's methodology, the random signal is demodulated to extract its phase information using quadrature 2-D Gabor wavelets. In our case the Gabor filters use Cartesian coordinates and not the polar coordinates used for iris biometrics. That is, ignoring orientation:

$$ h_{\{\mathrm{Re},\mathrm{Im}\}} = \operatorname{sgn}_{\{\mathrm{Re},\mathrm{Im}\}} \iint I(x,y)\, e^{-i\omega(x_0 - x)}\, e^{-(x_0 - x)^2/\alpha^2}\, e^{-(y_0 - y)^2/\beta^2}\, dx\, dy $$

where $h_{\{\mathrm{Re},\mathrm{Im}\}}$ is a complex valued bit whose real and imaginary parts are either 1 or 0 depending on the sign of the 2-D integral; $I(x,y)$ is the warped raw image; $\alpha$ and $\beta$ are size parameters of the Gaussian envelope; and $\omega$ is the spatial frequency of the filter. There is an additional orientation parameter $\theta$ which is ignored in this formulation for simplicity. Thus, for all $(x_0, y_0)$ samples each wavelet provides two bits towards the phase encoding that describes the random elements of the printed halftone. Samples can be combined spatially over an $M \times M$ grid and through the choice of filter control parameters, notably frequency and orientation.

Figure 3. Shows how details of the printed versions of the same halftone differ due to inherent toner and substrate variation. Note in particular differences in the large isolated dots to the top right.

2.2 Statistical testing

In Daugman's method statistical independence is tested using the norm of the Boolean XOR operator applied to the complete code vectors. This is represented as a fractional Hamming Distance (HD) by dividing by the code length. (N.B. This is complicated, in the case of iris imaging, by the need to include a mask to represent the valid portions of the code that are free from obscuration and specular reflection.) Provided any given bit in the phase code is equally likely to be 1 or 0 and the phase codes are uncorrelated, the expected value of HD is 0.5. For iris images the lack of correlation for different eyes holds across a wide range of spatial scales, while for halftone images it is clearly the case that at lower frequencies the different halftone images (which convey the same subject matter) are closely correlated. In fact, figure 4 shows the results of an experiment in which we collect 48 images of different printed halftones (all of the Rainbow Bridge) and perform all 1128 possible false comparisons, plotting mean HD against the wavelength $\lambda$ of a single 2-D Gabor filter sampled over an $80 \times 80$ grid (12.8 Kbits phase code). As $\lambda$ gets
larger the phase coded vector becomes increasingly correlated and the mean HD drops accordingly.

Figure 4. Plot of mean HD (with error bars) and effective code length for 1128 pairs of different halftone prints against the wavelength of the Gabor filter.

Figure 5 shows the histogram of HD for $\lambda = 8$ pixels, with mean $p$ and standard deviation $\sigma$, for which the corresponding probability density function (PDF) is binomial with $N = p(1-p)/\sigma^2$ degrees of freedom. This is less than the 12.8 Kbits in the phase code due to internal correlations amongst the otherwise random trials. $N$ represents the effective code length (ECL) and is also plotted in figure 4 for other wavelengths. Binomial distributions of this size are reasonably approximated by a Gaussian, for which standard scores (z-scores) are well defined and provide a convenient shorthand for the probability that the null hypothesis is violated by chance. This provides a simple robustness estimate as to whether two phase codes are in fact derived from the same printed halftone. For example, an HD of 0.3 or less has a probability of approximately $2 \times 10^{-241}$ according to the binomial distribution, while it corresponds to a z-score of 32.5, which relates to a cumulative probability on the normal distribution of about $5 \times 10^{-232}$ (which represents a modest overestimate of the actual probability as both are infinitesimally small).

3. Results

We have printed a number of sequences of identical halftone images small enough (4 mm square) to be captured by DrCID. Each print is captured twice (using different DrCID imagers) in order to compare the HD scores of valid matches with those of the binomially distributed, statistically independent false matches. For brevity, the results presented here are for the Rainbow Bridge image introduced in figure 1. However, similar results are obtained for a range of image content including the HP Logo shown (in detail) in figure 3.

Figure 5. Distribution of HDs for all possible comparisons of phase codes for pairs of different halftone images for a single $\lambda = 8$ and $\theta = 0$ with $M = 80$ 2-D Gabor filter.

Figure 6 shows a scatter plot of the HD scores of the valid matches of the 48 printed halftones along with the binomial PDF from figure 5 (shown rotated). As can be seen, the HDs are in fact all well separated from the false match population, with a minimum z-score of 44.79 and an average z-score of 58.62. While our sample size is relatively small, these levels of statistical robustness are staggering even in the context of iris recognition (which is able to discriminate amongst many billions of comparisons [6]).

Figure 6. Scatter plot of HD scores of valid matches.

Figure 7 plots summary statistics as we vary just the wavelength of the Gabor filter but keep all other parameters constant. As well as plotting the min, max and mean z-scores of the population of 48 correct matches, we also plot the less informative z-score of equal error rate [10], which marks the point at which false positives and negatives are equally likely. It is clear that the large standard deviation of the relatively small sample of valid matches tends to dominate this error statistic, including as it does all forms of imaging and
experimenter error.

Figure 7. Plots of the z-scores of valid matches against the wavelength of the Gabor filter.

Figure 7 shows that, as discussed previously, there is a relatively small range of wavelength that gives extremely good statistical robustness using a single 2-D Gabor filter. In figure 8 we plot the average z-score as we vary the code length (square root of the number of bits) by varying the sampling frequency ($M$) and the number of independent wavelets that contribute to the phase code at each location. Specifically, we use up to 4 orientations and 3 wavelengths to encode the halftone. Note that increasing the sampling frequency of a single wavelet plateaus soon after $M = 80$ (i.e. $113^2$ bits) as the samples become spatially correlated. Adding multiple orientations provides the biggest improvement (with average z-score increasing all the way to 114 for $339^2$ bits) as they are most statistically independent.

Figure 8. Plots of the mean z-scores of valid matches against the square root of the length of the encoding.

4. Discussion

In his work on iris recognition, Daugman [3] used a 2 Kbit (256 Byte) code to represent each iris. This code size was chosen for pragmatic reasons, and was limited by the resolution of his captured images. Our images are much higher resolution, so even though we are limited to a modest range of applicable filter wavelengths, it is possible to select much longer phase codes with greater discrimination power. However, even if we restrict ourselves to a comparable encoding length (e.g. for a single Gabor filter of wavelength $\lambda$ at each sample), the number of degrees of freedom of the corresponding binomial distribution is 1957, compared to the figure of 249 reported for iris recognition. It is important to note that this ECL is not the whole story. If we compare the graphs in figures 4 and 7, the extremely high ECL evident at very short wavelengths appears to be encoding image noise and does not lead to optimal discrimination.

Another important distinction between iris coding and halftone coding is that the former is almost circularly symmetric and requires some alignment to compute minimum HD values. Halftone patterns, on the other hand, can be chosen to uniquely resolve the orientation of the code, thus further increasing the disambiguating power of the statistical test. More importantly, printed information can readily be associated with a unique document serial number (e.g. printed in a barcode), which transforms the search task to a statistically more powerful identification check. It is also possible for this information to be encoded in the halftone itself [11].

References

[1] D. Pizzanelli, The Future of Anti-Counterfeiting, Brand Protection and Security Packaging V, Pira International, Leatherhead, UK, 2009.
[2] S. Pollard, S. Simske, G. Adams, Model based print signature profile extraction for forensic analysis of individual text glyphs, IEEE WIFS, 2010.
[3] J.G. Daugman, High confidence visual recognition of persons by a test of visual phase information, IEEE PAMI, 15(11), 1993.
[4] A. Sharma, L. Subramanian and E. Brewer, PaperSpeckle: microscopic fingerprinting of paper, ACM CCS, 2011.
[5] J. Daugman, New methods in iris recognition, IEEE SMC, 37(5), 2007.
[6] J. Daugman, Probing the uniqueness and randomness of IrisCodes: results from 200 billion comparisons, Proc. IEEE, 94(11), 2006.
[7] G. Adams, Handheld Dyson Relay Lens for Anti-Counterfeiting, IEEE IST, 2010.
[8] B. Lucas and T. Kanade, An iterative image registration technique with an application to stereo vision, IJCAI, 1981.
[9] J. Flusser, T. Suk, and B. Zitová, Moments and Moment Invariants in Pattern Recognition, John Wiley, Chichester, UK, 2009.
[10] N. Poh and S. Bengio, How do correlation and variance of base-experts affect fusion in biometric authentication tasks?, IEEE Trans. Signal Processing, 53(11), 2005.
[11] R. Ulichney, M. Gaubatz, and S. Simske, Encoding information in clustered-dot halftones, IS&T NIP26, 2010.
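
Worked example. The statistics quoted in Section 2.2 can be checked approximately with a few lines of arithmetic. The sketch below is not taken from the authors' work: it assumes a false-match mean HD of exactly 0.5, infers the standard deviation from the quoted z-score of 32.5 at HD = 0.3, and uses the relation N = p(1-p)/sigma^2 that Daugman uses for the degrees of freedom of the binomial. The function names are hypothetical, and the resulting figures are back-of-envelope estimates rather than values reported by the authors.

```python
import math


def effective_code_length(mean_hd, std_hd):
    """Degrees of freedom N of the binomial matching the given mean and std."""
    return mean_hd * (1.0 - mean_hd) / std_hd ** 2


def z_score(hd, mean_hd, std_hd):
    """Number of standard deviations by which hd falls below the false-match mean."""
    return (mean_hd - hd) / std_hd


def log10_normal_tail(z):
    """log10 of the one-sided Gaussian tail probability P(Z >= z)."""
    return math.log10(0.5 * math.erfc(z / math.sqrt(2.0)))


mean_hd = 0.5                     # assumption: unbiased, uncorrelated bits
std_hd = (mean_hd - 0.3) / 32.5   # inferred from the quoted z-score at HD = 0.3

n_eff = effective_code_length(mean_hd, std_hd)
z = z_score(0.3, mean_hd, std_hd)
log_p = log10_normal_tail(z)

print(f"inferred std of false-match HD: {std_hd:.5f}")
print(f"implied effective code length:  {n_eff:.0f} bits")
print(f"z-score at HD = 0.3:            {z:.1f}")
print(f"log10 of Gaussian tail:         {log_p:.1f}")
```

Under these assumptions the implied effective code length is roughly half of the 12.8 Kbit phase code, and the Gaussian tail comes out at the same order of magnitude as the figure of about $5 \times 10^{-232}$ quoted in the text.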