Low-Complexity Bayer-Pattern Video Compression using Distributed Video Coding

Hu Chen, Mingzhe Sun and Eckehard Steinbach
Media Technology Group, Institute for Communication Networks, Technische Universität München, Munich, Germany

ABSTRACT

Most consumer digital color cameras capture video using a single chip. Single-chip cameras do not capture RGB triples for every pixel, but a subsampled version with only one color component per pixel (e.g., the Bayer pattern). Conventionally, a full-resolution video is constructed from the Bayer pattern by demosaicing before being converted to the YUV domain for compression. To lower the encoding complexity, we propose in this work a novel color space conversion in the pre-processing step. Compared to the conventional method, the proposed scheme reduces the encoding complexity almost by half. Moreover, it improves the reconstructed video quality by up to 1.5 dB in CPSNR when H.264/AVC is used for compression. To lower the encoding complexity further, we additionally use our Wyner-Ziv video coder for compression. Again, we observe in our experiments a similar gain of the proposed scheme over the conventional one.

Keywords: Bayer-pattern video compression, color space conversion, chroma subsampling, H.264/AVC, distributed video coding, Wyner-Ziv video coding.

1. INTRODUCTION

Single-chip video cameras capture images using color filter arrays. Currently, the most popular color filter pattern is the Bayer pattern [1]. Conventional compression of Bayer-pattern images employs demosaicing, color space conversion, chroma subsampling and H.264/AVC video coding. This approach, however, does not lead to an optimal solution in the context of Bayer-pattern video compression. In the process of demosaicing, the two missing color components are interpolated; the number of pixels increases, but no new information is created. In short, redundancy is introduced.
If this redundancy cannot be eliminated completely in the subsequent video compression, a coding-efficiency loss arises. In addition, the computational complexity is higher than necessary, because the encoder has to process the redundant pixels. In this work, we propose a novel method using a modified color space conversion for compressing Bayer-pattern video sequences. We keep only a limited number of luma and chroma samples and forward them to an H.264/AVC video coder, which avoids introducing redundant pixels and wasting computational power on them. Moreover, we place the chroma pixels in a way that improves the reconstructed video quality. Our proposed scheme proves significantly more efficient than the conventional one over the entire bit-rate range, and the computational complexity is reduced by almost 50%. To reduce the encoding complexity even further, we additionally take advantage of distributed video coding (DVC). The foundations of distributed video coding are the distributed source coding theories established in the 1970s by Slepian and Wolf [2] as well as Wyner and Ziv [3]. According to these theories, the compression of an information source suffers only limited or even no efficiency loss when the redundancy of the source is analyzed and eliminated at the decoder instead of at the encoder. This implies that

Further author information: (Send correspondence to Hu Chen.) Hu Chen: E-mail: chenhu@tum.de, Telephone: +49 89 289 23511. Mingzhe Sun: E-mail: mingzhe.sun@msn.com. Eckehard Steinbach: E-mail: Eckehard.Steinbach@tum.de, Telephone: +49 89 289 23504.
we can shift motion estimation from the encoder to the decoder, while the rate-distortion performance remains close to that of conventional video coding such as H.264/AVC [4]. In this way, the encoding complexity can be reduced considerably, since the encoder no longer has to perform computationally intensive motion estimation. In our experiments, we have observed that our novel color space conversion retains its gain over the conventional method when our Wyner-Ziv video coder is used instead of the H.264/AVC video coder for compression.

In Section II, we describe related work in the area of Bayer-pattern image and video compression as well as practical distributed video coding systems. In Section III, we review the conventional approach for Bayer-pattern video compression and present our proposed scheme. Section IV describes how we combine our novel Bayer-pattern compression method with distributed video coding. In Section V, the rate-distortion curves for the different methods and test sequences are plotted. Finally, Section VI concludes the paper.

2. STATE OF THE ART

2.1 Bayer-pattern Image and Video Compression

A number of new schemes for compressing Bayer-pattern still images and video have been proposed in recent years to compete with the conventional approach. Koh et al. address the compression of Bayer-pattern images using JPEG [5]. They conclude that Bayer-pattern images are not suitable for direct compression with JPEG. The scheme they propose applies a color space transform and hands the luma and chroma data over to a JPEG coder. In addition, three different demosaicing methods (bilinear, cubic and Laplacian) are discussed. In the literature, quite a few other demosaicing methods have been addressed and compared with one another [6]. Another recent work, by N. Zhang and X. Wu, addresses a wavelet-based scheme [7]. A Mallat wavelet packet transform, a reversible lossless spectral-spatial transform that removes statistical redundancies in both the spectral and spatial domains, is used to decorrelate the color mosaic data, and a low-complexity adaptive context-based Golomb-Rice coding technique is proposed to compress its coefficients. The lossless compression performance of this method on color mosaic images is reportedly the best so far among existing lossless image codecs. However, the scheme deviates considerably from standard image codecs like JPEG-LS or JPEG 2000, which may limit its adoption in spite of its high coding efficiency.

When it comes to the compression of Bayer-pattern video data, there are two recent lines of work. In one of them, the green, red and blue pixels of the Bayer pattern are separated into three arrays before being compressed with an MPEG-2-like video coder [8]. This method is reported to perform poorly for P-frames because of the severe aliasing generally contained in the Bayer pattern. To alleviate the negative effects of the aliasing, a newer method [9, 10] compresses Bayer-pattern video data with an H.264 video coder and introduces a modified motion compensation scheme to mitigate the aliasing problem. However, both of these schemes confine themselves to the RGB domain and, partly for this reason, outperform the conventional method only in a limited bit-rate range.

2.2 Distributed Video Coding

Based on the distributed source coding theories [2, 3] proposed in the 1970s, researchers have been developing practical systems, particularly for video coding, since the end of the last decade. Ramchandran's group at the University of California, Berkeley, developed the system PRISM [11]. The compression of images is performed in a blockwise manner using syndrome codes. Typically, all the encoder does is generate syndrome bits for every macroblock.
The syndrome bits are then transmitted to the decoder; no motion estimation is needed at the encoder. At the decoder, motion estimation and the decoding of a macroblock are performed jointly. Similar to motion estimation at the encoder in conventional video coding, there is also a certain motion search range at the decoder. The decoder tries every candidate block in this range until it finds one that can be decoded successfully using the transmitted syndrome bits. This system achieves relatively high coding efficiency, but the decoding complexity is rather high. Meanwhile, the Stanford group led by Girod developed a system for pixel-domain Wyner-Ziv video coding [12]. Rate-compatible turbo codes are used in this system for the compression. Unlike PRISM, the decoder in this Wyner-Ziv video coding system does not exhaustively try every candidate in the motion search range. Instead, it predicts the image to be decoded by interpolating or extrapolating it from previously
decoded frames. The prediction can make use of motion information extracted from adjacent frames. The encoding process is also of low complexity, similar to PRISM. Parity bits for a frame are generated by a rate-compatible turbo coder and stored temporarily in memory. The decoding of each frame is characterized by a decode-and-request procedure: the decoder keeps sending requests to the encoder, asking it to transmit more parity bits, until the decoding is successful. This procedure, however, requires a feedback channel between decoder and encoder, and the compression has to be performed online. Later, this system was extended to a DCT-domain Wyner-Ziv video coding system [13] and a Wyner-Ziv residual video coding system [14]. A third typical Wyner-Ziv video coding system is the layered Wyner-Ziv video coding prototype proposed by Xiong et al. [15, 16]. A video sequence is typically divided into two spatial layers. The base layer, i.e., the subsampled version of every image, is coded by a conventional video coding system like H.264/AVC. The enhancement layer, in other words the images at their original size, is coded in the Wyner-Ziv manner. This constitutes a combination of conventional and Wyner-Ziv video coding.

3. BAYER-PATTERN VIDEO COMPRESSION

In this section, we first review the conventional method for compressing Bayer-pattern videos and examine its weaknesses. Then we present our novel method and point out why it leads to better rate-distortion performance while requiring less computation.

3.1 Conventional Method

As illustrated in Figure 2(a), demosaicing, or color interpolation, is the first step in the conventional way of compressing Bayer-pattern video data. The full-color images are then transformed from the RGB domain to the YUV domain.
The components U and V are subsampled by a factor of 2 both horizontally and vertically, as shown in Figure 1, resulting in a sequence of YUV images in the standard 4:2:0 format. After this, an H.264 video coder is employed for the compression. At the decoder, the YUV images in the 4:2:0 format are reconstructed and the components U and V are interpolated to their full size. Finally, the images in the YUV domain are converted back to RGB full-color images. For the color space transform and its inverse, the two sets of formulas we use in our experiments are taken from Keith Jack's book [17]:

Y = 0.257 R + 0.504 G + 0.098 B + 16,
U = -0.148 R - 0.291 G + 0.439 B + 128,    (1)
V = 0.439 R - 0.368 G - 0.071 B + 128,

R = 1.164 (Y - 16) + 1.596 (V - 128),
G = 1.164 (Y - 16) - 0.391 (U - 128) - 0.813 (V - 128),    (2)
B = 1.164 (Y - 16) + 2.018 (U - 128).

The most significant advantage of the conventional approach lies in its simplicity: for all the main techniques, including demosaicing, color space conversion, chroma subsampling and H.264/AVC video coding, existing or standardized methods are available. However, this simple combination of techniques does not lead to an optimal solution and has an obvious drawback: the position of the chroma pixels chosen in the chroma subsampling is not optimal. Nominal chroma sample positions standardized in ITU-T Recommendation H.264 are illustrated in Figure 1; the position of the chroma pixels U and V is halfway between Y pixels. Alternative chroma sample locations are also supported by the standard, but all standardized chroma sample locations have in common that U and V always lie at the same location. Although this is usually taken for granted, it in fact results in a loss of coding efficiency in Bayer-pattern image and video compression. In the next section, we show the different positions we choose for the chroma pixels U and V and explain why our choice is more reasonable.
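As a sanity check, equations (1) and (2) can be written down directly as matrix operations (a minimal NumPy sketch; the matrix and function names are ours, not part of the standard):

```python
import numpy as np

# Equation (1): RGB -> YUV; the rows give the weights for Y, U and V.
M_FWD = np.array([[ 0.257,  0.504,  0.098],
                  [-0.148, -0.291,  0.439],
                  [ 0.439, -0.368, -0.071]])
OFFSET = np.array([16.0, 128.0, 128.0])

# Equation (2): YUV -> RGB; columns correspond to (Y-16), (U-128), (V-128).
M_INV = np.array([[1.164,  0.0,    1.596],
                  [1.164, -0.391, -0.813],
                  [1.164,  2.018,  0.0]])

def rgb_to_yuv(rgb):
    """Apply equation (1) to an (..., 3) array of R, G, B samples."""
    return rgb @ M_FWD.T + OFFSET

def yuv_to_rgb(yuv):
    """Apply equation (2): remove the offsets first, then mix."""
    return (yuv - OFFSET) @ M_INV.T
```

Since the two matrices are rounded inverses of each other, a round trip reproduces the input up to small rounding errors.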
Figure 1. Chroma subsampling for YUV 4:2:0 (pixels with Y values and pixels with Cr and Cb values).

3.2 Proposed Approach B-4:2:2

Our novel method also employs the color space transform, and the equations we use for the transform are exactly the same as in the conventional scheme. The novelty of the approach lies mainly in the fact that we calculate chroma pixels at positions different from those in the conventional method, which improves the reconstructed video quality. We also calculate luma pixels, although we keep only half of them in order to reduce the computational complexity. We then convert the YUV data into the standard format YUV 4:2:2 before handing the data over to the H.264 video coder. Based on the format of the YUV data, we call the proposed method B-4:2:2, where the letter B indicates the context of Bayer-pattern video compression and differentiates the YUV data format of our proposed method from the H.264 standard. This method is illustrated in Figure 2(b).

The color space transform of our proposed method B-4:2:2 is shown in Figure 3. We calculate Y pixels only at the locations of G pixels in the Bayer-pattern images, exactly as proposed by Koh et al. [5]. In this case, the number of Y pixels to calculate and compress is halved, which is why the encoding time is reduced by almost 50%.

Figure 2. Comparison of the conventional method (a) and the proposed method B-4:2:2 (b).

As for the chroma pixels U and V, we choose the positions where we calculate them carefully. Only at the positions of R pixels do we calculate the V values. The reason can be found in the set of equations in (2): when we transform the YUV pixels back to RGB values, only Y and V are necessary for the reconstruction of R pixels. In other words, apart from the Y pixels, the V values are the most important for reconstructing R pixels.
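To make the position choice concrete, the following sketch computes Y only at the G sites and each chroma component only where it matters most for reconstruction: V at the R sites and U at the B sites. It assumes an RGGB layout and plain bilinear (neighbor-averaging) demosaicing; all function names are ours, not from the paper:

```python
import numpy as np

def box3(a):
    """3x3 sliding-window sum with zero padding."""
    p = np.pad(a, 1)
    h, w = a.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3))

def bilinear_demosaic(bayer):
    """Fill the two missing components at every site by averaging the
    available samples in a 3x3 window (RGGB layout assumed)."""
    h, w = bayer.shape
    masks = np.zeros((h, w, 3), dtype=bool)
    masks[0::2, 0::2, 0] = True   # R sites
    masks[0::2, 1::2, 1] = True   # G sites, even rows
    masks[1::2, 0::2, 1] = True   # G sites, odd rows
    masks[1::2, 1::2, 2] = True   # B sites
    rgb = np.zeros((h, w, 3))
    for c in range(3):
        known = np.where(masks[:, :, c], bayer, 0.0)
        avg = box3(known) / box3(masks[:, :, c].astype(float))
        # Keep the measured sample where it exists, interpolate elsewhere.
        rgb[:, :, c] = np.where(masks[:, :, c], bayer, avg)
    return rgb

def bayer_to_b422_sites(bayer):
    """Y at G sites (quincunx, half the pixels), V at R sites, U at B sites,
    using equation (1) on the demosaiced planes."""
    r, g, b = bilinear_demosaic(bayer).transpose(2, 0, 1)
    y = 0.257 * r + 0.504 * g + 0.098 * b + 16.0
    u = -0.148 * r - 0.291 * g + 0.439 * b + 128.0
    v = 0.439 * r - 0.368 * g - 0.071 * b + 128.0
    g_mask = np.zeros(bayer.shape, dtype=bool)
    g_mask[0::2, 1::2] = True
    g_mask[1::2, 0::2] = True
    return y[g_mask], u[1::2, 1::2], v[0::2, 0::2]
```

For an H x W Bayer frame this keeps H*W/2 luma samples and H/2 x W/2 arrays for each chroma component, matching the sample counts described in the text.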
That's why we calculate the V values at the positions of R pixels. For the same reason, we calculate the U values solely at
the positions of the blue pixels. Briefly speaking, our selection of positions for the chroma pixels is optimal for the reconstruction of the R and B pixels. The standard chroma subsampling, however, takes the U and V samples at the same location, and therefore cannot be optimal for Bayer-pattern image and video compression. This is the fundamental reason why our proposed scheme outperforms the conventional one in terms of rate-distortion performance.

In the calculation of the YUV values, demosaicing, or color interpolation, is necessary, because at every position of the Bayer pattern only one component, either R, G or B, is available, while all three are needed to transform the data to the YUV domain. Therefore, we have to interpolate the two missing components at every position from adjacent pixels before we are able to calculate the luma and chroma pixels. The demosaicing scheme we use is bilinear interpolation; the equations are listed in Figure 3.

Figure 3. Novel color space conversion for the proposed method B-4:2:2.

The Y pixels we compress are half of the original number and are distributed in a quincunx pattern, like the G pixels in the Bayer pattern. Therefore, we add a step that converts this quincunx pattern into a rectangular pattern before the H.264 video coder compresses the data. As shown in Figure 4, the Y pixels in the even rows are moved one unit upwards, and the resulting complete rows of Y pixels are pushed toward one another to form a rectangular array. The chroma pixels are likewise pressed together. The arrays of U and V pixels have the same height as the array of Y pixels but only half the width. The YUV data are now ready to be compressed by an H.264 video coder using the 4:2:2 mode.

Figure 4.
Structure conversion for YUV data in the proposed method B-4:2:2.

For the reconstruction at the decoder, we convert the rectangular pattern of Y pixels back to the quincunx pattern. Then we interpolate the missing Y pixels as well as the U and V pixels before calculating the RGB values of the Bayer pattern. Finally, full-color RGB images are generated by demosaicing the Bayer-pattern images.

4. BAYER-PATTERN VIDEO COMPRESSION USING DVC

To further reduce the encoder's computational complexity, we use a distributed video coding system to compress Bayer-pattern video. The system structure is very similar to that in Figure 2. For both the
conventional scheme and the proposed scheme, the pre- and post-processing are exactly the same; the only difference is that we substitute our Wyner-Ziv video codec for the H.264 video codec. The Wyner-Ziv video codec used in this work is our own development based on the pixel-domain codec proposed in [12]. The major advance is an optimization of the turbo codes for distributed video coding [18].

Figure 5. Rate-distortion curves for reconstructed RGB full-color images: (a) Panning, (b) Zooming, (c) Moving Object (39 frames each, 15 fps). Each plot shows CPSNR (dB) over bit rate (kbps) for DVC+Bayer422, DVC+Con420, H.264+Bayer422 and H.264+Con420.

5. EXPERIMENTAL RESULTS

Our simulation is based on three different Bayer-pattern video sequences which we captured in our laboratory. They represent three different motion modes: the first sequence exhibits significant panning motion, the second one zooming motion, and the third one a moving object over a static background. The H.264 video coder we use in our simulation is the JM 12.2. The GOP structure is set to I-B-I-B...I, i.e., there is a B-frame between every two I-frames. For the different simulations, we set the YUV format to 4:2:0 for the conventional scheme and to 4:2:2 for the proposed one; all other parameters in the configuration file of the JM coder keep their default values. For Wyner-Ziv coding, the GOP structure is set to I-WZ-I-WZ...I, i.e., there is a Wyner-Ziv frame between every two I-frames.
Moreover, we assume that the side information at the decoder can be generated quite accurately; we therefore simply take the result of motion-compensated prediction for B-frames in H.264 video coding as the side information for decoding the Wyner-Ziv frames.
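The paper uses H.264 B-frame motion-compensated prediction as the side information. As an illustrative stand-in for that step, side information can be sketched by symmetric block matching between the two surrounding key frames (block size, search range and the function name below are our assumptions, not the paper's method):

```python
import numpy as np

def side_info(prev_key, next_key, bs=8, sr=4):
    """Estimate a Wyner-Ziv frame from its neighboring key frames: for each
    block, find the displacement (dy, dx) minimizing the SAD between
    prev[+d] and next[-d], then average the two matched blocks.
    Linear motion between the key frames is assumed."""
    h, w = prev_key.shape
    out = np.empty((h, w))
    for y in range(0, h, bs):
        for x in range(0, w, bs):
            best_sad, best = np.inf, None
            for dy in range(-sr, sr + 1):
                for dx in range(-sr, sr + 1):
                    # Both displaced blocks must lie inside the frame.
                    if min(y + dy, y - dy, x + dx, x - dx) < 0:
                        continue
                    if max(y + dy, y - dy) + bs > h or max(x + dx, x - dx) + bs > w:
                        continue
                    b0 = prev_key[y + dy:y + dy + bs, x + dx:x + dx + bs]
                    b1 = next_key[y - dy:y - dy + bs, x - dx:x - dx + bs]
                    sad = np.abs(b0 - b1).sum()
                    if sad < best_sad:
                        best_sad, best = sad, (b0 + b1) / 2.0
            out[y:y + bs, x:x + bs] = best
    return out
```

For a static scene the zero displacement wins and the interpolated frame equals the key frames; the H.264 B-frame predictor used in the paper additionally exploits rate-optimized partitioning and sub-pixel motion.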
Rate-distortion curves for the different methods and test sequences are plotted in Figure 5. For each sequence we simulate the conventional method (Con420) and the proposed scheme B-4:2:2 (Bayer422), using H.264/AVC video coding (H.264) as well as distributed video coding (DVC). We interpolate the original Bayer-pattern images and the reconstructed ones to full-color RGB images and calculate the composite peak signal-to-noise ratio (CPSNR) between them. Finally, we average the CPSNR over all images of a sequence. We use equations (3) and (4) to calculate the CPSNR of a video frame. Here, I(i, j, k) is the pixel intensity at location (i, j) of the k-th color component of the reference video frame and I'(i, j, k) that of the reconstructed video frame. M and N are the height and the width of the frame.

CPSNR = 10 log10( 255^2 / MSE ),    (3)

MSE = 1/(3MN) * Σ_{k=1}^{3} Σ_{i=1}^{M} Σ_{j=1}^{N} [I(i, j, k) - I'(i, j, k)]^2.    (4)

Our experimental results show that for H.264 video coding the proposed scheme B-4:2:2 outperforms the conventional method over the entire bit-rate range. At high bit rates, we obtain a significant gain of more than 1.5 dB; at low bit rates, the improvement is smaller, but still more than 0.5 dB. A similar gain can also be observed for distributed video coding at medium and high bit rates, which means that our proposed color space conversion also contributes to a higher video quality when Wyner-Ziv video coding is applied to Bayer-pattern video data. At low bit rates, however, the proposed method converges to the conventional one or even becomes slightly worse. Also worth mentioning is the reduction of the encoder's computational complexity: our proposed method requires the compression of only half of the luminance pixels compared to the conventional scheme, which is why the encoding time with the JM coder is reduced approximately by a factor of 2.
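For reference, the CPSNR metric of equations (3) and (4) translates directly into code (a short NumPy sketch; the function name is ours):

```python
import numpy as np

def cpsnr(ref, rec):
    """Composite PSNR of a full-color frame, equations (3) and (4):
    one MSE pooled over all three color planes, then the usual log law."""
    diff = ref.astype(np.float64) - rec.astype(np.float64)
    mse = np.mean(diff ** 2)   # 1/(3MN) times the triple sum of squared errors
    return 10.0 * np.log10(255.0 ** 2 / mse)
```

The per-sequence figures reported above are the average of this value over all frames of a sequence.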
If the Wyner-Ziv video coder takes the place of the H.264 video coder, motion estimation is shifted to the decoder, and the encoding time is reduced even further.

6. CONCLUSION

In this paper, we propose a novel color space conversion for the compression of Bayer-pattern video sequences. We choose the positions of the chroma pixels such that we calculate and compress those U and V values that are most important for the reconstruction of the Bayer-pattern image. Moreover, we propose to keep and compress only half of the luma pixels. By doing this, the computational complexity is reduced almost by a factor of two. Furthermore, we combine the proposed method with Wyner-Ziv video coding to build an encoder of very low complexity, for which a similar gain over the conventional method still exists.

ACKNOWLEDGMENTS

This work has been financed in part by Taiwan Imaging Tek Corporation and by a grant from the Deutsche Telekom Stiftung.

REFERENCES

[1] Bayer, B. E., Color imaging array, U.S. Patent 3,971,065 (1976).
[2] Slepian, D. and Wolf, J., Noiseless coding of correlated information sources, IEEE Transactions on Information Theory 19, 471-480 (July 1973).
[3] Wyner, A. D. and Ziv, J., The rate-distortion function for source coding with side information at the decoder, IEEE Transactions on Information Theory 22, 1-10 (January 1976).
[4] Wiegand, T., Sullivan, G., and Luthra, A., Draft ITU-T recommendation and final draft international standard of joint video specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC), tech. rep., Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG, Geneva, Switzerland (May 2003).
[5] Koh, C. C., Mukherjee, J., and Mitra, S. K., New efficient methods of image compression in digital cameras with color filter array, IEEE Transactions on Consumer Electronics 49(4), 1448-1456 (2003).
[6] Gunturk, B. K., Glotzbach, J., Altunbasak, Y., Schafer, R. W., and Mersereau, R. M., Demosaicking: color filter array interpolation, IEEE Signal Processing Magazine 22, 44-54 (Jan. 2005).
[7] Zhang, N. and Wu, X. L., Lossless compression of color mosaic images, IEEE Transactions on Image Processing 15(6), 1379-1388 (2006).
[8] Gastaldi, F., Koh, C. C., Carli, M., Neri, A., and Mitra, S. K., Compression of videos captured via Bayer patterned color filter arrays, in [Proc. 13th European Signal Processing Conference], (2005).
[9] Doutre, C. and Nasiopoulos, P., An efficient compression scheme for colour filter array video sequences, in [IEEE 8th Workshop on Multimedia Signal Processing], 166-169 (Oct. 2006).
[10] Doutre, C., Nasiopoulos, P., and Plataniotis, K. N., H.264-based compression of Bayer pattern video sequences, IEEE Transactions on Circuits and Systems for Video Technology 18(6), 725-734 (2008).
[11] Puri, R. and Ramchandran, K., PRISM: a new robust video coding architecture based on distributed compression principles, in [Allerton Conf. Communication, Control and Computing], (2002).
[12] Aaron, A., Zhang, R., and Girod, B., Wyner-Ziv coding of motion video, in [Proc. Asilomar Conference on Signals and Systems], (November 2002).
[13] Aaron, A., Rane, S., Setton, E., and Girod, B., Transform-domain Wyner-Ziv codec for video, in [Proc. Visual Communications and Image Processing], (January 2004).
[14] Aaron, A., Varodayan, D., and Girod, B., Wyner-Ziv residual coding of video, in [Proc. Picture Coding Symposium 2006], (April 2006).
[15] Xu, Q. and Xiong, Z., Layered Wyner-Ziv video coding, IEEE Transactions on Image Processing 15, 3791-3803 (Dec. 2006).
[16] Xu, Q. and Xiong, Z., Layered Wyner-Ziv video coding, in [Proc.
SPIE Conference on Visual Communication and Image Processing], (Jan. 2004).
[17] Jack, K., [Video Demystified], Elsevier, 5th ed. (April 2007). ISBN 978-0-7506-8395-1.
[18] Chen, H. and Steinbach, E., Wyner-Ziv video coding based on turbo codes exploiting perfect knowledge of parity bits, in [Proc. IEEE International Conference on Multimedia & Expo, ICME '07], (July 2007).