High Dynamic Range Image Compression of Color Filter Array Data for the Digital Camera Pipeline. Dohyoung Lee

High Dynamic Range Image Compression of Color Filter Array Data for the Digital Camera Pipeline

by

Dohyoung Lee

A thesis submitted in conformity with the requirements for the degree of Master of Applied Science, Graduate Department of Electrical and Computer Engineering, University of Toronto

Copyright © 2011 by Dohyoung Lee

Abstract

High Dynamic Range Image Compression of Color Filter Array Data for the Digital Camera Pipeline

Dohyoung Lee
Master of Applied Science
Graduate Department of Electrical and Computer Engineering
University of Toronto
2011

Typical consumer digital cameras capture the scene by generating a mosaic-like grayscale image, known as a color filter array (CFA) image. One obvious challenge in digital photography is the storage of image data, which requires the development of an efficient compression solution. This issue has become more significant due to a growing demand for high dynamic range (HDR) imaging technology, which requires increased bandwidth to allow realistic presentation of the visual scene. This thesis proposes two digital camera pipelines that efficiently encode CFA image data represented in HDR format. First, a lossless compression scheme that applies predictive coding followed by a JPEG XR encoding module is introduced. It achieves efficient data reduction without loss of quality. Second, a lossy compression scheme consisting of a series of processing operations and a JPEG XR encoding module is introduced. Performance evaluation indicates that the proposed method delivers high quality images at low computational cost.

Contents

1 INTRODUCTION
  1.1 Motivation
  1.2 Key Challenges
  1.3 Thesis Scope and Contributions
    1.3.1 Lossless HDR CFA compression scheme for the digital camera pipeline
    1.3.2 Lossy HDR CFA compression scheme for the digital camera pipeline
  1.4 Thesis Organization

2 BACKGROUND
  2.1 Digital Camera Design
    2.1.1 Digital Camera Architecture
    2.1.2 Image Processing Pipeline
    2.1.3 Color Demosaicking
    2.1.4 High Dynamic Range Imaging in Single Sensor Digital Cameras
  2.2 Image Compression
    2.2.1 Common Image Compression Techniques
    2.2.2 Image Compression Standards: JPEG family
    2.2.3 Prior art on Bayer CFA compression
  2.3 Image Quality Assessment Metrics
    2.3.1 Non-perceptual Quality Metrics
    2.3.2 Perceptual Quality Metrics

3 Lossless CFA Compression using Prediction
  3.1 Introduction
  3.2 Proposed Algorithm
    3.2.1 Deinterleaving Bayer CFA
    3.2.2 Green sub-image prediction
    3.2.3 Non-Green sub-image prediction
    3.2.4 Compression of prediction error
  3.3 Experimental Results
    3.3.1 Primary color channel and color difference channel
    3.3.2 Green channel interpolation method
    3.3.3 Dissimilarity measure in template matching
    3.3.4 Prediction algorithm
  3.4 Chapter Summary

4 Lossy CFA Compression using Colorspace Conversion
  4.1 Introduction
  4.2 Proposed Algorithm
    4.2.1 Interpolation of missing green components
    4.2.2 Interpolation of color difference components
    4.2.3 Correction of green and color difference components
    4.2.4 YCoCg color conversion
    4.2.5 Structure conversion
  4.3 Experimental Results
    4.3.1 Edge Sensing Mechanism (ESM) and Compression
    4.3.2 Color Space and Compression
    4.3.3 Proposed Pipeline and Conventional Pipelines
  4.4 Chapter Summary

5 Conclusions and Future Work
  5.1 Conclusions
  5.2 Future Work
    5.2.1 Potential extensions on the proposed systems
    5.2.2 General future work

Bibliography

List of Tables

3.1 Lossless bitrate of proposed compression scheme with primary channel and color difference channel
3.2 Lossless bitrate of proposed compression scheme with various G interpolation schemes
3.3 Lossless bitrate of proposed compression scheme with SAD and SSE dissimilarity metrics
3.4 Lossless bitrate of various CFA compression schemes (direct CFA encoding schemes)
3.5 Lossless bitrate of various CFA compression schemes (predictive coding schemes)
4.1 Number of operations per pixel required for the proposed scheme
4.2 Encoding time for different pipelines and codecs

List of Figures

2.1 Typical optical path for single sensor cameras
2.2 Bayer CFA arrangement
2.3 Conventional Image Processing Pipeline
2.4 Alternative Image Processing Pipeline
2.5 Typical images with limited dynamic range and a HDR image
2.6 HDR image acquisition by capturing multiple images
2.7 HDR image acquisition by estimation
2.8 Image pipeline design with raw CFA image storage
2.9 Image pipeline design exploiting HDR contents compression
2.10 Block diagram of JPEG XR encoding process
2.11 CFA deinterleave process
2.12 CFA deinterleave process: G subimage
3.1 Overview of the proposed lossless CFA compression pipeline
3.2 Bayer CFA deinterleave method
3.3 Current pixel to be predicted and its 4 closest neighborhood pixels in a quincunx G sub-image
3.4 Template of G sub-image centered at (i,j). "o" indicates pixels in the template region
3.5 Pixel values required for the prediction of G pixel at (i,j)
3.6 Weight computation for the prediction of G pixel at (i,j)
3.7 Current pixel to be predicted and its closest neighborhood pixels in a red difference (dR) sub-image
3.8 Template of red difference (dR) sub-image centered at (i,j). "o" indicates pixels in the template region
3.9 Weight computation for the prediction of red difference (dR) pixel at (i,j)
3.10 Test digital color images (referred to as image 1 to image 31, from left to right and top to bottom)
3.11 1-D autocorrelation graphs for the image 4 in database: (a) original images, R and B; (b) color difference images, dR and dB
3.12 Entropy of sample images from the database with various prediction methods
4.1 Overview of the proposed lossy HDR CFA image compression pipeline
4.2 Indexing of the samples within a 5x5 window of Bayer CFA
4.3 Two versions of color space conversion
4.4 Rate-distortion curves of proposed pipelines with different ESMs for various quality metrics
4.5 Rate-distortion curves of proposed pipelines with different color spaces for various quality metrics
4.6 Rate-distortion curves of the proposed pipelines and 4 other pipelines for various image quality metrics
4.7 Full color images obtained from four examined IPPs with JPEG XR codec at bit rate between 1 and 2 bpp. First 4 images are sub-regions of the image 18, next 4 images are from the image 21, and last 4 images are from the image 1 in the database

Acronyms

ALCM   Activity level classification model
ASIC   Application specific integrated circuit
BPP    Bits per pixel
CCD    Charge coupled device
CDM    Color demosaicking
CFA    Color filter array
CMBP   Context matching based prediction
CMOS   Complementary metal oxide semiconductor
DCT    Discrete cosine transform
DSP    Digital signal processor
DWT    Discrete wavelet transform
ESM    Edge-sensing mechanism
EXIF   Exchangeable image file
HDR    High dynamic range
HDRI   High dynamic range imaging
HVS    Human visual system
JPEG   Joint Photographic Experts Group
JPEG XR   JPEG extended range
LBT    Lapped biorthogonal transform
LDR    Low dynamic range
MOS    Mean opinion score
MSE    Mean square error
PSNR   Peak signal-to-noise ratio
RCT    Reversible color transform
SAD    Sum of absolute differences
SM     Spectral model
SSE    Sum of squared errors
SSIM   Structural similarity index
UHDTV  Ultra-high-definition television
VDP    Visual difference predictor

Chapter 1

INTRODUCTION

1.1 Motivation

Over the past years, advancements in color imaging technology have reduced the complexity, size, and cost of color devices, such as digital cameras, monitors, and printers, allowing more convenient access to them in various environments. One of the rapidly evolving fields in color imaging technology is digital photography, which has gained significant popularity in recent years. In order to create an image of a scene, digital cameras use a sensor, an array of light-sensitive spots called photosites, which records the total intensity of the light that reaches its surface. Commonly used image sensors are monochromatic and cannot record color information. Among existing solutions, single-sensor imaging technology, which captures visual scenes in color using a monochrome sensor in conjunction with a color filter array (CFA), offers a good tradeoff among cost, performance, and complexity. Thus, the single-sensor solution is widely adopted in typical consumer-grade digital cameras. Due to the advancement and proliferation of emerging digital camera based applications and commercial devices, such as multimedia mobile phones, sensor networks, and personal digital assistants (PDAs), the demand for single-sensor imaging and digital camera image processing solutions will grow considerably in

the next decade [1].

Digital cameras embed a series of signal processing operations in their processors to produce digital images; this sequence is called an image processing pipeline. The three main components of the image processing pipeline are image acquisition, image transmission/storage, and image visualization. Since the pipeline design is a key element determining the image quality and computational efficiency of digital cameras, a significant amount of research effort has been devoted to it. At the first stage of the pipeline, single-sensor cameras produce a mosaic-like image formed by intermixing samples from the RGB channels, also called a raw CFA image. The CFA image differs from a full color RGB image, as it contains only one color component at each pixel. In order to convert the CFA image to a full color RGB image, the two missing components of each pixel are estimated by a demosaicking operation. Then various image processing techniques are applied to the full color demosaicked image to enhance image quality. Finally, the enhanced image is compressed to reduce memory consumption. Recently, this demosaicking-first approach has been found to be sub-optimal in terms of compression efficiency. An alternative solution, which performs compression prior to demosaicking, has been proposed, and it raised an issue specific to single-sensor cameras: the compression of a mosaic-like CFA image.

One of the most challenging and rapidly emerging issues for digital cameras is supporting high dynamic range imaging (HDRI) technology. HDRI uses a larger number of bits than conventional systems to represent each pixel of a digital image and thus provides increased tonal resolution. As a result, it achieves a more realistic representation of the visual scene with smoother gradation. It is foreseeable that the imaging industry will inevitably transition to HDRI technology in the near future. This change will affect all stages of the image processing pipeline of digital cameras, from data acquisition to visualization. In particular, the increase in dynamic range leads to an increased number of bits in image data. For example, many digital cameras have started to produce the CFA image in a high bit format, typically between 10 and 16 bits per pixel (replacing the conventional 8 bits). Therefore, it

has become highly important to develop efficient compression techniques for HDR CFA images to use expensive storage effectively. The purpose of this thesis is to propose an efficient compression scheme for single-sensor digital cameras to encode CFA images given in HDR format. The proposed system is designed to minimize the amount of memory required to store HDR CFA data, while keeping computational requirements low due to the limited resources in digital cameras. The development of an efficient HDR CFA compression scheme will ultimately enable ordinary users to experience promising HDRI technology in consumer level cameras and allow considerable improvement of the visual realism in digital visual contents.

1.2 Key Challenges

In designing an efficient CFA image compression scheme, a number of engineering decisions must be made. This section lists general challenges and considerations associated with it for digital cameras. The main concerns are cost, image quality, operational/power efficiency, and portability. [2]

Dynamic Range (Image Precision): Recently, high dynamic range imaging (HDRI) technologies have gained significant popularity in various fields, such as the movie, digital photography, and computer graphics industries. The research trend in digital photography is shifting from enhancement of spatial resolution to tonal resolution, and significant emphasis is given to the incorporation of HDRI technologies into consumer level digital cameras. HDRI addresses the limitations of traditional low dynamic range imaging (LDRI) by providing a wider range of luminance information to achieve a more precise representation of real visual scenes. Consequently, HDRI technology can represent the entire dynamic range of luminance that humans can perceive. [3] In order to support HDRI in digital cameras, each stage in the image processing pipeline (IPP), from image acquisition to visualization, should be updated

to handle image data in HDR format. In particular, the proposed CFA compression scheme should retain the high bit-depth of the given CFA data.

Cost/Operational Efficiency: Production cost and operational efficiency are two closely related consideration factors in IPP design. The proposed scheme should efficiently manage expensive camera on-board memories and other computational resources. Embedding sophisticated algorithms in a resource-constrained system is a challenging task due to hardware limitations and cost. The ideal solution exploits low complexity techniques in on-board processors and offloads high complexity algorithms to end devices, where sufficient processing power is available. For optimum computational efficiency, the computing hardware on a camera can be explicitly designed to implement a given processing algorithm in the form of an application specific integrated circuit (ASIC). However, development of a new ASIC is an expensive process requiring relatively high usage volumes to make this approach financially attractive. Once constructed, the image processing chain in the ASIC cannot be changed. On the other hand, the digital signal processor (DSP) provides a significant degree of freedom over the ASIC block, as the DSP is a programmable device. In addition, the DSP is advantageous in terms of production cost. In terms of processing speed, the ASIC is a better choice than the DSP, as the ASIC is dedicated to a given task and thus more optimized.

Image Quality: The proposed compression scheme should be able to reproduce color with great fidelity and high accuracy. The quality of the final images is affected by the selection of processing algorithms. There are two categories of approaches depending on the nature of compression: lossless and lossy. A lossless compression algorithm does not allow any loss of image quality, and the regenerated image after decompression is an exact replica of the original image.
The lossless compression algorithm is applicable in areas including medical imaging, image archiving

systems, cultural heritage, and surveillance systems. On the other hand, a lossy compression algorithm aims to achieve a higher compression ratio than a lossless one by allowing marginal image distortion. Thus, part of the original data can be lost with a lossy approach, but it should maintain good perceptual quality of the reconstructed image.

1.3 Thesis Scope and Contributions

This thesis focuses on implementing color filter array (CFA) compression schemes for the digital camera pipeline that efficiently encode CFA images given in high dynamic range format (high bit-depth). Although various other CFA patterns exist, we focus only on the Bayer CFA, since it is the most commonly used one in the industry due to its optimal spatial arrangement [4]. Therefore, hereafter, whenever a CFA image is mentioned, a Bayer CFA image is referred to unless specifically stated otherwise. Two different types of compression schemes are proposed in this thesis. The first proposed solution encodes HDR CFA data without loss of quality, referred to as a lossless scheme. The other solution is a lossy scheme that compresses the HDR CFA image with marginal quality loss to enhance compression efficiency.

1.3.1 Lossless HDR CFA compression scheme for the digital camera pipeline

The first contribution of this thesis is a lossless Bayer CFA image compression scheme capable of handling HDR representation. The proposed pipeline consists of a series of pre-processing operations followed by a JPEG XR encoding module. A deinterleaving step separates the CFA image into sub-images of a single color channel, and each sub-image is processed by a proposed weighted template matching based prediction. The utilized JPEG XR codec allows the compression of HDR data at low computational

cost. Extensive experimentation is performed using sample test HDR images to validate performance, and the proposed pipeline outperforms existing lossless CFA compression solutions in terms of compression efficiency.

1.3.2 Lossy HDR CFA compression scheme for the digital camera pipeline

The second contribution of this thesis is a lossy Bayer CFA image compression scheme capable of handling HDR representation. The proposed pipeline consists of a series of pre-processing steps followed by a JPEG XR encoding module. An 8-directional edge sensing mechanism and an inter-channel correlator are used to reduce estimation errors and preserve edge related information when estimating missing color components. The utilized YCoCg color space allows for a simplified pipeline implementation and delivery of high quality results. The proposed solution is tested using sample HDR images, and performance is validated using three image quality assessment metrics: composite peak signal-to-noise ratio (CPSNR), multi-scale structural similarity index (MSSIM), and HDR visual difference predictor (HDR-VDP). Extensive experimentation reported in this thesis indicates that the proposed lossy compression solution is suitable for limited resource environments due to its low complexity and high performance.

1.4 Thesis Organization

The remainder of this thesis is organized as follows. Chapter 2 provides the necessary background information and a review of previous work related to single-sensor imaging technology, HDRI technology, image compression techniques, and image quality assessment metrics. Our proposed CFA compression schemes are presented in Chapters 3 and 4. In Chapter 5, we conclude this thesis and discuss some limitations and practical issues to be considered in future research.

Chapter 2

BACKGROUND

In this chapter, we provide technical concepts and existing research activities on the digital camera processing pipeline, high dynamic range imaging, the fundamentals of image compression, and image quality assessment metrics.

2.1 Digital Camera Design

2.1.1 Digital Camera Architecture

In digital cameras, the color information of a real-world scene is acquired through an image sensor, usually a charge-coupled device (CCD) [5] or complementary metal oxide semiconductor (CMOS) sensor [6], in the form of a superimposition of three primary colors: red (R), green (G), and blue (B). Commonly used image sensors are monochromatic devices that sense light within a limited frequency range, and therefore cannot acquire color information directly. Due to the monochromatic nature of the image sensor, digital camera manufacturers implement several solutions to capture the visual scene in color. The most straightforward approach to capture a digital image is to use three separate sensors to capture RGB light. A beam splitter is used to project the light through three color filters, and towards three sensors. However, a sensor is one of the most expensive

components of a digital camera, usually accounting for up to 25 percent of the total production cost [7]; thus, the three-sensor method is only used in high-end professional cameras. The cost effective alternative to the three-sensor approach is single-sensor imaging technology. To reduce cost and complexity, most digital cameras are equipped with a sensor coupled with a color filter array (CFA). A CFA is a mosaic of color filters placed on top of a conventional CCD/CMOS image sensor to filter out two of the R, G, and B components at each pixel position. Consequently, a digital image acquired through a CFA, called a raw CFA image, stores only a single RGB measurement at each pixel, and the missing components are regenerated through a color demosaicking (CDM) process, also known as CFA interpolation. [1] The typical optical path for a single sensor camera is shown in Figure 2.1.

Figure 2.1: Typical optical path for single sensor cameras

Figure 2.2: Bayer CFA arrangement

A number of RGB CFAs with various layouts of color filters in the array are used in practice. Since the CFA is placed at an early stage in the image acquisition pipeline, it determines the maximal resolution, image quality, and computational efficiency achievable by the subsequent processing pipeline. The most common CFA design is the Bayer pattern [8], which contains two green, one blue, and one red sample arranged in a 2x2 block, as shown in Figure 2.2. The green component in the Bayer CFA is measured at

a double sampling rate, since the human visual system (HVS) is more sensitive to the green portion of the spectrum.

2.1.2 Image Processing Pipeline

Digital cameras embed a series of signal processing operations in their processors to produce images; this sequence is called an image processing pipeline (IPP). The image pipeline design plays a key role in digital camera systems for generating high quality images. Although the sequence of operations differs from manufacturer to manufacturer, a general image pipeline consists of a series of processing functions as shown in Figure 2.3. In a typical digital camera pipeline architecture, the CDM is one of the first operations performed after CFA image acquisition. The CDM is a mandatory process that restores the color information from the original CFA image. Then, the demosaicked RGB images are modified by adjusting white balance and performing color and gamma correction to match the colors of the input scene when rendered on a display device. White balancing removes the color tint of an image to make white objects appear white. Color correction transforms the CFA sensor color space to a standard RGB space, such as linear sRGB [9]. Gamma correction adjusts the image intensity to compensate for the non-linearity of CRT or LCD displays. Once the adjustment and correction processes are completed, the enhanced image is compressed for storage or transmission. Typical cameras commonly store the image in a compressed format using the Joint Photographic Experts Group (JPEG) standard [10]. The exchangeable image file (EXIF) format [11] allows storage of additional metadata related to the camera and the image characteristics along with the JPEG-compressed image data. [1]

A drawback of the conventional IPP in Figure 2.3 is that the CDM does not increase the information content of the original image, but introduces redundancies by estimating missing pixels, consuming substantial camera storage. The objective of image compression is to reduce redundancies in image data, and therefore, compression of demosaicked images can be counterproductive.
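The white balance and gamma correction steps described above can be sketched as simple array operations. A minimal illustrative sketch in Python/NumPy follows; the channel gains and the simple power-law gamma are toy assumptions for illustration, not the exact piecewise sRGB curve or any specific camera's pipeline:

```python
import numpy as np

def white_balance(rgb, gains):
    # Scale each channel so white objects render as white.
    # `gains` would normally be estimated by the camera;
    # these are illustrative values only.
    return np.clip(rgb * np.asarray(gains), 0.0, 1.0)

def gamma_correct(rgb, gamma=2.2):
    # Compensate for display non-linearity with a simple
    # power law (not the exact piecewise sRGB curve).
    return np.clip(rgb, 0.0, 1.0) ** (1.0 / gamma)

# A 1x2-pixel linear RGB image with values in [0, 1]
img = np.array([[[0.2, 0.4, 0.1],
                 [0.5, 0.5, 0.5]]])
out = gamma_correct(white_balance(img, gains=[1.2, 1.0, 1.5]))
```

Note how gamma correction brightens mid-tones: a linear value of 0.5 maps to 0.5^(1/2.2), roughly 0.73, before display.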

To avoid this issue, the alternative IPP in Figure 2.4, which reverses the CDM and compression stages, can be utilized. [12]

Figure 2.3: Conventional Image Processing Pipeline

Figure 2.4: Alternative Image Processing Pipeline

In the alternative scheme, the CFA image is compressed prior to converting it to a full color image. The main advantage of the alternative IPP is that the number of CFA samples is only 1/3 of that in the full color image, thus requiring less computational resources and storage capacity. In addition, this approach allows the CDM and other enhancement/correction operations to be performed in the end device, rather than inside the camera. Offloading the CDM from the camera to the end device, such as a personal computer (PC), allows the use of a highly sophisticated CDM algorithm to produce a more visually pleasing color output, because computational cost is less of an issue in this case. Moreover, it simplifies the hardware architecture and reduces the cost, processing delay, and power consumption of digital cameras. Experimental results from various studies [13, 14, 15] suggest that the alternative IPP can generate similar or higher quality images than the conventional chain at low compression ratios.

2.1.3 Color Demosaicking

Color demosaicking (CDM) [16, 17] is a crucial operation in the single-sensor imaging pipeline that restores the color image from the raw mosaic sensor data. The image acquired through a CFA appears as an interleaved mosaic, similar to a grayscale image, and the missing components in the CFA image are reconstructed through CDM in order to produce a complete RGB image. Thus, the objective of CDM is to transform a K1 x K2 grayscale image z : Z² → Z into a K1 x K2 full color image x : Z² → Z³. The CDM process can be modeled as an interpolation function f_ϕ, which defines a relationship between the output image x and the input CFA image z as follows:

x = f_ϕ(Λ, Ψ, ζ, z)    (2.1)

where
Λ : ESM (edge-sensing mechanism) operator
Ψ : SM (spectral model) operator
ζ : local neighborhood area
z : CFA image

The edge-sensing mechanism (ESM) operator Λ = {w(i,j); (i,j) ∈ ζ} generates edge-sensing weights w(i,j) for each individual neighborhood pixel on the basis of edge direction, so that the structural information of the input image z is preserved when estimating missing information. Non-data-adaptive ESM operators use simple linear averaging models and fixed weights for all surrounding pixels, resulting in blurred edges. On the other hand, data-adaptive ESM operators produce better quality full-color images with enhanced fine details by adjusting the edge-sensing weight factors of surrounding pixels. The spectral model (SM) operator Ψ uses correlation between color channels to eliminate spectral artifacts in the output image x. There are two fundamental inter-channel correlation models: the color ratio rule [18] and the color difference rule [19]. The first model employs the property that ratios of two color channels are constant over local regions. It assumes that within a given object, the ratios R/G and B/G are locally stationary.
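As a numerical illustration of the color ratio rule (the neighborhood values below are invented for the example, not taken from the thesis): a missing R value at a green-only pixel can be estimated by scaling its G sample with the local R/G ratio observed at nearby pixels where both channels are available.

```python
import numpy as np

eps = 1e-6  # guard against division by zero in dark regions
# Nearby pixels where both R and G are known (illustrative values)
neighbor_r = np.array([120.0, 130.0, 124.0, 126.0])
neighbor_g = np.array([100.0, 108.0, 104.0, 105.0])

# Color ratio rule: R/G is assumed locally stationary
local_ratio = float(np.mean(neighbor_r / (neighbor_g + eps)))

g_here = 102.0                     # sensed green at the target pixel
r_estimate = g_here * local_ratio  # estimated missing red
```

A real demosaicker would combine this spectral model with edge-sensing weights rather than a plain mean, but the scaling step is the essence of the rule.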

The second model is based on the property that the color difference signals between the R, G, and B images are slowly varying and thus can be regarded as locally constant. Instead of estimating the original intensity in the two chromatic color channels, R and B, color difference model based algorithms estimate the difference signals, R-G or B-G, in order to derive the missing values. It is essential to use appropriate ESM and SM operators in order to reduce excessive blur, color shifts, and visible aliasing effects during the demosaicking process. Equation (2.1) reflects important characteristics of natural scenes, namely: i) the non-stationary characteristic due to the existence of edges and fine details, ii) the existence of inter-channel correlation among the RGB channels, and iii) the existence of intra-channel correlation among spatially neighboring pixels. [20]

2.1.4 High Dynamic Range Imaging in Single Sensor Digital Cameras

Currently, the research emphasis in digital photography is shifting from spatial resolution to tonal resolution, and a significant amount of research effort has been devoted to HDRI. HDRI is an imaging technology that enables a more realistic representation of the visual scene than conventional technologies by increasing the dynamic range of image data. The dynamic range of a digital camera refers to the ratio between the maximum charge that the sensor can collect and the minimum detectable charge that just overcomes sensor noise. Once the light intensities of a real world scene are measured by a sensor, they are quantized to produce digital data, traditionally into 8 bits per component, which gives 256 distinct levels. [21] However, the 8-bit representation is often not sufficient to represent the range of intensity levels in visual scenes containing both very bright and very dark areas at the same time, and often such a limitation results in improper exposure issues in captured images.
For instance, in a digital image captured with low exposure settings, dark areas in the scene will be recorded as black (underexposure). On the other hand, with high exposure

settings, bright areas will be saturated (overexposure). HDRI performs operations on color data with more than 8 bits per component to represent more tonal levels over a much wider dynamic range. For example, a 16-bit format can be used to represent pixels in an HDR image, which provides 65,536 (= 2^16) tonal levels. This is sufficient to reveal more detail under complex scene lighting conditions. Figure 2.5 demonstrates poorly captured images due to limited dynamic range, together with an HDR image that preserves a wide dynamic range of light intensities. It can be seen that the texture pattern on the wall is hidden in dimly illuminated areas in the low exposure image, while the detail of the stained glass is not visible due to saturation in the high exposure image. On the other hand, the final HDR image reveals all details without loss of information.

Figure 2.5: Typical images with limited dynamic range and a HDR image: (a) image taken with low exposure time, (b) image taken with high exposure time, (c) HDR image

This section provides a brief overview of the three major components in the HDR

image processing pipeline for digital cameras: image acquisition, compression, and visualization. Particularly strong emphasis is given to the acquisition and compression of HDR images, which are generally performed on digital cameras rather than end devices.

HDR Content Acquisition

There are two common approaches to producing HDR images in single sensor digital cameras: i) capture images directly with an HDR sensor, or ii) generate HDR images by combining multiple low dynamic range (LDR) images taken at more than one exposure level using a regular sensor. Due to the high production cost associated with an HDR sensor, the latter approach is more practical for consumer level cameras. In order to generate HDR images, multiple photos at different exposure values are captured and combined together to obtain good detail in all areas of a scene. Merging multiple LDR images, the so-called HDR reconstruction process, involves the characterization of the sensor's intensity response function f, which relates an image pixel value z_ij,k to an actual scene radiance value E_ij as follows:

z_ij,k = f(E_ij t_k + η_ij)    (2.2)

A collection of k differently exposed pictures of a scene, acquired with known variable exposure times t_k and sensor noise η_ij, gives a set of z_ij,k values for each pixel ij, where k is the index over exposure times. Once f is recovered, the actual scene radiance values are obtained by applying its inverse f⁻¹ to the set of corresponding brightness values z_ij,k observed in the differently exposed images. One of the most popular techniques for HDR reconstruction is the Debevec and Malik method, shown in Figure 2.6 [22]. It is a two-stage HDR reconstruction algorithm that estimates a non-parametric response function from image pixels and then recovers the radiance map.
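Assuming the log inverse response g and the exposure times have already been estimated, the radiance-map recovery step can be sketched as a weighted average over the exposure stack. The uniform weighting and the toy linear-sensor check below are illustrative assumptions, not the thesis's implementation:

```python
import numpy as np

def recover_radiance(z_stack, times, g, w):
    # ln E = g(z) - ln t for every exposure k; average the
    # estimates across exposures with weights w(z) that should
    # favor well-exposed (mid-range) pixel values.
    num = np.zeros(z_stack.shape[1:])
    den = np.zeros(z_stack.shape[1:])
    for z_k, t_k in zip(z_stack, times):
        wk = w(z_k)
        num += wk * (g(z_k) - np.log(t_k))
        den += wk
    return np.exp(num / np.maximum(den, 1e-9))

# Toy check with an idealized linear sensor: f is the identity,
# so g(z) = ln z and every exposure implies E = z / t exactly.
g = lambda z: np.log(z)
w = lambda z: np.ones_like(z)   # uniform weights for the sketch
times = [0.5, 1.0, 2.0]
E_true = np.array([[10.0, 40.0]])
z_stack = np.stack([E_true * t for t in times])
E_rec = recover_radiance(z_stack, times, g, w)
```

In the Debevec and Malik method the weights follow a hat function that down-weights values near the sensor's saturation level and noise floor, rather than the uniform weights used here.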

Figure 2.6: HDR image acquisition by capturing multiple images

The input to the algorithm is a number of digital images taken from the same vantage point with different known exposure durations t_k. It is assumed that the scene is static, the sensor noise η_ij is negligible, the irradiance values E_ij for each pixel ij are constant, and f is monotonic and thus invertible. The camera response function f then satisfies

z_ij,k = f(E_ij t_k)
f⁻¹(z_ij,k) = E_ij t_k
ln f⁻¹(z_ij,k) = ln E_ij + ln t_k    (2.3)
g(z_ij,k) = ln E_ij + ln t_k,  where g = ln f⁻¹

The algorithm finds the function g and the radiances E_ij that best satisfy an objective function in a least-squared error sense. Once g is obtained, it can be used to convert pixel values to relative radiance values E_ij using the known t_k. For multiple capture approaches, it is essential that the scene is completely static during capture. Otherwise, misalignment between images due to movement of either objects in the scene or the camera causes a ghosting effect, which introduces blurry or transparent artifacts in the generated HDR image. Several techniques have been proposed to reduce the ghosting problem: i) use a tripod to eliminate camera movement, ii) capture the scene with a faster shutter speed to freeze the motion of objects, and iii) exploit anti-ghosting techniques [23, 24]. Recently, a new HDR acquisition technique [25] has been proposed which does not require multiple captures. This method, shown in Figure 2.7, generates multiple LDR

images of different exposure levels from an input Bayer CFA image using predefined look-up tables (LUTs) and merges the original and generated LDR images together to produce a final HDR image. Since this method removes the need for iterative processing and avoids the ghosting artifacts caused by moving objects, it is a reasonable solution that makes HDRI technology feasible in single sensor imaging devices, alongside the multiple LDR capture method.

Figure 2.7: HDR image acquisition by estimation

HDR Image Compression

As discussed in the previous section, there are different techniques to create HDR images in digital cameras. Compression of the acquired HDR content forms the next component in the processing chain. Nowadays, high-end/professional cameras allow the storage of raw CFA data at high bit depths, typically between 10 and 16 bits per pixel. For example, a popular high-end camera, the Canon EOS 5D Mark II, provides raw CFA images at a bit depth of 14 bits. The increase in bit depth leads to an increased amount of image data and calls for more efficient encoding algorithms. JPEG compression, the most widely used image compression solution, precludes the future manipulation offered by high bit-depth data, since it is limited to an 8 bit representation. Therefore, original HDR content must be squashed into 8 bits prior to applying JPEG compression, causing a loss of precision. Current high-end cameras address this issue by allowing the storage of the raw CFA image without compression, as illustrated in Figure 2.8.
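The loss of precision from squashing can be made concrete with a toy rescaling between 14 and 8 bits. This is a simplification for illustration only; real cameras apply a tone curve rather than a linear rescale.

```python
def squash_to_8bit(v):
    """Rescale a 14-bit raw value (0..16383) to 8 bits (0..255)."""
    return round(v * 255 / 16383)

def expand_to_14bit(v):
    """Map an 8-bit code back into the 14-bit range."""
    return round(v * 16383 / 255)
```

Since 16384 raw codes share only 256 output codes, distinct raw values collapse onto the same 8-bit code and cannot be recovered, which is exactly the loss of precision noted above.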

Figure 2.8: Image pipeline design with raw CFA image storage

In such a design, the user can retrieve CFA images from the digital camera and perform high quality post-processing operations on a PC without loss of HDR content. Camera manufacturers use different types of raw files, such as CR2 (Canon), NEF (Nikon), ORF (Olympus), PEF (Pentax), RW2 (Panasonic) and SR2 (Sony), mostly based on the TIFF file format. However, preserving CFA images in a raw format leads to excessive consumption of the camera's storage memory. Figure 2.9 presents an image processing pipeline that addresses the storage inefficiency associated with HDR data.

Figure 2.9: Image pipeline design exploiting HDR contents compression

In the proposed IPP, an image compression standard capable of handling high bit-depth data, such as JPEG XR or JPEG 2000, is applied immediately after raw CFA image acquisition. This allows the CFA image to be compressed while retaining the high bit-depth data necessary for future manipulation. Ultimately, the user is offered efficient usage of expensive memory resources while maintaining superior image quality during various post-processing operations.

HDR Image Display

Displaying HDR content is the last component of the HDR image processing chain. HDR content usually cannot be displayed directly on common display devices, such as LCD or CRT monitors, as the dynamic ranges of such devices are limited to the conventional 8 bit representation. The tone mapping process performs a conversion that takes the luminance of an HDR image as input and produces output pixel intensities that can be displayed on standard display devices. Several tone mapping algorithms have been proposed in the literature, and they are categorized into two classes: global approaches [26, 27] and local approaches [28]. Global tone-mapping algorithms apply the same transfer function to all pixels. On the other hand, local tone-mapping algorithms adapt the mapping function depending on local statistics and pixel contexts. Generally, there is no single method that produces the best result for all images; thus, users need to select an optimal algorithm based on their particular requirements and available computational resources.

2.2 Image Compression

In digital imaging, each pixel is a sample of an original image, and its intensity is typically represented with a fixed number of bits. Statistical analysis indicates that digital images contain a significant amount of spatial and spectral redundancy. Image compression aims at taking advantage of these redundancies to reduce the number of bits needed to represent an image. In addition, the insensitivity of the HVS allows a further reduction of bandwidth by discarding certain signals that are not perceptible to humans. This section elaborates on fundamental image compression techniques, common image compression standards, and various CFA compression algorithms for single sensor imaging devices.

2.2.1 Common Image Compression Techniques

Color Space Conversion

A digital image generally has three color components per pixel: R, G, and B. Instead of coding RGB data directly, common compression standards exploit a color space conversion to transform them into a luminance/chrominance system, which defines a color space in terms of one luminance and two chrominance components. Luminance is the perceived brightness of the light, while chrominance is defined as the characteristic of light that produces the sensation of color apart from luminance [1]. Luminance/chrominance spaces are advantageous over RGB for two major reasons. Firstly, for general color images, inter-channel correlation can be reduced by converting RGB images to luminance/chrominance images; thus color space conversion allows better compression performance. Secondly, it is a more convenient form in which to apply subsampling, which reduces visually redundant content that is less perceptible to humans. The most commonly used luminance/chrominance system in multimedia compression is the YCbCr space. The forward and inverse conversions between RGB and YCbCr are defined in the JPEG 2000 specification as follows [29]:

Y     =  0.299 R    + 0.587 G    + 0.114 B
Cb(U) = -0.168736 R - 0.331264 G + 0.5 B
Cr(V) =  0.5 R      - 0.418688 G - 0.081312 B
                                                    (2.4)
R = Y + 1.402 Cr
G = Y - 0.344136 Cb - 0.714136 Cr
B = Y + 1.772 Cb

The conversion process in (2.4) is computationally expensive due to floating point arithmetic. Recently the YCoCg color space was introduced to simplify the color transformation by avoiding the use of floating point coefficients and rounding errors. This new color space defines two chrominance channels, Co and Cg, which can be regarded as excess orange

and excess green. The transform matrix of YCoCg is derived as a close approximation of the Karhunen-Loeve transform (KLT) of the standard Kodak image set and can be implemented using simple additions and right shifts as follows [30]:

Y  =  (1/4) R + (1/2) G + (1/4) B
Co =  (1/2) R           - (1/2) B
Cg = -(1/4) R + (1/2) G - (1/4) B
                                                    (2.5)
R = Y + Co - Cg
G = Y + Cg
B = Y - Co - Cg

The reversible form of the YCoCg transform, referred to as YCoCg-R, is used in the JPEG XR standard and in recent editions of the H.264/MPEG-4 AVC standard.

Predictive Coding

Instead of encoding the original signal directly, a predictive coding technique, also known as differential coding, encodes the difference between the original signal and its prediction. Since pixels in a natural image are highly correlated with each other, a pixel can be predicted with good accuracy from its adjacent pixels. The predicted value is then subtracted from the original value of the corresponding pixel to obtain a prediction error, also called a prediction residue. The performance of predictive coding is significantly affected by the accuracy of the prediction algorithm. If the prediction is well designed, the distribution of the prediction error signal will be closely concentrated around zero, and the variance of the error signal will be much lower than that of the original signal. Consequently, applying entropy coding to the prediction error signal improves compression efficiency. Predictive coding is often used in lossless compression standards. The most popular compression standard that makes use of a predictive coding technique is JPEG-LS [31]. The JPEG-LS standard exploits a predictor called the Median Edge Detector (MED), which provides a good balance between prediction accuracy and computational simplicity. It predicts the

value of the current pixel by examining the 3 neighboring pixels of the current one in the North, West, and North-west directions. Another lossless image codec, CALIC [32], employs an advanced predictor called the Gradient Adjusted Predictor (GAP), which provides higher prediction performance by using 7 neighboring pixels.

2.2.2 Image Compression Standards: JPEG family

In digital photography, there are many different formats for compressing raw images. However, the most frequently used compression standards are the ones established by the Joint Photographic Experts Group. These standards are widely adopted by manufacturers to ensure compatibility across their products. The first standard released by the JPEG group is the JPEG standard [10], introduced in the early 1990s. JPEG's baseline mode, the most dominantly used operation mode, is a lossy compression scheme based on the two-dimensional Discrete Cosine Transform (DCT). Its workflow consists of color space conversion, the DCT, quantization, and entropy coding. Although JPEG has been successful in industry for a long period, its limited rate-distortion performance and its lack of support for a unified pipeline covering both lossy and lossless coding raised the need for an advanced compression standard. To overcome the limitations of JPEG, JPEG 2000 [33] was released in 2000, built on the principle of the Discrete Wavelet Transform (DWT). JPEG 2000 provides not only higher rate-distortion performance than the original JPEG standard but also a single pipeline for both lossy and lossless encoding. Its spatial and quality scalability allows decoding of the compressed bitstream at different resolution and precision configurations to meet different application requirements. In addition, JPEG 2000 can handle high bit-depth data, such as 16-bit integer or 32-bit floating point per component, enabling compression of HDR images.
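Returning to the predictive coding tools described above, the MED predictor used by JPEG-LS admits a very small sketch. The function below follows the standard three-case LOCO-I formulation over the West, North, and North-west neighbours.

```python
def med_predict(west, north, northwest):
    """Median Edge Detector (MED) predictor from JPEG-LS (LOCO-I).
    Near a horizontal or vertical edge it picks min or max of the
    West and North neighbours; in smooth regions it uses a planar
    prediction west + north - northwest."""
    if northwest >= max(west, north):
        return min(west, north)
    if northwest <= min(west, north):
        return max(west, north)
    return west + north - northwest
```

In a predictive coder, only the residue (the current pixel value minus this prediction) is entropy coded, which is where the compression gain comes from.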
However, the main disadvantage of JPEG 2000 compared with JPEG is its complex architecture, which has resulted in limited industrial adoption.

JPEG XR (extended range) [34], released in 2009, is a new image compression standard based on Microsoft coding technology known as HD Photo [35]. JPEG XR provides many of the convenient features offered in JPEG 2000 while keeping its architecture substantially simpler than JPEG 2000, since it uses only integer-based computations internally.

Figure 2.10: Block diagram of JPEG XR encoding process

JPEG XR supports a wide range of input bit depths, from 1 bit through 32 bits per component. 8-bit and 16-bit formats are supported for both lossy and lossless compression, while the 32-bit format is only supported for lossy compression, as only 24 bits are typically retained through the internal operations. Following the conventional image compression structure, JPEG XR's coding path, shown in Figure 2.10, includes color space conversion, a block transform based on a reversible lapped bi-orthogonal transform (LBT), quantization, and entropy coding. The LBT converts image data from the spatial domain to the frequency domain. As a result of the LBT, the coefficients are grouped into three subbands: DC, lowpass (LP), and highpass (HP). The DC, LP and HP subbands are then quantized and entropy coded independently. The performance of JPEG XR has been compared with other compression standards in the literature. [36] evaluates the rate-distortion performance of JPEG XR against JPEG, JPEG 2000 and AVC/H.264 HP 4:4:4 intra using objective quality metrics, such as PSNR and the MSSIM index. It concludes that the performances of JPEG XR and JPEG 2000 are

very close to each other, with JPEG 2000 outperforming JPEG XR slightly in some cases. [37] performs perceptual quality assessments to compare the rate-distortion performance of JPEG, JPEG 2000, and JPEG XR. The experimental results drew similar conclusions to the objective assessments.

2.2.3 Prior arts on Bayer CFA compression

As discussed in Section 2.1.4, storage of raw CFA images leads to excessive usage of the camera's on-board memory, raising the problem of efficient CFA image compression. This section summarizes various CFA image compression schemes in the literature which follow the alternative processing workflow that performs compression at an earlier stage than CDM. The most straightforward approach is the direct application of a standard image compression scheme, such as JPEG, JPEG-LS, or JPEG 2000, to raw CFA images [38, 39]. However, direct compression of raw CFA images is found to be inefficient, since existing compression solutions are generally optimized for continuous tone images and do not work as effectively on mosaic-like images. Due to the nonuniform spectral sensitivity of the image sensor, pixels from different color channels have different average intensity levels. Therefore, intermixing pixels from different color channels generates artificial discontinuities. To address this issue, advanced CFA compression schemes typically exploit various pre-processing operations prior to image encoding for optimal use of the compression tools. Currently, state-of-the-art single sensor camera designs utilize compression schemes of three kinds: lossless [13, 39, 40, 41], lossy [14, 15, 38, 42, 43, 44, 45], and near-lossless [40], depending on the nature of the pre-processing algorithms and compression tools. Lossless compression is used when an exact replica of the original image data is preferred over a high compression ratio. It is crucial in the fields of medical imaging, the cinema industry, and image archiving systems for museum arts and relics.
On the other hand, lossy approaches aim to minimize the amount of image data by discarding visually redundant content. They are suitable for areas where the efficient usage of memory and computational resources is paramount. Near-lossless schemes lie in between the two other classes: their algorithms achieve perceptually lossless compression by limiting the distortion in the compressed image to pre-defined threshold values.

Figure 2.11: CFA deinterleave process

In the following, a number of pre-processing techniques exploiting a pixel rearrangement strategy are discussed. Commonly, prior-art solutions deinterleave the CFA image into sub-images, each consisting of samples from a single color channel. The resulting R and B sub-images form rectangular lattices which can be easily encoded by common standards. However, the quincunx lattice of the G sub-image needs to be further processed for subsequent compression. There are three popular approaches to transform the quincunx G sub-image into a form more convenient for compression: i) merging, ii) separation, and iii) rotation. Some CFA compression techniques [14, 42] employ a color space conversion to convert the CFA image from the RGB domain to a luminance-chrominance domain prior to deinterleaving. In such a scenario, the deinterleave operation produces a quincunx luminance (Y) channel and rectangular chrominance (C) channels, and the following techniques can then be applied to the Y channel instead of G. Firstly, the merge method [14, 43, 44] shifts either even pixel rows up or even pixel columns left by one pixel. This produces a rectangular grid where one dimension is equal to that of the corresponding CFA and the other is half of it. The generated rectangular

images are compressed by JPEG or JPEG 2000.

Figure 2.12: CFA deinterleave process : G subimage

Since a simple shift can introduce distortions that cause suboptimal compression, [14] applies directional lowpass filtering prior to compression. This is only suitable for lossy approaches, as lowpass filtering removes edges and fine details. Secondly, the separation method [14, 38, 40, 42] splits the quincunx lattice into two rectangular lattices and compresses them separately. Independent encoding of the two sublattices is inefficient as it disregards the spatial correlation between them; therefore, [40] applies a predictive coding technique to improve compression efficiency. Lastly, the rotation method [45] rotates the quincunx grid by 45 degrees and removes blank pixel positions. However, the resulting image forms a rhombus, and standard encoders such as JPEG and JPEG 2000 cannot be applied directly. Instead of performing color channel deinterleaving, [13] applies a wavelet decomposition followed by entropy coding directly to CFA images to alleviate the aliasing issue in direct CFA encoding. In this scheme, the Mallat wavelet transform decorrelates the CFA image by efficiently packing the signal energy into subbands. Overall, there exist various CFA compression schemes, and the experimental results indicate that there is no single best method for all test images. Therefore, the ultimate design goal is to decide appropriate pre-processing operations and compression standards to meet a set of requirements: rate-distortion performance and computational cost.

2.3 Image Quality Assessment Metrics

With the advent of various multimedia compression standards, it has become increasingly important for industry to devise standardized quality assessment tools for compressed digital content. Since human observers are the ultimate receivers in image processing applications, the most reliable way to evaluate quality is to conduct a survey in which a group of humans is asked to rate the perceived quality of presented images on a numerical scale. The average of the obtained values is called the mean opinion score (MOS), and such an assessment technique is referred to as subjective quality assessment (QA). However, the impracticality of subjective QA raised the need for objective QA, which measures the perceived quality of visual content using automated algorithms. These metrics can be employed to benchmark image processing systems and can also be embedded into a system to optimize its parameter settings. Generally, objective QA metrics are categorized into three classes: i) full-reference (FR), ii) no-reference (NR), and iii) reduced-reference (RR) [46]. FR algorithms require an original (non-distorted) version of the image to predict the perceived quality of a distorted sample image. NR algorithms do not need access to the original image, and RR algorithms lie somewhere in between, requiring only some characteristics of a reference image. This section focuses on image QA metrics implementing FR algorithms, which are mainly used in this thesis research.

2.3.1 Non-perceptual Quality Metrics

One of the most common objective QA metrics, the Mean Square Error (MSE), is defined as

MSE = (1/MN) Σ_{i=1}^{M} Σ_{j=1}^{N} (X_{(i,j)} - Y_{(i,j)})²    (2.6)

where X denotes a reference image, Y denotes a distorted image to be compared, and M, N denote the image dimensions. The MSE is basically a normalized Minkowski distance with order p equal to 2, where the Minkowski distance is defined as follows:

E_p = (Σ_{i=1}^{M} Σ_{j=1}^{N} |X_{(i,j)} - Y_{(i,j)}|^p)^{1/p}    (2.7)

In addition, setting p = 1 yields the mean absolute error (MAE), and p = ∞ yields the maximum absolute difference (MAD). In practice, the MSE's variant, the peak signal-to-noise ratio (PSNR), is often used on a dB scale, and is defined as follows:

PSNR = 10 log₁₀ [(2^B - 1)² / MSE]    (2.8)

where B represents the bit depth. MSE, PSNR and other variants can be easily implemented in real world applications but often do not reflect the way that the HVS perceives images. Therefore, a major emphasis in recent research has been given to image QA algorithms based on explicit modeling of the HVS, such as the structural similarity index (SSIM) and the Visible Difference Predictor (VDP).
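The non-perceptual metrics above translate directly into code; the sketch below takes images as flat lists of pixel values, an illustrative simplification of the double sums in Eqs. (2.6) and (2.8).

```python
import math

def mse(x, y):
    """Mean squared error of Eq. (2.6) over two equal-length pixel lists."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def psnr(x, y, bit_depth=8):
    """Peak signal-to-noise ratio in dB, Eq. (2.8)."""
    peak_sq = (2 ** bit_depth - 1) ** 2
    return 10 * math.log10(peak_sq / mse(x, y))
```

Note how PSNR depends on the bit depth: the same absolute error is reported as a higher PSNR for 16-bit data than for 8-bit data, because the peak value (2^B - 1) is larger.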

2.3.2 Perceptual Quality Metrics

The SSIM index [47] is a widely used FR algorithm based on the idea that the HVS is highly adapted to extracting structural information from visual scenes. It separates the task of image similarity measurement into three components: luminance, contrast, and structure. The luminance and contrast distortions are affected by illuminance variations, while the structural information of the objects is independent of the illuminance. Hence, the SSIM algorithm performs an independent structure distortion measurement along with luminance and contrast analysis. Similarly to other FR approaches, the SSIM index is a function of two images, denoted X and Y, such that if one of the images is assumed to be the reference image, the SSIM index can be regarded as a quality measure of the other image. Initially, the algorithm estimates the local luminance of each image signal by its mean intensity. The local luminance of image X, µ_x, is obtained by

µ_x = (1/MN) Σ_{i=1}^{M} Σ_{j=1}^{N} X_{(i,j)}    (2.9)

Secondly, the mean intensity is removed from the signal, and the standard deviation is used as a rough estimate of the contrast information. The contrast of image X, σ_x, is estimated as follows:

σ_x = {(1/(MN - 1)) Σ_{i=1}^{M} Σ_{j=1}^{N} (X_{(i,j)} - µ_x)²}^{1/2}    (2.10)

Next, the signal is normalized by its own mean and standard deviation. This normalized signal, (X - µ_x)/σ_x, is used as a structure estimate of image X. The parameters for local luminance, contrast, and structure information are obtained for each image signal, and they form the luminance comparison function l(X, Y), the contrast comparison function c(X, Y), and the structure comparison function s(X, Y) as follows:

l(X, Y) = (2µ_x µ_y + C₁)/(µ_x² + µ_y² + C₁)
c(X, Y) = (2σ_x σ_y + C₂)/(σ_x² + σ_y² + C₂)
s(X, Y) = (σ_xy + C₃)/(σ_x σ_y + C₃)
                                                    (2.11)
where σ_xy = (1/(MN - 1)) Σ_{i=1}^{M} Σ_{j=1}^{N} (X_{(i,j)} - µ_x)(Y_{(i,j)} - µ_y)

C₁, C₂, and C₃ are defined as C₁ = (K₁ L)², C₂ = (K₂ L)², and C₃ = C₂/2, where L denotes the dynamic range of the pixel values, and K₁, K₂ are positive constants generally set to 0.01 and 0.03 respectively. Finally, the three components are combined to yield

an overall similarity measure SSIM(X, Y):

SSIM(X, Y) = [l(X, Y)]^α [c(X, Y)]^β [s(X, Y)]^γ    (2.12)

where α, β and γ are positive parameters that adjust the relative importance of the three components. Typically, the SSIM method is applied locally rather than globally using a support window, producing an SSIM index quality map of the image. In practice, when a single quality measure of the entire image is preferred to the quality map, a mean SSIM index is often used, as in (2.13):

SSIM(X, Y) = (1/M) Σ_{i=1}^{M} SSIM(x_i, y_i)    (2.13)

where x_i and y_i are the image pixel values of the reference and the distorted images at the i-th local window, and M is the number of local windows in the image. An advanced SSIM metric, called the multi-scale SSIM (MSSIM) [48], is often used due to its robustness to variations in viewing conditions. MSSIM initially decomposes a test image into several scales and provides statistics by measuring the luminance, contrast, and structure information of each sub-scale image. Finally, all the data is pooled into a single number. MSSIM provides good correlation with subjective measurements at a reasonable computational cost. Another widely used image QA metric is the Visible Difference Predictor (VDP). The VDP metric predicts the percentage of pixels of a test image that standard observers would perceive as different from the original. In other words, VDP does not try to judge how irritating the image artifacts introduced by compression are; it only tries to predict whether they are detectable. VDP deploys a highly complex model of the HVS and is thus more computationally intensive than MSSIM. The VDP algorithm customized for HDR images is called HDR-VDP [49]. It deploys several modifications to the VDP to improve its prediction accuracy over a wider range of luminance and under adaptation conditions corresponding to real scene observation.
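With α = β = γ = 1 and C₃ = C₂/2, the product of the three comparison functions collapses to the familiar two-factor form, which the sketch below computes over a whole image treated as a single window. In practice the index is evaluated per local window and averaged as in Eq. (2.13); this single-window version is a simplification for illustration.

```python
def ssim_single_window(x, y, L=255, K1=0.01, K2=0.03):
    """SSIM of Eq. (2.12) with alpha = beta = gamma = 1 and C3 = C2/2,
    computed over two flat pixel lists treated as one window."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    # (2 mu_x mu_y + C1)(2 sigma_xy + C2) over
    # (mu_x^2 + mu_y^2 + C1)(sigma_x^2 + sigma_y^2 + C2)
    return ((2 * mu_x * mu_y + C1) * (2 * cov + C2)) / (
        (mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2))
```

The index equals 1 only for identical images and decreases as the means, variances, or the correlation between the two signals diverge.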

Chapter 3

Lossless CFA Compression using Prediction

3.1 Introduction

In this chapter, a new lossless CFA compression method capable of handling HDR representations is presented. We focus on the Bayer CFA structure, as it is the dominant CFA arrangement in the industry. The proposed scheme consists of color channel deinterleaving, weighted template matching prediction, and lossless image compression operations. There are two main differences between the proposed method and prior-art solutions. Firstly, it introduces a weighted template matching prediction to increase the accuracy of prediction and achieve high compression efficiency. Our method is similar to the context matching based prediction (CMBP) presented in [41], but is more advantageous in terms of computational complexity, because the proposed method does not require the generation of the direction vector map that is necessary to carry out predictive coding in CMBP. Secondly, we make use of the JPEG XR codec [34] to facilitate lossless compression of CFA images in HDR representations, such as the 16 bit per pixel format. Although other codecs, such as JPEG 2000 or JPEG-LS, are also capable of handling HDR input,

JPEG XR's balance between performance and complexity makes it a suitable solution for digital camera implementation. The rest of this chapter is structured as follows. The proposed lossless CFA compression pipeline is presented in Section 3.2. Experimental results and analysis are presented in Section 3.3. Finally, the chapter summary is given in Section 3.4.

3.2 Proposed Algorithm

Figure 3.1 illustrates the encoding and decoding processes of the proposed CFA compression method. The proposed scheme employs structure separation to extract 3 sub-images, each of a single color component, from the original CFA layout. Then, each sub-image undergoes a predictive coding process. The predictive coding forms a prediction for the current pixel based on a linear combination of previously coded neighborhood pixels, and encodes the prediction error signal to remove spatial redundancies. Initially, we process the G sub-image using the weighted template matching prediction technique in raster scan order and generate the prediction error of the G channel, e_g. After completion of the G channel prediction, the non-green sub-images are processed. Instead of carrying out the prediction on the R and B samples directly, we use color difference domain signals, dr (G-R) and db (G-B), for the non-green components. This allows us to reduce spectral (inter-channel) redundancies in the data, leading to higher compression efficiency. In order to obtain the color difference signals, the estimation of the missing G values at non-green pixel positions is necessary. In the proposed algorithm, we perform a bilinear interpolation on the quincunx G sub-image, which delivers satisfactory performance at low computational cost. Again, the prediction errors of the color difference signals, e_dr and e_db, are obtained by the proposed predictor. The generated error signals constitute standard 4:2:2 formatted data. Therefore, they are encoded by the JPEG XR codec using its 4:2:2 lossless encoding mode.
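The bilinear G estimate used to form the color difference signals can be sketched as an average of the four nearest quincunx neighbours. This is a minimal sketch under that assumption; border handling and the exact interpolation kernel of the implementation are omitted.

```python
def bilinear_g(g, i, j):
    """Estimate the missing G value at a non-green position (i, j) as
    the mean of its four nearest neighbours on the quincunx G lattice."""
    return (g[i - 1][j] + g[i + 1][j] + g[i][j - 1] + g[i][j + 1]) / 4.0
```

The color difference sample then follows directly, e.g. dr at an R position is this estimate minus the R sample at the same location.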
In the companion decoding pipeline, compressed prediction error signals are decoded.

Then the decoder forms the identical prediction to the one formed in the encoding pipeline and uses the decompressed error signals to reconstruct the individual sub-images. Finally, we combine the generated sub-images to recover the original CFA layout.

(a) Encoding process (b) Decoding process
Figure 3.1: Overview of the proposed lossless CFA compression pipeline

3.2.1 Deinterleaving Bayer CFA

The proposed pipeline initially deinterleaves the Bayer CFA image into three sub-images, r, g, and b, as shown in Figure 3.2. As previously mentioned in Section 2.2.3, the direct application of a compression solution to the CFA image is inefficient, as CFA data are formed by intermixing samples from different color channels. Although for most natural

images, there still exist spatial correlations between CFA samples, pixels from different channels contain high frequency discontinuities, disallowing a high compression ratio. By deinterleaving the CFA image, three downsampled sub-images, each of which consists of pixels of a single color channel, are extracted.

Figure 3.2: Bayer CFA deinterleave method

Let us consider a K₁ × K₂ grayscale CFA image z_{(i,j)}: Z² → Z representing a two-dimensional input image to encode. The deinterleaving process can be formulated as follows:

g_{(i,j)} = z_{(i,j)} if (i, j) ∈ {(2m-1, 2n), (2m, 2n-1)}, and 0 otherwise
r_{(i,j)} = z_{(i,j)} if (i, j) ∈ {(2m-1, 2n-1)}, and 0 otherwise    (3.1)
b_{(i,j)} = z_{(i,j)} if (i, j) ∈ {(2m, 2n)}, and 0 otherwise

where m = 1, 2, ..., K₁/2, and n = 1, 2, ..., K₂/2. The obtained R and B sub-images form square lattices, while the obtained G sub-image constitutes a quincunx lattice. Each sub-image contains pixels from the same color component; thus, the subsequent prediction process can effectively remove spatial redundancies to achieve high compression performance.
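Converting the 1-based indices of Eq. (3.1) to 0-based array indices, the deinterleave step reduces to strided slicing. In the sketch below the quincunx G lattice is kept as its two rectangular halves (g1 on the R rows, g2 on the B rows), which is an implementation convenience and an assumption, not the exact data layout of the proposed pipeline.

```python
def deinterleave_bayer(z):
    """Split a Bayer CFA image (list of rows, R sample at the top-left)
    into r, g1, g2 and b sub-images by strided slicing, following
    Eq. (3.1) with 0-based indexing."""
    r  = [row[0::2] for row in z[0::2]]   # even rows, even columns
    g1 = [row[1::2] for row in z[0::2]]   # even rows, odd columns
    g2 = [row[0::2] for row in z[1::2]]   # odd rows, even columns
    b  = [row[1::2] for row in z[1::2]]   # odd rows, odd columns
    return r, g1, g2, b
```

Because the four slicings partition the pixel grid, the operation is trivially invertible, which is essential for a lossless pipeline.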

3.2.2 Green sub-image prediction

The compression efficiency of predictive coding depends on the accuracy of the prediction model. Simple linear predictors often yield poor performance in image edge regions. The proposed adaptive predictor exploits a template matching technique to achieve high prediction performance. It measures the dissimilarity between the template of the current pixel to be predicted and the templates of neighboring candidate pixels to determine the weight factors of the candidate pixels. The weight factors adaptively increase the influence of candidate pixels whose associated templates closely resemble the template of the pixel to be predicted and which are located closer to the current spatial position. The proposed scheme handles the pixels in raster scan order, that is, from left to right and from top to bottom.

Figure 3.3: Current pixel to be predicted and its 4 closest neighborhood pixels in a quincunx G sub-image

Figure 3.3 illustrates the current G pixel g_{(i,j)} to be predicted and its 4 candidate pixels, which are the 4 previously scanned neighboring G pixels. The predicted value of g_{(i,j)}, denoted ĝ_{(i,j)}, is given by

ĝ_{(i,j)} = Σ_{(p,q)∈ζ₁} w̄_{g(p,q)} g_{(p,q)}    (3.2)

where ζ₁ contains the 4 closest neighborhood pixels of g_{(i,j)}, ζ₁ = {(i, j-2), (i-1, j-1), (i-2, j), (i-1, j+1)}. The normalized weight factors w̄_{g(p,q)} are given by

w̄_{g(p,q)} = w_{g(p,q)} / Σ_{(m,n)∈ζ₁} w_{g(m,n)}    (3.3)

The original weight factor w_{g(p,q)}, for (p,q) ∈ ζ₁, is defined as follows:

w_{g(p,q)} = {1 + Diff(T_{g(i,j)}, T_{g(p,q)}) / D(g_{(i,j)}, g_{(p,q)})}^{-1}    (3.4)

where T_{g(p,q)} is the template for G prediction centered at pixel (p,q), T_{g(p,q)} = {(p, q-2), (p-1, q-1), (p-2, q), (p-1, q+1)}, the operator Diff(·,·) is a dissimilarity metric, and the operator D(·,·) is the spatial distance between two pixels. We add 1 to the denominator to avoid a singularity when Diff(T_{g(i,j)}, T_{g(p,q)})/D(g_{(i,j)}, g_{(p,q)}) becomes zero [50].

Figure 3.4: Template of G sub-image centered at (i,j). o indicates pixels in the template region

The template used for G prediction is shown in Figure 3.4. Although using a larger template in the matching process improves prediction performance, the template of 4 pixels shows a good trade-off between prediction accuracy and computational cost. Typically, prediction techniques use the sum of absolute differences (SAD) or the sum of squared errors (SSE) between two templates to determine the degree of dissimilarity. We use the SAD due to its simplicity of implementation. Therefore, the dissimilarity Diff(T_{g(p,q)}, T_{g(r,s)}) between two templates is

defined as follows:

Diff(T_{g(p,q)}, T_{g(r,s)}) = |g_{(p,q-2)} - g_{(r,s-2)}| + |g_{(p-1,q-1)} - g_{(r-1,s-1)}| + |g_{(p-2,q)} - g_{(r-2,s)}| + |g_{(p-1,q+1)} - g_{(r-1,s+1)}|    (3.5)

Figure 3.5: Pixel values required for the prediction of G pixel at (i,j)

As shown in Figure 3.5, the proposed predictor requires a 5×7 support window centered at pixel location (i-2, j-1) to calculate ĝ_{(i,j)}. The weight factors w_{g(i,j-2)}, w_{g(i-1,j-1)}, w_{g(i-2,j)}, and w_{g(i-1,j+1)}, which correspond to the west, northwest, north, and northeast candidates of pixel g_{(i,j)}, are obtained using Equation (3.6):

w_{g(i,j-2)} = {1 + (|g_{(i,j-2)} - g_{(i,j-4)}| + |g_{(i-1,j-1)} - g_{(i-1,j-3)}| + |g_{(i-2,j)} - g_{(i-2,j-2)}| + |g_{(i-1,j+1)} - g_{(i-1,j-1)}|)/2}^{-1}

w_{g(i-1,j-1)} = {1 + (|g_{(i,j-2)} - g_{(i-1,j-3)}| + |g_{(i-1,j-1)} - g_{(i-2,j-2)}| + |g_{(i-2,j)} - g_{(i-3,j-1)}| + |g_{(i-1,j+1)} - g_{(i-2,j)}|)/√2}^{-1}

w_{g(i-2,j)} = {1 + (|g_{(i,j-2)} - g_{(i-2,j-2)}| + |g_{(i-1,j-1)} - g_{(i-3,j-1)}| + |g_{(i-2,j)} - g_{(i-4,j)}| + |g_{(i-1,j+1)} - g_{(i-3,j+1)}|)/2}^{-1}    (3.6)

w_{g(i-1,j+1)} = {1 + (|g_{(i,j-2)} - g_{(i-1,j-1)}| + |g_{(i-1,j-1)} - g_{(i-2,j)}| + |g_{(i-2,j)} - g_{(i-3,j+1)}| + |g_{(i-1,j+1)} - g_{(i-2,j+2)}|)/√2}^{-1}

Figure 3.6 demonstrates the weight factor computation sequence for the G pixel at location (i,j). In the diagrams, the template region for the current pixel to be predicted is indicated

with red boxes and the template regions for candidate pixels are indicated with blue boxes.

(a) weight factor for west (b) weight factor for northwest (c) weight factor for north (d) weight factor for northeast
Figure 3.6: Weight computation for the prediction of the G pixel at (i,j)

Once ĝ_{(i,j)} is obtained, the G prediction error e_{g(i,j)} is determined by e_{g(i,j)} = g_{(i,j)} − ĝ_{(i,j)} and coded in the encoding module. Since the decoder can make the same prediction ĝ_{(i,j)} as the encoder, the original G sub-image can be reconstructed without loss by adding the decoded prediction error e_g to ĝ_{(i,j)}.

3.2.2 Non-Green sub-image prediction

Independent encoding of the deinterleaved sub-images yields suboptimal compression efficiency, since data redundancy in the form of inter-channel correlation is disregarded during compression. In order to take inter-channel correlation into account, we perform the prediction of the non-green sub-images in the color difference domain rather than the original intensity domain. To obtain color difference images, we need to estimate the G samples at the original R and B pixel locations, which are unavailable in the original CFA layout. The missing G values are estimated from the available G samples of the CFA image by interpolation. Various interpolation schemes are available, from the low-complexity bilinear method to complex methods utilizing a variety of estimation operators and edge-sensing mechanisms. Our simulation results have shown that advanced interpolation techniques typically improve the compression efficiency only marginally; thus, we use the simple bilinear approach.

Figure 3.7: Current pixel to be predicted and its closest neighborhood pixels in a red difference (dR) sub-image

The two color difference images, dR_{(i,j)} and dB_{(i,j)}, are defined as follows:

dR_{(i,j)} = G'_{(i,j)} − r_{(i,j)},  (i,j) ∈ {(2m−1, 2n−1)}   (3.7)
dB_{(i,j)} = G'_{(i,j)} − b_{(i,j)},  (i,j) ∈ {(2m, 2n)}   (3.8)

where G' denotes the interpolated G channel. Since the prediction procedures for the two color difference images, dR_{(i,j)} and dB_{(i,j)}, are essentially identical, we only present the prediction procedure for the red difference image, using the generalized difference signal d_{(i,j)}, in this section. Similarly to the G case, the proposed scheme predicts the current pixel d_{(i,j)} using its four closest candidate pixels, placed in the directions of west, northwest, north, and

northeast, as shown in Figure 3.7. However, unlike the G component, the non-green components form square lattices rather than quincunx ones; hence, the candidate pixels are defined as ζ2 ≡ {(i, j−2), (i−2, j−2), (i−2, j), (i−2, j+2)}. The prediction of the color difference sub-images is also performed in raster-scan order using the weighted template matching technique. The template for the color difference sub-image is defined in Figure 3.8 using G samples, since edges and fine detail are typically deemphasized in the color difference domain, while being well preserved in the G channel due to its double sampling rate.

Figure 3.8: Template of red difference (dR) sub-image centered at (i,j). "o" indicates pixels in the template region

The original weight factor of the difference sub-image, w_{d(p,q)}, is defined as follows:

w_{d(p,q)} = {1 + Σ_{(r,s)∈ζ2} Diff(T_{d(p,q)}, T_{d(r,s)}) / D(d_{(p,q)}, d_{(r,s)})}^{−1}   (3.9)

where T_{d(p,q)} denotes the template of the color difference image at (p,q), defined as T_{d(p,q)} ≡ {(p, q+1), (p, q−1), (p+1, q), (p−1, q)}. The weight factors w_{d(i,j−2)}, w_{d(i−2,j−2)}, w_{d(i−2,j)}, and w_{d(i−2,j+2)}, corresponding to the west, northwest, north, and northeast directions of d_{(i,j)}

pixel, are obtained using equation (3.10):

w_{d(i,j−2)} = {1 + (|g_{(i,j−1)} − g_{(i,j−3)}| + |g_{(i−1,j)} − g_{(i−1,j−2)}| + |g_{(i,j+1)} − g_{(i,j−1)}| + |g_{(i+1,j)} − g_{(i+1,j−2)}|)/2}^{−1}
w_{d(i−2,j−2)} = {1 + (|g_{(i,j−1)} − g_{(i−2,j−3)}| + |g_{(i−1,j)} − g_{(i−3,j−2)}| + |g_{(i,j+1)} − g_{(i−2,j−1)}| + |g_{(i+1,j)} − g_{(i−1,j−2)}|)/(2√2)}^{−1}
w_{d(i−2,j)} = {1 + (|g_{(i,j−1)} − g_{(i−2,j−1)}| + |g_{(i−1,j)} − g_{(i−3,j)}| + |g_{(i,j+1)} − g_{(i−2,j+1)}| + |g_{(i+1,j)} − g_{(i−1,j)}|)/2}^{−1}   (3.10)
w_{d(i−2,j+2)} = {1 + (|g_{(i,j−1)} − g_{(i−2,j+1)}| + |g_{(i−1,j)} − g_{(i−3,j+2)}| + |g_{(i,j+1)} − g_{(i−2,j+3)}| + |g_{(i+1,j)} − g_{(i−1,j+2)}|)/(2√2)}^{−1}

(a) weight factor for west (b) weight factor for northwest (c) weight factor for north (d) weight factor for northeast
Figure 3.9: Weight computation for the prediction of the red difference (dR) pixel at (i,j)
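The weighted template-matching predictor described above can be sketched in code. This is a minimal illustrative sketch, not the thesis reference implementation: `predict_pixel` is a hypothetical helper that assumes the caller has already gathered, for each of the four causal candidates, its value, the current pixel's template, the candidate's template, and the spatial distance. It uses the SAD dissimilarity of equation (3.5) and the inverse-distance weighting of equations (3.3)-(3.4), then forms the weighted sum of equation (3.11).

```python
import math

def sad(t1, t2):
    # SAD dissimilarity between two equal-length templates (eq. 3.5 style).
    return sum(abs(a - b) for a, b in zip(t1, t2))

def predict_pixel(candidates):
    # candidates: list of (value, template_current, template_candidate, distance),
    # one entry per causal direction (west, northwest, north, northeast).
    # Raw weight: inverse of 1 + SAD / spatial distance (eq. 3.4 style).
    raw = [1.0 / (1.0 + sad(tc, tk) / d) for (_, tc, tk, d) in candidates]
    total = sum(raw)
    # Normalized weights (eq. 3.3) applied to candidate values (eq. 3.11).
    return sum(w / total * v for w, (v, _, _, _) in zip(raw, candidates))

# Toy example: four candidates with identical templates -> SAD is zero for all,
# so all weights are equal and the prediction is the plain average.
cands = [(10, [1, 2, 3, 4], [1, 2, 3, 4], 2.0),
         (20, [1, 2, 3, 4], [1, 2, 3, 4], math.sqrt(2)),
         (30, [1, 2, 3, 4], [1, 2, 3, 4], 2.0),
         (40, [1, 2, 3, 4], [1, 2, 3, 4], math.sqrt(2))]
print(predict_pixel(cands))  # 25.0
```

In the actual pipeline the templates and distances would be drawn from the sub-image geometry of Figures 3.4 and 3.8 rather than passed in explicitly.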

Figure 3.9 demonstrates the weight factor computation sequence for the red difference pixel at location (i,j). In the diagrams, the template region for the current pixel to predict is indicated with yellow boxes and the template regions for candidate pixels are indicated with blue boxes. Once the weight factors for all directions are computed, the predicted value is obtained using the normalized weights w'_{d(p,q)} as follows:

d̂_{(i,j)} = Σ_{(p,q)∈ζ2} w'_{d(p,q)} · d_{(p,q)}   (3.11)

The prediction error of the color difference images, e_d, is determined by e_{d(i,j)} = d_{(i,j)} − d̂_{(i,j)} and coded in the encoding module. Again, the decoder has all the information needed to make the same prediction as the encoder, and thus it can reconstruct the R and B sub-images without loss.

3.2.3 Compression of prediction error

The prediction errors for the three sub-images, e_g, e_dR, and e_dB, are obtained from the previous stages. To compress them without loss, various existing image codecs with lossless encoding capability, such as JPEG-LS, JPEG 2000, and JPEG XR, are considered. In our proposed pipeline, we make use of the JPEG XR standard for the following reasons: i) JPEG XR supports channel bit-depths up to 24 bits for lossless compression, allowing efficient storage of HDR format data, and ii) JPEG XR yields a balanced trade-off between compression efficiency and computational complexity. In our experiments, JPEG XR provides almost comparable coding efficiency to the high-performance JPEG 2000. In terms of complexity, JPEG XR has a considerably simpler architecture than JPEG 2000 and is comparable to the low-complexity JPEG-LS. Therefore, we believe that JPEG XR is an ideal compression solution for resource-constrained environments such as digital cameras. The number of samples to compress in e_g is twice that in e_dR and e_dB. This implies that the prediction error signal forms a standard 4:2:2 arrangement, and thus the YCC

4:2:2 encoding mode of JPEG XR can be applied to compress it. JPEG XR performs a lapped bi-orthogonal transform (LBT), quantization, and adaptive Huffman coding to compress the given input.

3.3 Experimental Results

Experiments are carried out using 31 RGB images from the Para-Dice Insight Compression Database [51], shown in Figure 3.10. This database is chosen because it is a publicly available dataset containing a wide variety of RGB images in 16-bit HDR representation, varying in edges and color appearance, and is thus suitable for the evaluation of the proposed solution. The three-channel RGB images in the database are initially resized to 960x640 and sampled by the Bayer CFA to produce the grayscale CFA images z : Z² → Z. The CFA images z are then processed by the proposed pipeline and compressed into JPEG XR format c by the JPEG XR reference software [52]. The reconstructed CFA images x : Z² → Z are generated by applying JPEG XR decompression to the compressed data c, followed by the processing operations of the decoding pipeline. As all intermediate steps are lossless, the reconstructed CFA images x should be identical to the original CFA images z. The performance of the different solutions is evaluated by comparing lossless compression bitrates. The compression bitrate is reported in bits per pixel (bpp), computed as (8·B)/n, where B is the file size in bytes of the compressed image including the image header and n is the number of pixels in the image. The JPEG XR codec is operated in lossless mode as follows: i) all subbands (DC, LP, and HP) and flexbits are preserved during encoding, and ii) quantization is disabled by setting the quantization parameters to 1 for all subbands and color channels.
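The bitrate convention above is simple enough to pin down in a few lines; this is a small illustrative helper (the function name is hypothetical, not part of the thesis software):

```python
def bits_per_pixel(file_size_bytes, width, height):
    # bpp = (8 * B) / n, where B is the compressed file size in bytes
    # (including the image header) and n is the number of pixels.
    return 8 * file_size_bytes / (width * height)

# Example: a 614,400-byte compressed file for a 960x640 image.
print(bits_per_pixel(614400, 960, 640))  # 8.0
```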

Figure 3.10: Test digital color images (referred to as image 1 to image 31, from left to right and top to bottom)

3.3.1 Primary color channel and color difference channel

This section compares the compression performance of the original R/B channels and the color difference channels.

(a) original channels (b) color difference channels
Figure 3.11: 2D autocorrelation graphs for image 4 in the database: (a) original images, R and B; (b) color difference images, dR and dB

Figure 3.11 shows the two-dimensional autocorrelation of the primary color images R and B, and the color difference images dR and dB, for image 4 in our database. The height at each position indicates the correlation between the original image and a spatially shifted version of itself, which is defined in equation (3.12):

Corr(m,n) = [Σ_i Σ_j (X_{(i,j)} − X̄)(X_{(i+m,j+n)} − X̄)] / √[Σ_i Σ_j (X_{(i,j)} − X̄)² · Σ_i Σ_j (X_{(i+m,j+n)} − X̄)²]   (3.12)

where X_{(i,j)} is the original image, X_{(i+m,j+n)} is the shifted version of itself, X̄ represents the mean value of the given image, and m and n denote the spatial shifts in the horizontal and vertical directions. The value at the center of the graph is always 1, as it corresponds to the zero-shift case. The figure shows that the level of similarity drops off more rapidly with the color difference images than with the primary color images as the shifting distance increases. This observation

holds true for the other images in the database. It implies that dR and dB have lower spatial correlation between neighborhood pixels than R and B. Since spatial redundancy is reduced by using color difference images, more efficient entropy coding is expected. As shown in Table 3.1, the proposed scheme yields an average lossless compression bitrate for the color difference images that is approximately 0.5 bpp lower than for the primary color images.

Table 3.1: Lossless bitrate (bpp) of the proposed compression scheme with the primary channels (RB) and the color difference channels (dRdB), for images 1 to 31 and their average
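The normalized autocorrelation of equation (3.12) can be sketched with NumPy as follows. This is an illustrative implementation: it correlates the image with a shifted copy of itself over their overlapping region, an assumption about boundary handling that the thesis does not spell out.

```python
import numpy as np

def autocorr(img, m, n):
    # Correlation between the image and itself shifted by (m, n),
    # evaluated over the overlapping region (eq. 3.12 style).
    h, w = img.shape
    a = img[max(m, 0):h + min(m, 0), max(n, 0):w + min(n, 0)].astype(float)
    b = img[max(-m, 0):h + min(-m, 0), max(-n, 0):w + min(-n, 0)].astype(float)
    a -= a.mean()
    b -= b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))

img = np.random.default_rng(0).integers(0, 65536, (64, 64))
print(autocorr(img, 0, 0))  # 1.0 up to float rounding (zero-shift case)
print(autocorr(img, 1, 0))  # below 1 for any nonzero shift of this noise image
```

Evaluating `autocorr` over a grid of (m, n) shifts reproduces the surface plotted in Figure 3.11.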

3.3.2 Green channel interpolation method

Table 3.2: Lossless bitrate of the proposed compression scheme with various G interpolation schemes

Since we perform the weighted template matching prediction in the color difference domain, the estimation of the missing G samples at the R and B pixel positions is necessary. This is essentially achieved by interpolating the quincunx G image. In order to investigate the influence of the interpolation technique on coding performance, we examined several interpolation methods, including bilinear (BI), cubic spline interpolation (SPL), edge-directed interpolation (EDI) [16], and new edge-directed interpolation (NEDI) [53], which vary in estimation accuracy and computational complexity. For BI, missing G samples are estimated by taking the average value of the four surrounding pixels. In SPL, a piecewise continuous curve passing through each of the given samples in the G sub-image is defined to determine the missing pixel values. EDI is an adaptive approach that measures the horizontal and vertical gradients at missing G samples to decide the direction in which to perform interpolation. NEDI initially computes the local covariance coefficients and uses them to adapt the interpolation direction. Table 3.2 lists the lossless compression bitrates of the proposed scheme for the different interpolation methods. The bitrates for BI are not listed, as they are the same as the bitrates of the color difference images in Table 3.1. The observations show that the use of advanced interpolation does not significantly improve compression efficiency and sometimes even degrades performance. Therefore, it is sufficient to use low-complexity bilinear interpolation in our proposed scheme for optimal compression performance.

3.3.3 Dissimilarity measure in template matching

The dissimilarity measure is a key element of the template matching performed during prediction, since the choice of the dissimilarity metric in equations (3.4) and (3.9) affects the computational complexity and the accuracy of the prediction process. Table 3.3 presents the lossless compression bitrates of the proposed scheme for the images from our database using two commonly used dissimilarity metrics, SAD and SSE, defined between two templates T1 and T2 as:

SAD(T1, T2) = Σ_k |T1_k − T2_k|   (3.13)
SSE(T1, T2) = Σ_k (T1_k − T2_k)²   (3.14)

According to Table 3.3, the lossless bitrates for SAD and SSE are almost identical. We can conclude that the selection of the dissimilarity

measure does not significantly affect compression performance; therefore, SAD is preferred to SSE due to its lower implementation complexity.

Table 3.3: Lossless bitrate of the proposed compression scheme with SAD and SSE dissimilarity metrics

3.3.4 Prediction algorithm

We compared the performance of our proposed method with other methods described in the literature. The methods in the comparison are: i) method 1: direct CFA image encoding using

JPEG XR, ii) method 2: direct CFA image encoding using JPEG 2000, iii) method 3: direct CFA image encoding using JPEG-LS, iv) method 4: prediction based on the separation method [40] in conjunction with JPEG XR compression, v) method 5: the CMBP-predictor-based method [41] in conjunction with JPEG XR compression, vi) method 6: the activity level classification model (ALCM) [54] predictor-based method combined with JPEG XR compression, and vii) method 7: our proposed method. As a basis for performance comparison, we used representative lossless compression schemes, namely JPEG XR, JPEG 2000, and JPEG-LS, applied directly to the CFA image in the first three methods. The Kakadu v6.4 software implementation is used for JPEG 2000 coding and FFmpeg is used for JPEG-LS coding. Methods 4 to 7 are considered in order to demonstrate the relationship between predictor accuracy and compression efficiency. In method 4, the quincunx G channel is separated into two rectangular lattices, G1 and G2, and the prediction is carried out by estimating G1 from G2. Non-green channels are directly encoded in the color difference domain. The CMBP predictor in method 5 is essentially very similar to our proposed predictor. It initially generates a direction vector map of the sample image to determine homogeneous regions and only performs prediction in non-homogeneous regions, with pre-defined weight factors for the neighborhood pixels. The ALCM predictor in method 6 estimates the current pixel using a weighted combination of its neighbor pixels. Initially, equal weights are assigned to all pixels; if the previous prediction was higher than the actual pixel value, the weight of the largest neighbor pixel is decreased by 1/256 and that of the smallest neighbor pixel is increased by the same amount. If the previous prediction was lower than the actual pixel value, the weights of the largest and smallest neighbor pixels are adjusted in the opposite way.
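The ALCM weight-update rule of method 6 can be sketched as follows. This is a simplified sketch of the update step only, with hypothetical names; the full ALCM predictor in [54] includes context handling not shown here.

```python
def alcm_predict_and_update(weights, neighbors, actual, step=1/256):
    # Predict as the weighted combination of neighbor pixels.
    pred = sum(w * x for w, x in zip(weights, neighbors))
    hi = neighbors.index(max(neighbors))  # largest neighbor
    lo = neighbors.index(min(neighbors))  # smallest neighbor
    if pred > actual:
        # Over-prediction: shift weight from the largest to the smallest neighbor.
        weights[hi] -= step
        weights[lo] += step
    elif pred < actual:
        # Under-prediction: shift weight the opposite way.
        weights[hi] += step
        weights[lo] -= step
    return pred, weights

w = [0.25, 0.25, 0.25, 0.25]  # initially equal weights
pred, w = alcm_predict_and_update(w, [100, 120, 110, 130], actual=90)
print(pred)  # 115.0 (over-prediction, so weight moves away from the largest neighbor)
```

Repeating this update in raster-scan order lets the weights slowly track local image statistics, which is the behavior the entropy comparison below measures.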
Figure 3.12 shows the entropy of the sample images from our database associated with the different prediction schemes, methods 4 to 7. The entropy of an image can be determined

by the formula

H = − Σ_{i=1}^{n} P_i log₂ P_i   (3.15)

where P_i is the probability of occurrence of pixel value i and H is the entropy of the image. The entropy is evaluated by generating an image histogram from the prediction error image of each sample image. Since the entropy of the image data determines the theoretical lower bound achievable by lossless compression, it lets us evaluate the effectiveness of the different prediction algorithms. The proposed method shows the lowest average entropy among methods 4 to 7, indicating potentially high compression efficiency.

Figure 3.12: Entropy of sample images from the database with various prediction methods

The output compression bitrates of the CFA images from our database achieved by the various methods are presented in Table 3.4 and Table 3.5. The results clearly show that direct compression of the CFA mosaic image is not efficient. In the direct CFA compression

scenario, JPEG 2000 is superior to JPEG XR and JPEG-LS in terms of compression efficiency, outperforming them in average bitrate by 0.5 and 1.1 bpp, respectively. However, as can be seen, exploiting an accurate prediction method allows the JPEG XR-equipped pipeline to achieve a higher compression ratio than JPEG 2000.

Table 3.4: Lossless bitrate of various CFA compression schemes (direct CFA encoding schemes)

On average, our proposed scheme yields the lowest lossless compression bitrate for the images in our database. The average compression bitrates obtained by the other reviewed

predictors with JPEG XR compression are higher, for methods 4, 5, and 6 alike. For most of the images in the database, the proposed method consistently achieves the lowest lossless compression bitrate, demonstrating the robustness of the solution in terms of compression efficiency.

Table 3.5: Lossless bitrate of various CFA compression schemes (predictive coding schemes)

Apart from the lossless bitrate performance of the proposed solution, its computational complexity is also analyzed in terms of normalized operations, such as addition

(ADD), bit shift (SHF), multiplication (MUL), absolute value (ABS), and comparison (CMP). Table 3.6 presents a summary of the number of operations per pixel required to carry out each stage of the prediction process. In this analysis, bilinear interpolation is used for missing G pixel estimation and the SAD metric is used for dissimilarity measurement during prediction. It can be seen that performing the non-green prediction in the color difference domain instead of the intensity domain increases the number of operations for the proposed scheme by 2 additions and 0.5 shifts per pixel, since the G interpolation and difference signal estimation stages are unnecessary in the intensity domain. Such a marginal increase in computational cost is considered tolerable, given that the use of the color difference domain yields a reduction in average lossless bitrate of about 0.5 bpp, as shown in Section 3.3.1.

Table 3.6: Number of operations per pixel required for the proposed scheme (stages: G sub-image prediction; G interpolation (BI); difference R/B channel estimation; difference R sub-image prediction; difference B sub-image prediction; total)

3.4 Chapter Summary

In this chapter, a lossless Bayer CFA compression scheme capable of handling HDR representation is presented. In summary, the following conclusions can be drawn from this chapter: i) the structure separation step reduces high-frequency artifacts, leading to high compression efficiency, ii) the proposed weighted template matching predictor

exploits inter-channel and spatial correlation to achieve high compression performance, and iii) the proposed scheme utilizes low-complexity building blocks, such as bilinear interpolation, the SAD dissimilarity measure, and the JPEG XR encoding module, to minimize computational cost. The image entropy analysis and experimental results indicate that the proposed scheme delivers higher lossless compression performance than other prior-art solutions.

Chapter 4

Lossy CFA Compression using Colorspace Conversion

4.1 Introduction

The previous chapter presented an HDR CFA compression solution which is reversible, so that the original CFA image can be perfectly reconstructed. Despite its advantage of incurring no loss of information, the proposed lossless scheme does not provide adequate compression ratios for target devices with limited data storage. This chapter presents a lossy CFA compression pipeline capable of handling HDR representation, which provides greater compression ratio gains than the lossless scheme at the expense of marginal quality loss. We focus on the Bayer CFA structure, as it is the most widely utilized CFA arrangement in the industry. The proposed scheme consists of a color space conversion module and a structure conversion step, and is thus similar to the approaches discussed in [14, 38, 43, 55]. However, there are three important differences between the proposed scheme and the prior-art solutions. First, a novel color space, namely YCoCg, is used instead of YCbCr in order to offer higher compression at reduced computational cost. YCoCg, another variation of a luminance-chrominance based color space, offers simplified implementation due

to its integer-based operation [30]. Secondly, we introduce a data-adaptive edge-sensing mechanism into the encoding pipeline in order to enhance the quality of the reconstructed images generated by the companion decoding pipeline. Contrary to most prior-art solutions, which utilize non-data-adaptive or 4-direction based mechanisms, the proposed pipeline uses an 8-directional approach to generate higher quality images at fractionally higher computational cost. Lastly, we make use of the recently standardized image compression format JPEG XR in the pipeline to facilitate CFA compression with HDR representation [34]. HDR imaging typically requires 10 to 16 bits per color component to represent image scenes, whereas conventional low dynamic range (LDR) imaging only requires up to 8 bits. Due to its higher precision, HDR capability has recently become one of the key features of high-end digital cameras. However, most of the prior art in CFA compression is limited to codecs applied to conventional 8-bit-per-channel image inputs. Such a conventional pipeline forfeits the rich visual content afforded by the HDR CFA data, as the original HDR data stream is mapped onto an 8-bit equivalent representation prior to applying compression. It will be shown that the proposed CFA compression pipeline produces high-quality compressed images while using expensive memory resources efficiently. The rest of this chapter is organized as follows. Section 4.2 presents the new CFA compression pipeline in detail. Experimental results are reported in Section 4.3 and the chapter summary is provided in Section 4.4.

4.2 Proposed Algorithm

The proposed CFA compression schemes require a series of reversible pre-processing operations prior to applying JPEG XR compression.
These pre-processing operations give us full control over the color space conversion and pixel arrangement of the input images, enabling highly efficient compression performance.
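The RGB-to-YCoCg conversion at the heart of the proposed pipeline (equation (4.12)) can be sketched as follows. This is the plain floating-point form of the transform and its inverse, not the lifting-based integer-reversible YCoCg-R variant of [30]:

```python
def rgb_to_ycocg(r, g, b):
    # Y = R/4 + G/2 + B/4, Co = (R - B)/2, Cg = -R/4 + G/2 - B/4
    y = (r + 2 * g + b) / 4
    co = (r - b) / 2
    cg = (-r + 2 * g - b) / 4
    return y, co, cg

def ycocg_to_rgb(y, co, cg):
    # Inverse transform: G = Y + Cg, R = Y - Cg + Co, B = Y - Cg - Co
    g = y + cg
    r = y - cg + co
    b = y - cg - co
    return r, g, b

print(rgb_to_ycocg(120, 200, 40))                 # (140.0, 40.0, 60.0)
print(ycocg_to_rgb(*rgb_to_ycocg(120, 200, 40)))  # (120.0, 200.0, 40.0)
```

The round trip is exact in floating point, which is why the lossy behavior of the pipeline comes from subsampling and JPEG XR quantization rather than from the color transform itself.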

Figure 4.1: Overview of the proposed lossy HDR CFA image compression pipeline

Initially, the CFA image is transformed from the RGB domain into the YCoCg domain to reduce inter-channel redundancy. We advocate the use of the YCoCg color space over the commonly used YCbCr, since the YCoCg transform has been shown to provide higher coding gain at lower computational complexity [30]. The color space conversion requires all three RGB components at each pixel location, but the CFA image contains only one component per pixel, so the two missing components need to be estimated from adjacent pixels. Our methods use an 8-directional data-adaptive CDM to interpolate the missing pixels. Following conventional CDM approaches for the Bayer CFA, this algorithm initially performs interpolation on the G pixels, followed by interpolation of the color difference signals, R-G and B-G. We then immediately compute the YCoCg image from the interpolated G, R-G, and B-G signals. As illustrated in Figure 4.1, two versions of the image processing pipeline (IPP) are proposed, depending on the number of Y pixels calculated during this stage. Namely, IPP1 computes Y values at all pixel locations to preserve complete edge information. On the other hand, IPP2 reduces computational complexity by keeping only half of the Y values. In both IPPs, only one chrominance pixel of each type is computed for every 2x2 pixel block of the original CFA image. Once the color conversion is completed, the YCoCg image

is rearranged into a shape more appropriate for the subsequent compression. Since this structure conversion step produces data formatted as YUV 4:2:0 for IPP1 and YUV 4:2:2 for IPP2, the matching encoding modes provided by the JPEG XR codec are applied to the corresponding output formats. In the companion decoding pipeline, where the final reconstructed RGB image is produced to be rendered on display devices, the sequence of the encoding pipeline is reversed. Unlike the encoding pipeline, which has to be implemented on the camera, the decoding pipeline can be off-loaded to the end device, such as a personal computer (PC). A PC-based decoding pipeline can include advanced algorithms to produce high-fidelity reconstructed images thanks to its ample resources, whereas a camera on-chip solution typically exploits less complex algorithms to reduce computational cost and power consumption.

4.2.1 Interpolation of missing green components

In the Bayer CFA, G is the dominant component among the three primary colors and suffers the least from aliasing. For this reason, it is common to start the estimation of missing pixels from the G components [50, 56, 57]. In our method, we employ an ESM operator and an inter-channel correlator to reconstruct the missing G components. Among several ESM operators, we found that the 8-directional data-adaptive algorithm [50, 56] offers high performance at low computational cost, and thus it is exploited in our pipeline. In this algorithm, a missing pixel value is computed as a weighted sum of neighbor pixels from 8 directions.
The estimation of G pixels is formulated as follows:

y_{(i,j)G} = x_{(i,j)G}                          if z_{(i,j)} ≐ x_{(i,j)G}
y_{(i,j)G} = Σ_{(p,q)∈ζ} w'_{(p,q)} x̄_{(p,q)G}   otherwise   (4.1)

where the operator ≐ denotes a one-to-one relationship, z is the pixel value of the original grayscale CFA image, x_{(i,j)G} is the G pixel value at position (i,j), ζ denotes the 8 neighborhood pixels of (i,j) such that ζ ≡ {(i−1, j), (i, j−1), (i, j+1), (i+1, j), (i−1, j−1), (i−1, j+

1), (i+1, j−1), (i+1, j+1)}, and x̄_{(p,q)G} are the predicted G values of the neighborhood pixels obtained using local edge information. The normalized edge-sensing weights w'_{(p,q)} are given by

w'_{(p,q)} = w_{(p,q)} / Σ_{(m,n)∈ζ} w_{(m,n)}   (4.2)

The original edge-sensing weight factor w_{(p,q)} is defined in equation (4.3) using an inverse gradient:

w_{(p,q)} = {1 + Σ_{(r,s)∈ζ} |z_{(p,q)} − z_{(r,s)}| / D(z_{(p,q)}, z_{(r,s)})}^{−1}   (4.3)

where D(z_{(p,q)}, z_{(r,s)}) represents the spatial distance between the two pixel locations and 1 is added to the denominator to avoid a singularity. The weight factor adaptively reduces the influence of pixels that lie across an edge or are located further from the current spatial location, to enhance estimation performance.

Figure 4.2: Indexing of the samples within a 5x5 window of the Bayer CFA

Using the 8-directional data-adaptive system, the calculation of G components requires a 5x5 support window centered at the missing G location. In Figure 4.2, the estimation of the G component at location (i,j) requires edge-sensing weight coefficients and predicted G values for the 8 adjacent pixels. The weights w_{(i−1,j)}, w_{(i,j+1)}, w_{(i+1,j)}, and w_{(i,j−1)}, corresponding to the

north, east, south, and west directions of pixel x_{(i,j)}, are defined as:

w_{(i−1,j)} = {1 + (|z_{(i,j)} − z_{(i−2,j)}| + |z_{(i−1,j)} − z_{(i+1,j)}|)/2}^{−1}
w_{(i,j+1)} = {1 + (|z_{(i,j)} − z_{(i,j+2)}| + |z_{(i,j+1)} − z_{(i,j−1)}|)/2}^{−1}
w_{(i+1,j)} = {1 + (|z_{(i,j)} − z_{(i+2,j)}| + |z_{(i+1,j)} − z_{(i−1,j)}|)/2}^{−1}   (4.4)
w_{(i,j−1)} = {1 + (|z_{(i,j)} − z_{(i,j−2)}| + |z_{(i,j−1)} − z_{(i,j+1)}|)/2}^{−1}

The weights w_{(i−1,j−1)}, w_{(i−1,j+1)}, w_{(i+1,j+1)}, and w_{(i+1,j−1)}, corresponding to the north-west, north-east, south-east, and south-west directions, are defined as:

w_{(i−1,j−1)} = {1 + (|z_{(i,j)} − z_{(i−2,j−2)}| + |z_{(i−1,j−1)} − z_{(i+1,j+1)}|)/(2√2)}^{−1}
w_{(i−1,j+1)} = {1 + (|z_{(i,j)} − z_{(i−2,j+2)}| + |z_{(i−1,j+1)} − z_{(i+1,j−1)}|)/(2√2)}^{−1}
w_{(i+1,j+1)} = {1 + (|z_{(i,j)} − z_{(i+2,j+2)}| + |z_{(i+1,j+1)} − z_{(i−1,j−1)}|)/(2√2)}^{−1}   (4.5)
w_{(i+1,j−1)} = {1 + (|z_{(i,j)} − z_{(i+2,j−2)}| + |z_{(i+1,j−1)} − z_{(i−1,j+1)}|)/(2√2)}^{−1}

Similar to the computation of the edge-sensing weights, the computation of the predicted G values around x_{(i,j)} differentiates between the horizontal/vertical and diagonal directions. For the horizontal and vertical directions, the predicted G pixel values are given by:

x̄_{(i−1,j)G} = x_{(i−1,j)G} + (z_{(i−2,j)} − z_{(i,j)} + z_{(i−1,j)} − z_{(i+1,j)})/4
x̄_{(i,j+1)G} = x_{(i,j+1)G} + (z_{(i,j+2)} − z_{(i,j)} + z_{(i,j+1)} − z_{(i,j−1)})/4
x̄_{(i+1,j)G} = x_{(i+1,j)G} + (z_{(i+2,j)} − z_{(i,j)} + z_{(i+1,j)} − z_{(i−1,j)})/4   (4.6)
x̄_{(i,j−1)G} = x_{(i,j−1)G} + (z_{(i,j−2)} − z_{(i,j)} + z_{(i,j−1)} − z_{(i,j+1)})/4
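The directional weight computation of equations (4.4)-(4.5) can be sketched as follows. `esm_weights` is a hypothetical helper name; the sketch assumes the CFA image z is indexable as a 2D array and that (i, j) lies at least two pixels from the border, and it folds all eight directions into one loop by noting that each weight pairs an aligned two-step difference with the opposing one-step pair, at distance 2 (axial) or 2√2 (diagonal).

```python
import math

def esm_weights(z, i, j):
    # Inverse-gradient weights for the 8 neighbors of (i, j),
    # normalized as in eq. (4.2).
    dirs = [(-1, 0), (0, 1), (1, 0), (0, -1),   # N, E, S, W  (eq. 4.4)
            (-1, -1), (-1, 1), (1, 1), (1, -1)]  # NW, NE, SE, SW (eq. 4.5)
    w = {}
    for di, dj in dirs:
        dist = 2.0 if 0 in (di, dj) else 2.0 * math.sqrt(2.0)
        grad = (abs(z[i][j] - z[i + 2 * di][j + 2 * dj])
                + abs(z[i + di][j + dj] - z[i - di][j - dj]))
        w[(di, dj)] = 1.0 / (1.0 + grad / dist)
    total = sum(w.values())
    return {k: v / total for k, v in w.items()}

# On a constant image all gradients vanish, so each of the 8 weights is 1/8.
flat = [[100] * 7 for _ in range(7)]
print(esm_weights(flat, 3, 3)[(0, 1)])  # 0.125
```

Across an edge, the gradient term grows and the corresponding weight shrinks, which is exactly the data-adaptive behavior the text describes.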

For the diagonal directions, they are defined as follows:

x̄_{(i−1,j−1)G} = {x̄_{(i−1,j)G} + x̄_{(i,j−1)G} + (z_{(i−1,j−1)} − z_{(i+1,j+1)})/(2√2) + (z_{(i−2,j)} + z_{(i,j−2)} − 2·z_{(i,j)})/4}/2
x̄_{(i−1,j+1)G} = {x̄_{(i−1,j)G} + x̄_{(i,j+1)G} + (z_{(i−1,j+1)} − z_{(i+1,j−1)})/(2√2) + (z_{(i−2,j)} + z_{(i,j+2)} − 2·z_{(i,j)})/4}/2
x̄_{(i+1,j+1)G} = {x̄_{(i+1,j)G} + x̄_{(i,j+1)G} + (z_{(i+1,j+1)} − z_{(i−1,j−1)})/(2√2) + (z_{(i+2,j)} + z_{(i,j+2)} − 2·z_{(i,j)})/4}/2   (4.7)
x̄_{(i+1,j−1)G} = {x̄_{(i+1,j)G} + x̄_{(i,j−1)G} + (z_{(i+1,j−1)} − z_{(i−1,j+1)})/(2√2) + (z_{(i+2,j)} + z_{(i,j−2)} − 2·z_{(i,j)})/4}/2

By substituting the normalized edge-sensing weight factors and the predicted G pixel values into equation (4.1), the missing G pixels are estimated and the full G channel is constructed.

4.2.2 Interpolation of color difference components

We perform interpolation in the color difference domain, R-G and B-G, instead of the original R and B intensity domain. Image signals in the color difference domain are generally smoother than those in the intensity domain and are thus more suitable for linear interpolation. The difference signal R-G is obtained as follows:

y_{(i,j)RG} = x_{(i,j)R} − y_{(i,j)G}              if z_{(i,j)} ≐ x_{(i,j)R}
y_{(i,j)RG} = Σ_{(p,q)∈ζ1} w'_{(p,q)} y_{(p,q)RG}   if z_{(i,j)} ≐ x_{(i,j)G}   (4.8)
y_{(i,j)RG} = Σ_{(p,q)∈ζ2} w''_{(p,q)} y_{(p,q)RG}  if z_{(i,j)} ≐ x_{(i,j)B}

of (i,j), respectively. Here, w' and w'' are the renormalized edge-sensing weights for the horizontal/vertical and diagonal directions, respectively. The B-G signal, y_{(i,j)BG}, can be calculated using the same technique as follows:

y_{(i,j)BG} = Σ_{(p,q)∈ζ2} w''_{(p,q)} y_{(p,q)BG}  if z_{(i,j)} ≐ x_{(i,j)R}
y_{(i,j)BG} = Σ_{(p,q)∈ζ1} w'_{(p,q)} y_{(p,q)BG}   if z_{(i,j)} ≐ x_{(i,j)G}   (4.9)
y_{(i,j)BG} = x_{(i,j)B} − y_{(i,j)G}              if z_{(i,j)} ≐ x_{(i,j)B}

4.2.3 Correction of green and color difference components

The correction operation utilizes the correlation between color channels and edge information to enhance estimation accuracy. The correction mechanism initially updates G as follows:

y_{(i,j)G} = x_{(i,j)R} − Σ_{(p,q)∈ζ1} w'_{(p,q)} y_{(p,q)RG}   if z_{(i,j)} ≐ x_{(i,j)R}   (4.10)
y_{(i,j)G} = x_{(i,j)B} − Σ_{(p,q)∈ζ1} w'_{(p,q)} y_{(p,q)BG}   if z_{(i,j)} ≐ x_{(i,j)B}

Then, the corresponding color difference signals, R-G and B-G, at the corrected G pixel positions are also updated as follows:

y_{(i,j)RG} = x_{(i,j)R} − y_{(i,j)G},  if z_{(i,j)} ≐ x_{(i,j)R}   (4.11)
y_{(i,j)BG} = x_{(i,j)B} − y_{(i,j)G},  if z_{(i,j)} ≐ x_{(i,j)B}

Finally, the R-G and B-G planes are corrected using the same formulas given by equations (4.8) and (4.9). This simple iteration reduces false color estimation and blurred edges while preserving the original z values of the CFA data [50].
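The per-pixel case analysis of equations (4.8)-(4.9) can be sketched as follows. `interp_rg` is a hypothetical helper that fills the R-G plane at one pixel; it assumes the weight dictionaries produced by the G interpolation stage have already been renormalized over the relevant four-neighbor sets, as the text requires.

```python
def interp_rg(cfa_color, rg, x_r, y_g, i, j, w_hv, w_diag):
    # Estimate the R-G value at (i, j) following eq. (4.8).
    if cfa_color == 'R':
        return x_r - y_g  # direct difference at R sites
    if cfa_color == 'G':
        # G sites: weighted sum over the four horizontal/vertical neighbors.
        zeta1 = [(i - 1, j), (i, j - 1), (i, j + 1), (i + 1, j)]
        return sum(w_hv[p] * rg[p] for p in zeta1)
    # B sites: weighted sum over the four diagonal R-G neighbors.
    zeta2 = [(i - 1, j - 1), (i - 1, j + 1), (i + 1, j - 1), (i + 1, j + 1)]
    return sum(w_diag[p] * rg[p] for p in zeta2)

# Toy call at a B site with equal diagonal weights.
rg = {(0, 0): 4.0, (0, 2): 8.0, (2, 0): 12.0, (2, 2): 16.0}
w = {p: 0.25 for p in rg}
print(interp_rg('B', rg, None, None, 1, 1, {}, w))  # 10.0
```

The B-G plane of equation (4.9) follows by swapping the roles of the R and B cases, and the correction step of equations (4.10)-(4.11) simply re-runs the same weighted sums with the updated difference planes.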

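The correction stage of equations (4.10) and (4.11) can be sketched at a single R site as follows; uniform averaging again replaces the edge-sensing weights w, and the choice of (1, 1) as an R site in the test is purely illustrative.

```python
# Sketch of equations (4.10)-(4.11) at an R site: re-estimate G from the
# R-G plane, then refresh the R-G sample so it stays consistent with the
# original CFA value z (which is preserved exactly).

ZETA1 = [(-1, 0), (0, -1), (0, 1), (1, 0)]

def correct_at_r_site(i, j, cfa, rg_plane):
    """Return (corrected G, updated R-G) at an R pixel (i, j)."""
    # Equation (4.10): G = R minus a weighted average of neighboring R-G values.
    avg_rg = sum(rg_plane[i + di][j + dj] for di, dj in ZETA1) / len(ZETA1)
    g_new = cfa[i][j] - avg_rg
    # Equation (4.11): rewrite R-G against the corrected G.
    rg_new = cfa[i][j] - g_new
    return g_new, rg_new
```

Note that g_new + rg_new reconstructs the original CFA sample exactly, which is the property the text highlights: the iteration never alters the measured z values.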
YCoCg color conversion

The G, R-G, and B-G planes are fully populated through the previous stages. The color space conversion from the RGB domain to the YCoCg domain is given by:

y_{(i,j)Y}  =  (1/4) y_{(i,j)R} + (1/2) y_{(i,j)G} + (1/4) y_{(i,j)B}  =  (y_{(i,j)RG} + y_{(i,j)BG} + 4 y_{(i,j)G}) / 4
y_{(i,j)Co} =  (1/2) y_{(i,j)R} - (1/2) y_{(i,j)B}                     =  (y_{(i,j)RG} - y_{(i,j)BG}) / 2              (4.12)
y_{(i,j)Cg} = -(1/4) y_{(i,j)R} + (1/2) y_{(i,j)G} - (1/4) y_{(i,j)B}  =  -(y_{(i,j)RG} + y_{(i,j)BG}) / 4

It should be noted that calculating all three channels, Y, Co, and Cg, at full resolution would triple the number of pixels to compress compared to the original CFA image. We propose two methods to reduce the number of pixels to compress.

Figure 4.3: Two versions of color space conversion. (a) Color space conversion for IPP1; (b) color space conversion for IPP2.

The first method, IPP1, preserves four Y, one Co, and one Cg components for every 2x2 CFA pixels. This process reduces the spatial resolution of the chrominance (chroma) channels by 75 percent, but still maintains high image quality by keeping the full Y plane, which is perceptually more significant than the chroma planes. In order to reduce the spatial resolution of Co and Cg, chroma subsampling is applied. Here, we discard three chroma pixels from each 2x2 block for simplicity. After subsampling, the spatial resolutions of the chroma channels are halved in both the horizontal and vertical directions.

The second method, IPP2, further reduces the number of pixels to compress by discarding half of the Y pixels. It calculates Y pixels only at the G positions of the original CFA image. This is because G is the dominant color in the Y calculation, and distortion can be

minimized by using reliable original G samples instead of interpolated ones [43]. The two chroma channels of IPP2 are subsampled in the same manner as in IPP1.

Structure conversion

Since image compression standards typically only accept rectangular arrays as inputs, a structure conversion process is necessary. During this stage, the quincunx Y channel in IPP2 is rearranged into a rectangular array by up-shifting every Y pixel located in an even row by 1 pixel. It should be noted that this step is unnecessary for IPP1, as its Y pixels are already arranged in a rectangular grid. For both IPP1 and IPP2, the Co and Cg pixels are pressed together to form rectangular arrays. After structure conversion, the YCoCg data in IPP1 constitutes the standard YUV 4:2:0 format and can thus be compressed by applying the YCC 4:2:0 mode of JPEG XR encoding. Similarly, the rearranged YCoCg data in IPP2, formatted as YUV 4:2:2, can be compressed by the YCC 4:2:2 mode of JPEG XR encoding.

4.3 Experimental Results

The performance of the proposed solution is examined in the following sequence. RGB images with 16 bits per component from the Para-Dice Insight Compression Database [51], shown in Figure 3.10, are initially resized to 960x640. The resized test images o : Z^2 -> Z^3 are sampled by the Bayer CFA to produce the CFA images z : Z^2 -> Z. The CFA images z are then preprocessed using the proposed pipelines and compressed into JPEG XR format c by the JPEG XR reference software [52]. The reconstructed RGB images x : Z^2 -> Z^3 to be displayed to the end-user are generated by applying JPEG XR decompression to the compressed data c, followed by the processing operations in reverse order. In our experiments, we apply bilinear interpolation to estimate the missing Y, Co, and Cg components in the decoding pipeline. The reconstructed image x should be as

close as possible to the desired RGB image o. We modified the reference software to accept 16-bit per component YUV 4:2:0 data as input for the raw encoding mode. This modification allows us to simulate IPP1. The JPEG XR codec is configured as follows: i) all subbands (DC, LP, and HP) and flexbits are preserved during encoding, ii) the first-level overlapping mode is used for the pre-filter function, and iii) the bit rate of the encoded image is controlled by adjusting the quantization variables. Uniform quantization parameters are used for all three subbands and color channels.

To evaluate the performance of the proposed solutions, image quality is measured by comparing o and x using three quality assessment (QA) metrics: i) Composite Peak Signal to Noise Ratio (CPSNR), ii) Multi-scale Structural Similarity Index (MSSIM) [48], and iii) High Dynamic Range Visible Difference Predictor (HDR-VDP) [49]. CPSNR is defined as follows:

CPSNR = 10 log_{10} ( (2^B - 1)^2 / ( (1 / (3 K_1 K_2)) \sum_{k=1}^{3} \sum_{r=1}^{K_1} \sum_{s=1}^{K_2} (o_{(r,s)k} - x_{(r,s)k})^2 ) )    (4.13)

where B stands for the bit depth (B = 16 in our experiments) and K_1 x K_2 is the image size. Although CPSNR is widely used in the literature, it correlates poorly with perceived quality. Therefore, we also include two metrics oriented toward human visual system (HVS) modeling: the multi-scale MSSIM and HDR-VDP. MSSIM initially decomposes a test image into several scales and provides statistics by measuring the luminance, contrast, and structure information of each sub-scale image. It is generally evaluated by assigning different weights to the color channels, and is represented in dB scale as follows:

MSSIM = 20 log_{10} { 1 - (w_Y MSSIM_Y + w_Cb MSSIM_Cb + w_Cr MSSIM_Cr) }^{-1}    (4.14)

In this report, the weight coefficients for each channel of MSSIM are selected to be w_Y = 0.95, w_Cb = 0.02, and w_Cr = 0.03, following suggested usage from previous publications [36, 58].
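Equation (4.13) and the dB mapping of equation (4.14) can be sketched as follows. This is an illustration only: the nested-list image layout and the 20*log10(1/(1 - MSSIM)) form of the dB conversion are assumptions of this sketch, and the per-channel MSSIM scores are taken as given inputs.

```python
import math

# Sketch of equation (4.13): CPSNR between two 3-channel images stored as
# nested lists o[k][r][s], with bit depth B (B = 16 in the experiments).

def cpsnr(o, x, bit_depth=16):
    channels, rows, cols = len(o), len(o[0]), len(o[0][0])
    mse = sum((o[k][r][s] - x[k][r][s]) ** 2
              for k in range(channels)
              for r in range(rows)
              for s in range(cols)) / (channels * rows * cols)
    return 10 * math.log10((2 ** bit_depth - 1) ** 2 / mse)

# Sketch of equation (4.14): weighted per-channel MSSIM scores mapped to dB,
# assuming the 20*log10(1/(1 - MSSIM)) form, so better quality -> larger dB.

def mssim_db(m_y, m_cb, m_cr, w=(0.95, 0.02, 0.03)):
    weighted = w[0] * m_y + w[1] * m_cb + w[2] * m_cr
    return 20 * math.log10(1 / (1 - weighted))
```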

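The IPP2 pre-coding steps described earlier, the YCoCg conversion of equation (4.12) and the quincunx-to-rectangular structure conversion, can be sketched as follows. The sample layout (Y at even-even and odd-odd 0-based positions, i.e. the G sites of a GRBG-style pattern) is an assumption of this illustration.

```python
# Sketch of equation (4.12): RGB -> YCoCg, written both from R, G, B and
# from the difference planes R-G and B-G, plus the inverse used when decoding.

def ycocg_from_rgb(r, g, b):
    y = r / 4 + g / 2 + b / 4
    co = r / 2 - b / 2
    cg = -r / 4 + g / 2 - b / 4
    return y, co, cg

def ycocg_from_diffs(g, rg, bg):
    # Equivalent form over the color-difference planes of the pipeline.
    return (rg + bg + 4 * g) / 4, (rg - bg) / 2, -(rg + bg) / 4

def rgb_from_ycocg(y, co, cg):
    # Inverse transform: G = Y + Cg, R = Y + Co - Cg, B = Y - Co - Cg.
    g = y + cg
    r = y + co - cg
    b = y - co - cg
    return r, g, b

# Sketch of the IPP2 structure conversion: pack an H x W quincunx Y plane
# (samples at even-even and odd-odd positions, None elsewhere) into an
# (H/2) x W rectangle by shifting each odd-row sample up by one row.

def quincunx_to_rect(y):
    h, w = len(y), len(y[0])
    return [[y[2 * bi][j] if j % 2 == 0 else y[2 * bi + 1][j]
             for j in range(w)]
            for bi in range(h // 2)]
```

The test below checks that both forms of (4.12) agree and that the forward/inverse pair round-trips exactly (all coefficients are dyadic fractions, so float arithmetic is exact here).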
The VDP metric predicts the percentage of pixels in a test image that standard observers would perceive as different from the original. HDR-VDP incorporates several HVS characteristics into VDP to enhance its prediction accuracy over the full visible range of luminance. It is specifically tuned to support HDR images and is widely adopted for the comparison of HDR images; we therefore make use of HDR-VDP in reporting the experimental results. Similar to the MSSIM metric, HDR-VDP is plotted in dB scale as follows:

HDR-VDP = 20 log_{10} (1/r)    (4.15)

where r denotes the ratio of pixels that standard observers would perceive as different from the original. The results reported in the following sections are obtained over a wide range of compression ratios by averaging the computed image quality of the test images.

Edge Sensing Mechanism (ESM) and Compression

The rate-distortion performance of the CFA compression pipelines with various ESMs is illustrated in Figure 4.4. The ESMs under consideration include bilinear interpolation (BI), Laplacian interpolation (HA) [59], and the 8-directional data-adaptive interpolation (ESCC), which is deployed in our proposed pipeline. These ESM schemes range from simple to complex in terms of computational cost and vary in the quality of images they produce. BI is a typical example of a non-data-adaptive estimator that utilizes fixed edge-sensing weight factors for missing pixel estimation. HA is a classical edge-directed interpolator using second-order gradients as correction terms. These two algorithms are often used as benchmarks in the literature [14, 38, 43], and thus we compare the performance of our solution against them.

For the IPP1 pipeline, the ESCC outperforms the other ESMs throughout almost the entire bit rate range in all three quality metrics. HA provides a slightly higher CPSNR gain

than the ESCC at low bit rates, but the other two perceptual metrics, which are strongly correlated with visual perception, indicate that the ESCC is superior to HA. This implies that utilizing a sophisticated ESM enhances the rate-distortion performance of the CFA compression pipeline. However, as the bit rate decreases, the selection of the ESM has less impact on the performance. At low bit rates, as shown in Figure 4.4, the ESCC provides almost identical compression performance to HA, and is still more efficient than BI, although the improvement is not as significant as at high bit rates. This is because the advanced ESMs are more sensitive to fine edge detail, which is susceptible to compression errors, than the low-complexity ones.

For the IPP2 pipeline, not all error criteria show consistent results. The CPSNR and MSSIM metrics indicate that the ESCC achieves the best performance over almost the entire range of bit rates, while the HDR-VDP metric indicates that HA outperforms the ESCC at bit rates higher than 3 bits per pixel (bpp). This observation shows that the ESCC ESM is better suited to IPP1 than to IPP2. The suboptimal compression performance of IPP2 is caused by artificial high-frequency components introduced in the structure conversion stage. As in the IPP1 case, advanced ESMs in IPP2 provide less benefit in terms of compression efficiency as the bit rate decreases.

Color Space and Compression

Figure 4.5 shows the rate-distortion curves of the proposed scheme in conjunction with the RGB-YCoCg conversion and two other variants: the RGB-YCbCr conversion and the JPEG 2000 reversible color transform (RCT). The RGB-YCbCr conversion is commonly used in CFA compression pipelines [14, 43], and we therefore consider it a reference method. The JPEG 2000 RCT is included in the comparison since, like the YCoCg, it features low complexity, requiring only addition and shift operations in computation.
Our experimental results show that all three color space variants produce nearly

Figure 4.4: Rate-distortion curves of the proposed pipelines with different ESMs for various quality metrics. Panels: (a) CPSNR for IPP1, (b) MSSIM for IPP1, (c) HDR-VDP for IPP1, (d) CPSNR for IPP2, (e) MSSIM for IPP2, (f) HDR-VDP for IPP2.

Figure 4.5: Rate-distortion curves of the proposed pipelines with different color spaces for various quality metrics. Panels: (a) CPSNR for IPP1, (b) MSSIM for IPP1, (c) HDR-VDP for IPP1, (d) CPSNR for IPP2, (e) MSSIM for IPP2, (f) HDR-VDP for IPP2.

identical performance for both IPP1 and IPP2. The YCoCg slightly outperforms the other two methods in the MSSIM and HDR-VDP metrics, but results in a small loss (at most 0.2 dB) in the CPSNR measure compared to the YCbCr. Since the YCoCg space offers marginally higher perceptual metric performance at low complexity, it is, among the reviewed options, the most efficient choice for our CFA compression pipeline implementation.

Proposed Pipeline and Conventional Pipelines

Figure 4.6 compares the rate-distortion performance of our proposed pipelines, IPP1 and IPP2, against other variants. IPP3 represents the conventional workflow, which initially demosaicks the CFA image via the ESCC CDM and then compresses the resulting RGB image; the compressed image is decoded and displayed. IPP4 encodes the CFA image directly without any pre-processing operations; the full RGB image is obtained by demosaicking the decoded CFA image using the ESCC CDM. The combination of these two pipelines with two codecs, JPEG XR and JPEG 2000, allows us to test four solutions in addition to our methods. For JPEG 2000 coding, the JasPer software implementation [60] is used. Comparison with the conventional JPEG is omitted due to its lack of support for 16-bit per component input.

The experimental results show that IPP1 consistently outperforms IPP3 and IPP4 in all three quality measures at high bit rates, above 8 bpp, regardless of the codec used. IPP1 also substantially outperforms IPP2 at high bit rates. For mid-range bit rates, between 2 and 8 bpp, IPP4 provides the best image quality. At low bit rates, all three metrics show that IPP3 produces images of superior quality to the other pipelines, and IPP2 outperforms IPP1 in terms of rate-distortion performance. There are two reasons behind this.
First, higher compression removes more texture and edge detail, and thus reduces the high-frequency artifacts generated during the quincunx-to-rectangular array conversion of the Y pixels in IPP2; the reduction of these high-frequency components, in turn, improves the compression efficiency. Second,

Figure 4.6: Rate-distortion curves of the proposed pipelines and 4 other pipelines for various image quality metrics. Panels: (a) CPSNR for the various IPPs, (b) MSSIM for the various IPPs, (c) HDR-VDP for the various IPPs below 4 bpp, (d) HDR-VDP for the various IPPs above 4 bpp.

the smaller input size of IPP2 results in better performance. Conversely, at high bit rates, the aliasing in IPP2 prevents efficient coding and the reduction of Y pixels results in poor edge restoration. Thus, high bit rates favor IPP1 while low bit rates favor IPP2.

Figure 4.7 allows a visual evaluation of the pipelines via sub-regions of the reconstructed images generated at low bit rates, between 1 and 2 bpp. We can observe that IPP2 and IPP3 maintain acceptable visual quality even under high compression ratios, whereas IPP1 and IPP4 suffer from various visual artifacts. Images generated by IPP4 at low

bit rates are significantly distorted by lattice-patterned artifacts. This unpleasant texture appears for IPP4 with both JPEG 2000 and JPEG XR. Applying high compression to CFA data removes edge information required for CDM and introduces noise that can misguide the ESM operators into generating false weight factors. Thus, advanced ESMs, typically more sensitive to edge detail, may not produce acceptable-quality images from highly compressed CFA data. The conventional workflow, IPP3, does not suffer from this problem at low bit rates since CDM is performed prior to compression. In addition, demosaicked data typically have higher inter-pixel correlation than CFA data, enabling more efficient compression. For these reasons, IPP3 works well at low bit rates, providing almost the same rate-distortion performance as our proposed IPP2 pipeline. In Figure 4.6, the perceptual metrics, MSSIM and HDR-VDP, indicate that IPP4 yields a lower quality gain than IPP2 and IPP3, consistent with the visual inspection.

Our experimental results show that the compression performance of JPEG XR and JPEG 2000 is very close; JPEG 2000 is generally slightly superior, but the gain is marginal. It can be seen in Figure 4.6 that for both IPP3 and IPP4, the use of JPEG 2000 instead of JPEG XR slightly improves the rate-distortion performance over a wide range of bit rates in all three metrics.

Apart from the rate-distortion performance, we also report the average encoding time per image in milliseconds for different combinations of pipelines and codecs in Table 4.1. The experimental results, averaged over the image sets, are obtained on an Intel Core 2 Duo 2.53 GHz CPU with 4 GB RAM running the Windows 7 operating system. For a CFA input of size K_1 x K_2, the numbers of pixels to encode in IPP1, IPP2, IPP3, and IPP4 are 1.5 K_1 K_2, K_1 K_2, 3 K_1 K_2, and K_1 K_2 pixels, respectively.
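The per-pipeline encoder input sizes quoted above can be sketched as a quick check, using the 960x640 test image size from the experimental setup:

```python
# Sketch of the encoder input sizes for a K1 x K2 CFA input, broken down
# by plane: IPP1 keeps a full Y plane plus quarter-size Co and Cg; IPP2
# keeps a half-size Y plane plus quarter-size Co and Cg.

def pixels_to_encode(k1, k2):
    n = k1 * k2
    return {
        'IPP1': n + 2 * (n // 4),        # 1.5 * K1 * K2
        'IPP2': n // 2 + 2 * (n // 4),   # 1.0 * K1 * K2
        'IPP3': 3 * n,                   # fully demosaicked RGB
        'IPP4': n,                       # raw CFA, no pre-processing
    }
```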
The results show that the encoding delay of each pipeline is proportional to the number of pixels in the input data. This observation clearly shows a trade-off between quality and complexity. At low bit rates, as shown in Figure 4.6, IPP2 performs significantly better than IPP1 and is almost comparable to IPP3 in terms of image quality. The average encoding speed

Figure 4.7: Full color images obtained from the four examined IPPs with the JPEG XR codec at bit rates between 1 and 2 bpp. For each image, the four panels show IPP1, IPP2, IPP3, and IPP4 in order; the first 4 images are sub-regions of image 18, the next 4 are from image 21, and the last 4 are from image 1 in the database.

of IPP2 is considerably faster than that of either IPP1 or IPP3 with JPEG XR encoding. Therefore, when a small quality loss is tolerable in exchange for a reduction in encoding delay, the low-complexity IPP2 solution is desirable. According to Table 4.1, JPEG XR is substantially faster than JPEG 2000, by 3.5 to 4 times in compression speed. It is important to note that a direct comparison of compression


More information

Images and Displays. Lecture Steve Marschner 1

Images and Displays. Lecture Steve Marschner 1 Images and Displays Lecture 2 2008 Steve Marschner 1 Introduction Computer graphics: The study of creating, manipulating, and using visual images in the computer. What is an image? A photographic print?

More information

Artifacts Reduced Interpolation Method for Single-Sensor Imaging System

Artifacts Reduced Interpolation Method for Single-Sensor Imaging System 2016 International Conference on Computer Engineering and Information Systems (CEIS-16) Artifacts Reduced Interpolation Method for Single-Sensor Imaging System Long-Fei Wang College of Telecommunications

More information

Wavelet-based image compression

Wavelet-based image compression Institut Mines-Telecom Wavelet-based image compression Marco Cagnazzo Multimedia Compression Outline Introduction Discrete wavelet transform and multiresolution analysis Filter banks and DWT Multiresolution

More information

Color , , Computational Photography Fall 2018, Lecture 7

Color , , Computational Photography Fall 2018, Lecture 7 Color http://graphics.cs.cmu.edu/courses/15-463 15-463, 15-663, 15-862 Computational Photography Fall 2018, Lecture 7 Course announcements Homework 2 is out. - Due September 28 th. - Requires camera and

More information

ISSN: Seema G Bhateja et al, International Journal of Computer Science & Communication Networks,Vol 1(3),

ISSN: Seema G Bhateja et al, International Journal of Computer Science & Communication Networks,Vol 1(3), A Similar Structure Block Prediction for Lossless Image Compression C.S.Rawat, Seema G.Bhateja, Dr. Sukadev Meher Ph.D Scholar NIT Rourkela, M.E. Scholar VESIT Chembur, Prof and Head of ECE Dept NIT Rourkela

More information

DIGITAL IMAGING FOUNDATIONS

DIGITAL IMAGING FOUNDATIONS CHAPTER DIGITAL IMAGING FOUNDATIONS Photography is, and always has been, a blend of art and science. The technology has continually changed and evolved over the centuries but the goal of photographers

More information

Colour image watermarking in real life

Colour image watermarking in real life Colour image watermarking in real life Konstantin Krasavin University of Joensuu, Finland ABSTRACT: In this report we present our work for colour image watermarking in different domains. First we consider

More information

Digital Image Processing Introduction

Digital Image Processing Introduction Digital Processing Introduction Dr. Hatem Elaydi Electrical Engineering Department Islamic University of Gaza Fall 2015 Sep. 7, 2015 Digital Processing manipulation data might experience none-ideal acquisition,

More information

A Novel Approach of Compressing Images and Assessment on Quality with Scaling Factor

A Novel Approach of Compressing Images and Assessment on Quality with Scaling Factor A Novel Approach of Compressing Images and Assessment on Quality with Scaling Factor Umesh 1,Mr. Suraj Rana 2 1 M.Tech Student, 2 Associate Professor (ECE) Department of Electronic and Communication Engineering

More information

High Dynamic Range image capturing by Spatial Varying Exposed Color Filter Array with specific Demosaicking Algorithm

High Dynamic Range image capturing by Spatial Varying Exposed Color Filter Array with specific Demosaicking Algorithm High Dynamic ange image capturing by Spatial Varying Exposed Color Filter Array with specific Demosaicking Algorithm Cheuk-Hong CHEN, Oscar C. AU, Ngai-Man CHEUN, Chun-Hung LIU, Ka-Yue YIP Department of

More information

CoE4TN4 Image Processing. Chapter 3: Intensity Transformation and Spatial Filtering

CoE4TN4 Image Processing. Chapter 3: Intensity Transformation and Spatial Filtering CoE4TN4 Image Processing Chapter 3: Intensity Transformation and Spatial Filtering Image Enhancement Enhancement techniques: to process an image so that the result is more suitable than the original image

More information

Camera Image Processing Pipeline

Camera Image Processing Pipeline Lecture 13: Camera Image Processing Pipeline Visual Computing Systems Today (actually all week) Operations that take photons hitting a sensor to a high-quality image Processing systems used to efficiently

More information

INTER-INTRA FRAME CODING IN MOTION PICTURE COMPENSATION USING NEW WAVELET BI-ORTHOGONAL COEFFICIENTS

INTER-INTRA FRAME CODING IN MOTION PICTURE COMPENSATION USING NEW WAVELET BI-ORTHOGONAL COEFFICIENTS International Journal of Electronics and Communication Engineering (IJECE) ISSN(P): 2278-9901; ISSN(E): 2278-991X Vol. 5, Issue 3, Mar - Apr 2016, 1-10 IASET INTER-INTRA FRAME CODING IN MOTION PICTURE

More information

VU Rendering SS Unit 8: Tone Reproduction

VU Rendering SS Unit 8: Tone Reproduction VU Rendering SS 2012 Unit 8: Tone Reproduction Overview 1. The Problem Image Synthesis Pipeline Different Image Types Human visual system Tone mapping Chromatic Adaptation 2. Tone Reproduction Linear methods

More information

ISSN Vol.03,Issue.29 October-2014, Pages:

ISSN Vol.03,Issue.29 October-2014, Pages: ISSN 2319-8885 Vol.03,Issue.29 October-2014, Pages:5768-5772 www.ijsetr.com Quality Index Assessment for Toned Mapped Images Based on SSIM and NSS Approaches SAMEED SHAIK 1, M. CHAKRAPANI 2 1 PG Scholar,

More information

MOST digital cameras use image sensors that sample only

MOST digital cameras use image sensors that sample only IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 15, NO. 6, JUNE 2006 1379 Lossless Compression of Color Mosaic Images Ning Zhang and Xiaolin Wu, Senior Member, IEEE Abstract Lossless compression of color mosaic

More information

Computer Graphics. Si Lu. Fall er_graphics.htm 10/02/2015

Computer Graphics. Si Lu. Fall er_graphics.htm 10/02/2015 Computer Graphics Si Lu Fall 2017 http://www.cs.pdx.edu/~lusi/cs447/cs447_547_comput er_graphics.htm 10/02/2015 1 Announcements Free Textbook: Linear Algebra By Jim Hefferon http://joshua.smcvt.edu/linalg.html/

More information

Ch. Bhanuprakash 2 2 Asistant Professor, Mallareddy Engineering College, Hyderabad, A.P, INDIA. R.Jawaharlal 3, B.Sreenivas 4 3,4 Assocate Professor

Ch. Bhanuprakash 2 2 Asistant Professor, Mallareddy Engineering College, Hyderabad, A.P, INDIA. R.Jawaharlal 3, B.Sreenivas 4 3,4 Assocate Professor Volume 3, Issue 11, November 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Image Compression

More information

image Scanner, digital camera, media, brushes,

image Scanner, digital camera, media, brushes, 118 Also known as rasterr graphics Record a value for every pixel in the image Often created from an external source Scanner, digital camera, Painting P i programs allow direct creation of images with

More information

Speech Coding in the Frequency Domain

Speech Coding in the Frequency Domain Speech Coding in the Frequency Domain Speech Processing Advanced Topics Tom Bäckström Aalto University October 215 Introduction The speech production model can be used to efficiently encode speech signals.

More information

Low Dynamic Range Solutions to the High Dynamic Range Imaging Problem

Low Dynamic Range Solutions to the High Dynamic Range Imaging Problem Low Dynamic Range Solutions to the High Dynamic Range Imaging Problem Submitted in partial fulfillment of the requirements of the degree of Doctor of Philosophy by Shanmuganathan Raman (Roll No. 06407008)

More information

High Dynamic Range (HDR) Photography in Photoshop CS2

High Dynamic Range (HDR) Photography in Photoshop CS2 Page 1 of 7 High dynamic range (HDR) images enable photographers to record a greater range of tonal detail than a given camera could capture in a single photo. This opens up a whole new set of lighting

More information

RGB RESOLUTION CONSIDERATIONS IN A NEW CMOS SENSOR FOR CINE MOTION IMAGING

RGB RESOLUTION CONSIDERATIONS IN A NEW CMOS SENSOR FOR CINE MOTION IMAGING WHITE PAPER RGB RESOLUTION CONSIDERATIONS IN A NEW CMOS SENSOR FOR CINE MOTION IMAGING Written by Larry Thorpe Professional Engineering & Solutions Division, Canon U.S.A., Inc. For more info: cinemaeos.usa.canon.com

More information

Efficient Image Compression Technique using JPEG2000 with Adaptive Threshold

Efficient Image Compression Technique using JPEG2000 with Adaptive Threshold Efficient Image Compression Technique using JPEG2000 with Adaptive Threshold Md. Masudur Rahman Mawlana Bhashani Science and Technology University Santosh, Tangail-1902 (Bangladesh) Mohammad Motiur Rahman

More information

A Kalman-Filtering Approach to High Dynamic Range Imaging for Measurement Applications

A Kalman-Filtering Approach to High Dynamic Range Imaging for Measurement Applications A Kalman-Filtering Approach to High Dynamic Range Imaging for Measurement Applications IEEE Transactions on Image Processing, Vol. 21, No. 2, 2012 Eric Dedrick and Daniel Lau, Presented by Ran Shu School

More information

Bitmap Image Formats

Bitmap Image Formats LECTURE 5 Bitmap Image Formats CS 5513 Multimedia Systems Spring 2009 Imran Ihsan Principal Design Consultant OPUSVII www.opuseven.com Faculty of Engineering & Applied Sciences 1. Image Formats To store

More information

Level-Successive Encoding for Digital Photography

Level-Successive Encoding for Digital Photography Level-Successive Encoding for Digital Photography Mehmet Celik, Gaurav Sharma*, A.Murat Tekalp University of Rochester, Rochester, NY * Xerox Corporation, Webster, NY Abstract We propose a level-successive

More information

Image Processing: An Overview

Image Processing: An Overview Image Processing: An Overview Sebastiano Battiato, Ph.D. battiato@dmi.unict.it Program Image Representation & Color Spaces Image files format (Compressed/Not compressed) Bayer Pattern & Color Interpolation

More information

A Modified Image Coder using HVS Characteristics

A Modified Image Coder using HVS Characteristics A Modified Image Coder using HVS Characteristics Mrs Shikha Tripathi, Prof R.C. Jain Birla Institute Of Technology & Science, Pilani, Rajasthan-333 031 shikha@bits-pilani.ac.in, rcjain@bits-pilani.ac.in

More information

MULTIMEDIA SYSTEMS

MULTIMEDIA SYSTEMS 1 Department of Computer Engineering, g, Faculty of Engineering King Mongkut s Institute of Technology Ladkrabang 01076531 MULTIMEDIA SYSTEMS Pakorn Watanachaturaporn, Ph.D. pakorn@live.kmitl.ac.th, pwatanac@gmail.com

More information

B.E, Electronics and Telecommunication, Vishwatmak Om Gurudev College of Engineering, Aghai, Maharashtra, India

B.E, Electronics and Telecommunication, Vishwatmak Om Gurudev College of Engineering, Aghai, Maharashtra, India 2018 IJSRSET Volume 4 Issue 1 Print ISSN: 2395-1990 Online ISSN : 2394-4099 Themed Section : Engineering and Technology Implementation of Various JPEG Algorithm for Image Compression Swanand Labad 1, Vaibhav

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK IMAGE COMPRESSION FOR TROUBLE FREE TRANSMISSION AND LESS STORAGE SHRUTI S PAWAR

More information

Realistic Image Synthesis

Realistic Image Synthesis Realistic Image Synthesis - HDR Capture & Tone Mapping - Philipp Slusallek Karol Myszkowski Gurprit Singh Karol Myszkowski LDR vs HDR Comparison Various Dynamic Ranges (1) 10-6 10-4 10-2 100 102 104 106

More information

The ultimate camera. Computational Photography. Creating the ultimate camera. The ultimate camera. What does it do?

The ultimate camera. Computational Photography. Creating the ultimate camera. The ultimate camera. What does it do? Computational Photography The ultimate camera What does it do? Image from Durand & Freeman s MIT Course on Computational Photography Today s reading Szeliski Chapter 9 The ultimate camera Infinite resolution

More information

Tonal quality and dynamic range in digital cameras

Tonal quality and dynamic range in digital cameras Tonal quality and dynamic range in digital cameras Dr. Manal Eissa Assistant professor, Photography, Cinema and TV dept., Faculty of Applied Arts, Helwan University, Egypt Abstract: The diversity of display

More information

Images and Graphics. 4. Images and Graphics - Copyright Denis Hamelin - Ryerson University

Images and Graphics. 4. Images and Graphics - Copyright Denis Hamelin - Ryerson University Images and Graphics Images and Graphics Graphics and images are non-textual information that can be displayed and printed. Graphics (vector graphics) are an assemblage of lines, curves or circles with

More information

USE OF HISTOGRAM EQUALIZATION IN IMAGE PROCESSING FOR IMAGE ENHANCEMENT

USE OF HISTOGRAM EQUALIZATION IN IMAGE PROCESSING FOR IMAGE ENHANCEMENT USE OF HISTOGRAM EQUALIZATION IN IMAGE PROCESSING FOR IMAGE ENHANCEMENT Sapana S. Bagade M.E,Computer Engineering, Sipna s C.O.E.T,Amravati, Amravati,India sapana.bagade@gmail.com Vijaya K. Shandilya Assistant

More information

Improved sensitivity high-definition interline CCD using the KODAK TRUESENSE Color Filter Pattern

Improved sensitivity high-definition interline CCD using the KODAK TRUESENSE Color Filter Pattern Improved sensitivity high-definition interline CCD using the KODAK TRUESENSE Color Filter Pattern James DiBella*, Marco Andreghetti, Amy Enge, William Chen, Timothy Stanka, Robert Kaser (Eastman Kodak

More information

Digital Cameras The Imaging Capture Path

Digital Cameras The Imaging Capture Path Manchester Group Royal Photographic Society Imaging Science Group Digital Cameras The Imaging Capture Path by Dr. Tony Kaye ASIS FRPS Silver Halide Systems Exposure (film) Processing Digital Capture Imaging

More information

Continuous Flash. October 1, Technical Report MSR-TR Microsoft Research Microsoft Corporation One Microsoft Way Redmond, WA 98052

Continuous Flash. October 1, Technical Report MSR-TR Microsoft Research Microsoft Corporation One Microsoft Way Redmond, WA 98052 Continuous Flash Hugues Hoppe Kentaro Toyama October 1, 2003 Technical Report MSR-TR-2003-63 Microsoft Research Microsoft Corporation One Microsoft Way Redmond, WA 98052 Page 1 of 7 Abstract To take a

More information

A 120dB dynamic range image sensor with single readout using in pixel HDR

A 120dB dynamic range image sensor with single readout using in pixel HDR A 120dB dynamic range image sensor with single readout using in pixel HDR CMOS Image Sensors for High Performance Applications Workshop November 19, 2015 J. Caranana, P. Monsinjon, J. Michelot, C. Bouvier,

More information

Demosaicing Algorithms

Demosaicing Algorithms Demosaicing Algorithms Rami Cohen August 30, 2010 Contents 1 Demosaicing 2 1.1 Algorithms............................. 2 1.2 Post Processing.......................... 6 1.3 Performance............................

More information

Edge Potency Filter Based Color Filter Array Interruption

Edge Potency Filter Based Color Filter Array Interruption Edge Potency Filter Based Color Filter Array Interruption GURRALA MAHESHWAR Dept. of ECE B. SOWJANYA Dept. of ECE KETHAVATH NARENDER Associate Professor, Dept. of ECE PRAKASH J. PATIL Head of Dept.ECE

More information

A Hybrid Technique for Image Compression

A Hybrid Technique for Image Compression Australian Journal of Basic and Applied Sciences, 5(7): 32-44, 2011 ISSN 1991-8178 A Hybrid Technique for Image Compression Hazem (Moh'd Said) Abdel Majid Hatamleh Computer DepartmentUniversity of Al-Balqa

More information

DodgeCmd Image Dodging Algorithm A Technical White Paper

DodgeCmd Image Dodging Algorithm A Technical White Paper DodgeCmd Image Dodging Algorithm A Technical White Paper July 2008 Intergraph ZI Imaging 170 Graphics Drive Madison, AL 35758 USA www.intergraph.com Table of Contents ABSTRACT...1 1. INTRODUCTION...2 2.

More information

Audio and Speech Compression Using DCT and DWT Techniques

Audio and Speech Compression Using DCT and DWT Techniques Audio and Speech Compression Using DCT and DWT Techniques M. V. Patil 1, Apoorva Gupta 2, Ankita Varma 3, Shikhar Salil 4 Asst. Professor, Dept.of Elex, Bharati Vidyapeeth Univ.Coll.of Engg, Pune, Maharashtra,

More information

TO reduce cost, most digital cameras use a single image

TO reduce cost, most digital cameras use a single image 134 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, NO. 2, FEBRUARY 2008 A Lossless Compression Scheme for Bayer Color Filter Array Images King-Hong Chung and Yuk-Hee Chan, Member, IEEE Abstract In most

More information

REVIEW OF IMAGE COMPRESSION TECHNIQUES FOR MULTIMEDIA IMAGES

REVIEW OF IMAGE COMPRESSION TECHNIQUES FOR MULTIMEDIA IMAGES REVIEW OF IMAGE COMPRESSION TECHNIQUES FOR MULTIMEDIA IMAGES 1 Tamanna, 2 Neha Bassan 1 Student- Department of Computer science, Lovely Professional University Phagwara 2 Assistant Professor, Department

More information

A Study of Slanted-Edge MTF Stability and Repeatability

A Study of Slanted-Edge MTF Stability and Repeatability A Study of Slanted-Edge MTF Stability and Repeatability Jackson K.M. Roland Imatest LLC, 2995 Wilderness Place Suite 103, Boulder, CO, USA ABSTRACT The slanted-edge method of measuring the spatial frequency

More information

IDENTIFYING DIGITAL CAMERAS USING CFA INTERPOLATION

IDENTIFYING DIGITAL CAMERAS USING CFA INTERPOLATION Chapter 23 IDENTIFYING DIGITAL CAMERAS USING CFA INTERPOLATION Sevinc Bayram, Husrev Sencar and Nasir Memon Abstract In an earlier work [4], we proposed a technique for identifying digital camera models

More information

Layered Motion Compensation for Moving Image Compression. Gary Demos Hollywood Post Alliance Rancho Mirage, California 21 Feb 2008

Layered Motion Compensation for Moving Image Compression. Gary Demos Hollywood Post Alliance Rancho Mirage, California 21 Feb 2008 Layered Motion Compensation for Moving Image Compression Gary Demos Hollywood Post Alliance Rancho Mirage, California 21 Feb 2008 1 Part 1 High-Precision Floating-Point Hybrid-Transform Codec 2 Low Low

More information