Towards Real-time Hardware Gamma Correction for Dynamic Contrast Enhancement

Towards Real-time Gamma Correction for Dynamic Contrast Enhancement

Jesse Scott, Ph.D. Candidate
Integrated Design Services, College of Engineering, Pennsylvania State University, University Park, PA
jus2@engr.psu.edu

Michael Pusateri, Ph.D., Director
Integrated Design Services, College of Engineering, Pennsylvania State University, University Park, PA
mpusateri@engr.psu.edu

Abstract

Making the transition between digital video imagery acquired by a focal plane array and imagery useful to a human operator is not a simple process. The focal plane array sees the world in a fundamentally different way than the human eye, and gamma correction has historically been used to help bridge the gap. Gamma correction is a non-linear mapping of intensity from input to output in which the parameter gamma can be adjusted to improve the visual appeal of the imagery. In analog video systems, gamma correction is performed with analog circuitry and is adjusted manually. With a digital video stream, gamma correction can be provided using mathematical operations in a digital circuit, and in addition to manual control, gamma can be adjusted automatically to compensate for changes in the scene. We are interested in applying automatic gamma correction in systems such as night vision goggles, where both low latency and power efficiency are important design parameters. We present our results in developing an automatic gamma correction algorithm that meets these requirements. The algorithm consists of two parts: determination of the desired value of gamma and application of the correction. The gamma update is calculated from statistical metrics of the imagery's intensity. HDL code implementing the measurement of these statistical metrics has been developed and tested in hardware.
Both the computation of a gamma update and the application of the gamma correction were simplified to basic arithmetic operations and two specialized functions: the logarithm, and exponentiation of a constant base by a variable exponent. We present approximation methods that reduce both specialized functions to basic arithmetic operations, and the hardware implementations of these approximations allow the above requirements to be met. We evaluate the accuracy of the approximations against full-resolution double-precision floating-point operations, and we present final results for visual judging of the impact of the approximations.

I. BACKGROUND

A. Problem Statement

In addition to its other uses, gamma correction is an effective tool for manipulating the histogram of an image that is either over- or underexposed but not fully compromised by saturation. While it is available as a tool in most image processing software, the functions used to implement gamma correction have relatively complex hardware implementations. We present an implementation that approximates gamma correction with a satisfactory level of visual quality but with a tractable hardware implementation. Our design is meant to provide real-time, pixel-serial gamma correction, using commercially available FPGA hardware, for imagery generated by a focal plane array with an active area of 6x2 pixels and a pixel clock rate in excess of 5 MHz.

B. Gamma Correction

Gamma correction is an intensity transformation that takes the form of a generalized power law with equivalent range and domain. Letting $x \in [0, 1]$ represent the intensity domain and $y$ represent the intensity range, a gamma correction transform is described by:

$$y = x^{\gamma} \tag{1}$$

where $\gamma > 0$. For $\gamma < 1$, gamma compression occurs, moving the intensity histogram to the right. For $\gamma > 1$, gamma expansion occurs, moving the histogram to the left.
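A few lines of Python make the histogram shift described by Eq. (1) concrete. This is a minimal sketch assuming intensities already normalized to $[0, 1]$. The helper name `apply_gamma` and the synthetic, normally distributed test pixels are hypothetical stand-ins, not part of the original design. For $\gamma < 1$ every intensity in $(0, 1)$ increases, so the mean (and the histogram) moves right. For $\gamma > 1$ it moves left.

```python
import random
import statistics


def apply_gamma(x: float, gamma: float) -> float:
    """Power-law transform of Eq. (1): y = x**gamma, with x in [0, 1]."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("intensity must be normalized to [0, 1]")
    if gamma <= 0.0:
        raise ValueError("gamma must be positive")
    return x ** gamma


# Hypothetical frame: normally distributed intensities clipped to [0, 1],
# loosely mimicking the normal distribution used to illustrate Eq. (1).
rng = random.Random(0)
pixels = [min(max(rng.gauss(0.5, 0.1), 0.0), 1.0) for _ in range(10_000)]

for gamma in (0.5, 1.0, 2.0):
    mean_out = statistics.mean(apply_gamma(x, gamma) for x in pixels)
    # gamma < 1 (compression) shifts the histogram right; gamma > 1 shifts it left
    print(f"gamma = {gamma}: mean intensity = {mean_out:.3f}")
```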
Figure 1 shows a normal probability distribution together with both a gamma-compressed and a gamma-expanded version of the distribution.

[Figure 1: Gamma correction applied to probability distributions. Gamma corrections of a normal distribution, plotted as intensity probability versus normalized pixel intensity.]

Figure 2 shows the visual impact of changing gamma by presenting the same image with three different values of
gamma [3]. We see a general progression toward overall lightening of the image as gamma increases. In this sequence, the most pronounced change can be observed between the image with $\gamma = 1$ and the image with $\gamma = 2$. The whole underwater ridge is ill-defined and shadowy with $\gamma = 1$; with $\gamma = 2$, however, most of its features become sharply defined. A small prominence at the left end of the ridge is not visible with $\gamma = 1$, while its left edge becomes well defined with $\gamma = 2$. Both the left and right edges of the prominence are well defined with $\gamma = 4$, but at the expense of the bulk of the image looking overexposed.

[Figure 2: Effects of gamma correction on grayscale imagery]

II. DESIGN

A. Modifications for Implementation

For real-time applications with actual hardware, it is useful to consider a slightly modified version of the gamma correction transformation. Letting $x \in [0, x_{\max}]$ represent the intensity domain and $y \in [0, y_{\max}]$ represent the intensity range, an equivalent form of the gamma correction transform is described by:

$$y = y_{\max}\left(\frac{x}{x_{\max}}\right)^{\gamma} \tag{2}$$

The introduction of the terms $x_{\max}$ and $y_{\max}$ also allows us to use the transformation to map the gray-scale domain of the focal plane array onto the range of the display.

In order to automate gamma correction, we need to define a metric, derived from the imagery, that allows gamma to be determined automatically for each image. We have chosen to use imagery statistics collected on each frame, although they are not the only possible choice. Using imagery statistics forces a real-time compromise: either we introduce a frame of latency so that a frame's own statistics can be applied to it, or we use statistics from the previous frame to process the current frame. Our implementation uses the latter approach; this choice does not affect the mathematical development of the module.

Our imagery metric is used to determine a set point for the domain, denoted $x_0$, that is mapped to a predetermined set point for the range, $y_0$. Setting $x = x_0$ and $y = y_0$ in (2) and solving for gamma, the set point pair $(x_0, y_0)$ yields an appropriate value of gamma for the frame:

$$\gamma = \frac{\log_2(y_0 / y_{\max})}{\log_2(x_0 / x_{\max})} = \frac{\log_2(y_0) - \log_2(y_{\max})}{\log_2(x_0) - \log_2(x_{\max})} \tag{3}$$

The computation of gamma needs to occur only once per frame.

In examining how to actually implement the gamma correction transform given in (2), it became clear that a function computing a variable base raised to a variable exponent was not feasible within our hardware limitations. To overcome this problem, we found it useful to rewrite (2) in the form:

$$y = y_{\max}\, 2^{\gamma\left(\log_2(x) - \log_2(x_{\max})\right)} \tag{4}$$

Given this representation, the implementation problem is reduced to computing a base-two exponential and a base-two logarithm, both with variable argument. While other bases would provide identical results, we selected base two due to hardware considerations: scaling by an integer power of two is a simple bit shift. Both of these calculations have tractable hardware implementations.

B. Computation of $\log_2(s)$

For our problem, the general computation of $\log_2(s)$ is done over a limited, but potentially large, range:

$$s \in [1, 2^{\bar{p}}) \tag{5}$$

where $\bar{p}$ is an integer. We handle $s = 0$ as a special case. Imaging systems typically require $\bar{p} \le 16$; that is, we need to find the logarithm over 10 to 16 octaves. We can express the logarithm argument as:

$$s = q\, 2^{p} \tag{6}$$

where $q$ is a positive real number and where we can find:

$$p = \lfloor \log_2(s) \rfloor \tag{7}$$

and

$$q = \frac{s}{2^{p}}, \quad q \in [1, 2) \tag{8}$$

In hardware, finding $p$ is easily accomplished by locating the most significant nonzero bit of $s$, and finding $q$ is accomplished by simply right-shifting $s$ by $p$ bits. We can now find:

$$\log_2(s) = p + \log_2(q) \tag{9}$$

where the $\log_2(q)$ term can be approximated on its small range using a function with an acceptable hardware implementation.

C. Approximation of $\log_2(q)$

There are many well-known ways to approximate a logarithm over a closed interval. After evaluating several methods, we chose direct fitting of a fixed-order polynomial to the function over the desired interval. This provided a significant improvement in accuracy over a Taylor series approximation of comparable order. It was also as accurate as a Padé approximation of comparable order [1], but it does not require the high-precision division by an arbitrary argument that the Padé approximation needs; we avoided such division because of its significant hardware requirements. Based on a propagation-of-error analysis, we chose to implement the approximation with a precision of eight places to the right of the binary point. With this requirement, we found that a directly fitted third-order polynomial with 16-bit signed coefficients was sufficient. Using MATLAB to generate an initial direct fit of the logarithm with floating-point coefficients, we then truncated to scaled integer coefficients, using a simple iterative search to find the minimum error [2]. The approximation is given as:

$$\log_2(q) \approx \frac{\left((1260\,q - 8435)\,q + 24666\right) q - 17480}{2^{13}} \tag{10}$$

where the equation is staged for three multiply-accumulate (MAC) operations and the coefficients are shown as their decimal values. The absolute error of the approximation is shown in Figure 3; it meets our precision requirements with an acceptable margin. Because the $\log_2(q)$ approximation is additive in (9), the approximation error in $\log_2(s)$ repeats every octave of $s$, that is, it is the same for every value of $p$.

[Figure 3: Error of the polynomial approximation of $\log_2(q)$, plotted over the input range $q \in [1, 2]$; the vertical axis is scaled by $10^{-3}$.]

D. Computation of $2^s$

For our problem, the general computation of $2^s$ is done over a limited but potentially large range:

$$s \in [\bar{p}, 0) \tag{11}$$

where $\bar{p}$ is a negative integer. Our goggle systems typically produce $-16 \le \bar{p}$. We can express the exponent as:

$$s = p + q \tag{12}$$

where $q$ is a nonnegative real number and where we can find:

$$p = \lfloor s \rfloor \tag{13}$$

and

$$q = s - p, \quad q \in [0, 1) \tag{14}$$

In hardware, $p$ is simply the integer part of $s$: for a two's-complement fixed-point value, keeping the bits to the left of the binary point yields $\lfloor s \rfloor$. Likewise, finding $q$ is accomplished by subtraction, which amounts to keeping the bits to the right of the binary point. We can now find:

$$2^{s} = 2^{p}\, 2^{q} \tag{15}$$

The operation of finding $2^q$ on its small range is performed using an approximation function with an acceptable hardware implementation. The multiplication of the result by $2^p$ is then handled by simple bit shifting.

E. Approximation of $2^q$

After evaluating several of the well-known methods of approximating exponentiation with a constant base and a variable exponent, we again chose direct fitting of a fixed-order polynomial to the function over the desired interval. Again, the accuracy improvement over a Taylor series approximation of comparable order was significant. Based on a propagation-of-error analysis, we chose to implement the approximation with a precision of eleven places to the right of the binary point. With this requirement, we found that a directly fitted third-order polynomial with 16-bit unsigned coefficients was sufficient. Using MATLAB to generate an initial direct fit of the exponential with floating-point coefficients, we then truncated to scaled integer coefficients, using a simple iterative search to find the minimum error [2]. The approximation is given as:

$$2^{q} \approx \frac{\left((5179\,q + 14689)\,q + 45668\right) q + 65524}{2^{16}} \tag{16}$$

where the equation is staged for three MAC operations and the coefficients are shown as their decimal values. The absolute error of the approximation is shown in Figure 4; it meets our precision requirements with some margin. Because the $2^q$ approximation is multiplicative in (15), the approximation error in $2^s$ changes with the magnitude of $p$, as presented in Figure 5.

[Figure 4: Error of the polynomial approximation of $2^q$, plotted over the input range $q \in [0, 1]$; the vertical axis is scaled by $10^{-4}$.]

[Figure 5: Error of the polynomial approximation of $2^s$, plotted over the input range of $s$.]

III. RESULTS

We tested a MATLAB implementation of our approximate gamma correction transform against a full floating-point implementation using a thermal video sequence. The video was captured using a Video IR long-wave infrared camera, sensitive between 8 and 14 µm. The camera uses an active array area of 640x480 and an intensity depth of 14 bits. Figure 6 shows a typical source frame (top left), a linear correction (top right), the exact floating-point gamma correction (bottom left), and the approximate gamma correction (bottom right). The gamma correction shown here represents an extreme case for the approximation error; it was not chosen for visual improvement.

[Figure 6: Frame from video sequence; results frame from the quad video sequence]

Comparison of the approximation with the floating-point result over the duration of the video shows that the vast majority of pixel intensities matched within 1 bit of round-off. However, we also found anomalous intensities differing by tens of bits between the exact result and the approximation. We are investigating the source of these errors, as they are not consistent with our error prediction.

IV. ANALYSIS

Implementation Resources and Latency

Because our gamma correction is intended to support an actual system under development, our solution took several requirements into consideration:
1. Execute as a pixel-serial operation
2. Latency on the order of tens of pixel clock cycles
3. Simple implementation for FPGAs, keeping mathematical complexity to MAC operations or simpler
4. Keep power and resource requirements to a minimum
5. Work at pixel clock rates in excess of 65 MHz

We felt that meeting the second and third requirements would be the key to meeting requirements four and five. At this phase of the project, the focus is primarily on the first three requirements. Tables 1 and 2 present the expected major hardware requirements for the implementations of $\log_2(s)$ and $2^s$ and provide their total latency. Tables 3 and 4 present similar information for the overall modules used to compute the per-frame gamma and the per-pixel serial gamma correction; both include appropriate counts for each instantiation of $\log_2(s)$ and $2^s$. Overall, the expected latency of all modules is satisfactory, and the hardware requirements represent a reasonable portion of the overall available resources.
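The arithmetic behind Sections II.A through II.E, which the modules in Tables 1 through 4 instantiate, can be modeled in software to check the error budget before committing to HDL. The Python sketch below is a behavioral model under stated assumptions, not the authors' implementation. It takes the coefficients of Eqs. (10) and (16) to be 1260, -8435, 24666, -17480 (scaled by $2^{13}$) and 5179, 14689, 45668, 65524 (scaled by $2^{16}$). It evaluates the cubics in floating point, so it reproduces the polynomial approximation error but not the rounding of a fixed-point datapath. It also maps a zero input pixel to zero as one possible treatment of the $s = 0$ special case. The helper names (`horner3`, `log2_approx`, `exp2_approx`, `gamma_from_set_point`, `gamma_correct`), the 14-bit input and 8-bit output ranges, and the set point (4000, 128) are hypothetical.

```python
import math

# Assumed coefficient values for Eqs. (10) and (16).
# log2(q) ~ (((1260 q - 8435) q + 24666) q - 17480) / 2**13 for q in [1, 2)
LOG2_COEFFS = (1260, -8435, 24666, -17480)
LOG2_SCALE = 2**13
# 2**q ~ (((5179 q + 14689) q + 45668) q + 65524) / 2**16 for q in [0, 1)
EXP2_COEFFS = (5179, 14689, 45668, 65524)
EXP2_SCALE = 2**16


def horner3(coeffs, q):
    """Evaluate a cubic as three chained multiply-accumulate (MAC) steps."""
    c3, c2, c1, c0 = coeffs
    return ((c3 * q + c2) * q + c1) * q + c0


def log2_approx(s: int) -> float:
    """Approximate log2(s) for an integer s >= 1 via Eqs. (6)-(10)."""
    if s < 1:
        raise ValueError("s = 0 must be handled as a special case")
    p = s.bit_length() - 1          # position of the most significant one bit, Eq. (7)
    q = s / (1 << p)                # mantissa in [1, 2): a right shift in hardware, Eq. (8)
    return p + horner3(LOG2_COEFFS, q) / LOG2_SCALE   # Eq. (9)


def exp2_approx(s: float) -> float:
    """Approximate 2**s (intended for s <= 0) via Eqs. (12)-(16)."""
    p = math.floor(s)               # integer part, Eq. (13)
    q = s - p                       # fractional part in [0, 1), Eq. (14)
    # Scaling by 2**p is a bit shift in hardware, Eq. (15).
    return math.ldexp(horner3(EXP2_COEFFS, q) / EXP2_SCALE, p)


def gamma_from_set_point(x0: int, y0: int, x_max: int, y_max: int) -> float:
    """Per-frame gamma from the set point pair (x0, y0), Eq. (3)."""
    if not 1 <= x0 < x_max:
        raise ValueError("set point must satisfy 1 <= x0 < x_max")
    num = log2_approx(y0) - log2_approx(y_max)
    den = log2_approx(x0) - log2_approx(x_max)
    return num / den


def gamma_correct(x: int, gamma: float, x_max: int, y_max: int) -> int:
    """Per-pixel y = y_max * 2**(gamma * (log2 x - log2 x_max)), Eq. (4)."""
    if x == 0:
        return 0                    # assumption: a zero input maps to black
    s = gamma * (log2_approx(x) - log2_approx(x_max))
    return round(y_max * exp2_approx(s))


if __name__ == "__main__":
    x_max, y_max = 2**14 - 1, 255   # hypothetical 14-bit sensor, 8-bit display
    gamma = gamma_from_set_point(4000, 128, x_max, y_max)  # hypothetical set point
    print(f"gamma = {gamma:.4f}")
    print([gamma_correct(x, gamma, x_max, y_max) for x in (0, 100, 4000, x_max)])
```

Sweeping `log2_approx` over every integer in $[1, 2^{16})$ and `exp2_approx` over a fine grid on $[0, 1)$ is a quick way to check the approximations against the eight- and eleven-place precision targets of Sections II.C and II.E. The Horner form in `horner3` mirrors the three-MAC staging of Eqs. (10) and (16).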

Table 1: Requirements for $\log_2(s)$

Resource                        Count
Shifter
Adder                           2
Multiplier
MAC                             3
Divider
Latency (pixel clock cycles)    2

Table 2: Requirements for $2^s$

Resource                        Count
Shifter
Adder
Multiplier
MAC                             3
Divider
Latency (pixel clock cycles)    2

Table 3: Requirements for gamma computation

Resource                        Count
Shifter                         4
Adder
Multiplier
MAC                             2
Divider
Latency (pixel clock cycles)    3

Table 4: Requirements for gamma correction

Resource                        Count
Shifter                         3
Adder                           5
Multiplier                      3
MAC                             9
Divider
Latency (pixel clock cycles)    5

REFERENCES

[1] M. Vajta, "Some Remarks on Padé-Approximations," in Proc. 3rd TEMPUS-INTCOM Symposium on Intelligent Systems in Control and Measurement, J. Vass and D. Fodor, Eds., pp. 53-58, Sept. 2000.
[2] F. B. Hildebrand, Introduction to Numerical Analysis, 2nd ed. Dover Publications, June 1987.
[3] "Gamma Correction," Wikipedia, The Free Encyclopedia. Wikimedia Foundation, Inc. 22 July 24. Web. Aug. 29.