Implementation of Different Architectures of Forward 4x4 Integer DCT For H.264/AVC Encoder

Bunji Antoinette Ringnyu (1), Ali Tangel (2), Emre Karabulut (3)

(1) Kocaeli University, Institute of Science and Technology, Kocaeli, Turkey, bunjiantoinetteringnyu@gmail.com
(2) Kocaeli University, Department of Electronics and Communication Engineering, Kocaeli, Turkey, tangel@kocaeli.edu.tr
(3) YONGATEK, Teknopark Istanbul, Turkey, emre.karabulut.dde@gmail.com

Abstract

This paper presents an overview and different implementations of the 4x4 Integer Discrete Cosine Transform (DCT) used in the H.264 standard, and reports the resource utilization and the maximum operating frequency of each implementation. The H.264 standard specifies the use of an integer DCT to transform the results of inter prediction and intra prediction from the spatial domain to the frequency domain. We implemented the 2-D direct multiplication, the 1-D (butterfly) method and the 2-D multiplication with adders. The results show that the 2-D with adders and the 1-D implementations outperform the direct 2-D multiplication. However, the 2-D with adders uses fewer resources than the 1-D butterfly and still achieves a relatively high frequency.

Keywords: H.264/AVC, Integer DCT, Image Compression, VHDL.

1. Introduction

Since the 1990s, image compression has progressed significantly, from the H.261 and JPEG used in the 1980s to the MPEG-1 and MPEG-2 used in the 1990s. With the popularity the internet gained in the 2000s, standards like MPEG-4, H.264 (MPEG-4 Part 10/AVC), MP4 and HEVC sprang up. Of all the standards mentioned above, H.264 is the most widely used codec. Compared to previous standards, H.264 provides higher compression, of about 50%, over a wide range of bit rates and high video resolutions. Although its coding algorithms are based on the same block-based motion compensation and transform-based spatial coding framework as prior video coding standards, H.264 has many innovations compared to the older standards [], such as hybrid predictive/transform coding of intra frames and integer transforms.
As in the existing standards, AVC encoding is accomplished through many blocks, which include Motion Estimation and Motion Compensation, Intra Prediction, Transform and Quantization, Inverse Transform and Quantization, and the Entropy Encoder. Fig. 1 shows a block diagram of the H.264/AVC encoder scheme [].

The Motion Estimation module is used to identify and eliminate temporal redundancies that exist between individual frames. It involves the use of motion vectors that describe the transformation of the video image from one frame to the next. Motion vectors may be applied to the whole image, in which case we have global motion estimation, or to parts of the image, in which case it becomes local motion estimation, or even per pixel. Motion Compensation (MC) will decode the image that is encoded by Motion Estimation [], [3].

The inputs to the inter prediction and intra prediction blocks are macroblocks. These blocks are encoded in either inter or intra mode. In inter mode, the prediction is formed by motion-compensated prediction from one or two reference picture(s) selected from the set of list 0 and/or list 1 reference pictures [4]. In instances where motion estimation cannot be exploited, intra mode is used to eliminate spatial redundancies by attempting to predict the current block by extrapolating the neighboring pixels from adjacent blocks in a defined set of directions.

Fig. 1. Diagram of AVC encoding scheme [3].

To transform the results of inter prediction and intra prediction from the spatial domain to the frequency domain, an integer DCT is used. This is usually achieved through the use of the 4x4 DCT. In the case of a macroblock coded in 16x16 intra prediction mode, its luma pixels are first transformed using the 4x4 DCT and, as a second step, the gathered 4x4 block of DC coefficients is transformed again using the 4x4 Hadamard transform [5]. The coefficients from the Transform block are quantized to remove unimportant information, keeping only the significant coefficients to represent the residual frame. At the level of the transformation block, the overall precision of the integer coefficients is reduced, leading to the elimination of high-frequency coefficients.

There are several papers [6, 7, 9, 10] discussing hardware implementation of Transform and Quantization. In this paper we focus on the Transform block. The core transform matrix is a 4x4 matrix, which can be implemented using 2-D matrix multiplication or the popular butterfly algorithm, which is 1-D and uses additions, subtractions and shift operations along the rows and then along the columns. In addition to these two methods, we implement the 4x4 DCT using 2-D multiplications with addition operations only, and the final results from the three architectures are compared to see which of them uses fewer resources. These architectures are implemented using VHDL.

The rest of the paper is organized as follows. Section 2 presents an overview of the H.264 transform algorithm. Section 3 presents the implementation of the three different architectures for the 4x4 Integer DCT block. The synthesis and results are presented in Section 4. The paper is concluded in Section 5.

2. Overview of H.264 Transform

In the AVC standard, the residual frame of the prediction, which is the difference between the original frame and the predicted frame, is partitioned into fixed-size macroblocks. A macroblock is composed of 16x16 luminance (Y) samples, 8x8 chroma blue (Cb) samples, and 8x8 chroma red (Cr) samples in the case of the 4:2:0 chroma subsampling format. A block diagram of these three sample blocks is shown in Fig. 2.
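As a rough illustration of the macroblock layout described above, the following Python sketch splits the three sample arrays of one 4:2:0 macroblock into 4x4 sub-blocks. The function name `split_into_4x4` and the zero-filled sample arrays are assumptions made for illustration and do not come from the paper; the sub-blocks are produced in plain raster order, which is not necessarily the processing order used by the encoder.

```python
def split_into_4x4(samples):
    """Split a square 2-D list (side a multiple of 4) into 4x4 sub-blocks.

    Sub-blocks are returned in raster order (left to right, top to bottom).
    """
    n = len(samples)
    blocks = []
    for r in range(0, n, 4):
        for c in range(0, n, 4):
            blocks.append([row[c:c + 4] for row in samples[r:r + 4]])
    return blocks


# Hypothetical 4:2:0 macroblock: 16x16 luma plus two 8x8 chroma arrays.
luma = [[0] * 16 for _ in range(16)]
cb = [[0] * 8 for _ in range(8)]
cr = [[0] * 8 for _ in range(8)]

counts = {name: len(split_into_4x4(plane))
          for name, plane in (("Y", luma), ("Cb", cb), ("Cr", cr))}
print(counts)  # number of 4x4 sub-blocks per component
```

Under these assumptions one macroblock yields sixteen 4x4 luma sub-blocks and four 4x4 sub-blocks for each chroma component, which is the granularity at which the 4x4 transform operates.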
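Fig. 2. Processing order of blocks in a macroblock [3].

At the sub-macroblock level, macroblocks are subdivided into sub-blocks of 4x4 samples for encoding. Due to the 3 different samples, there are three different transforms used in H.264/AVC. According to the block diagram of the H.264 transform component illustrated in Fig. 3, the residue is transformed using the integer DCT. In the 16x16 intra prediction mode, the DC coefficients of all transformed residual blocks are grouped into a 4x4 array before being sent to the Hadamard transform. Details of these processes are described with mathematical models in the following subsections.

Fig. 3. Block diagram of H.264 Transform Component.

2.1 4x4 Forward Transforms

The DCT has been used both in previous standards (like the 8x8 DCT) and in existing standards of image compression (like the Integer DCT used in H.264 and H.265). H.264/AVC is based on a 4x4 Integer DCT that can be computed exactly with integer arithmetic in order to avoid inverse transform mismatch problems. There are two types of 4x4 integer transforms for the residual coding. The first one is for luminance residual blocks and is described by (1) []:

$$Y = C X C^{T} \qquad (1)$$

where X is the 4x4 residual input of the Transform block and C is specified by

$$C = \begin{bmatrix} a & a & a & a \\ b & c & -c & -b \\ a & -a & -a & a \\ c & -b & b & -c \end{bmatrix}, \qquad a = \tfrac{1}{2}, \quad b = \sqrt{\tfrac{1}{2}}\cos\left(\tfrac{\pi}{8}\right), \quad c = \sqrt{\tfrac{1}{2}}\cos\left(\tfrac{3\pi}{8}\right)$$

This can be factorized as

$$Y = (C X C^{T}) \otimes E \qquad (2)$$

where, with d = c/b, the matrices C and E in (2) are

$$C = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & d & -d & -1 \\ 1 & -1 & -1 & 1 \\ d & -1 & 1 & -d \end{bmatrix}, \qquad E = \begin{bmatrix} a^{2} & ab & a^{2} & ab \\ ab & b^{2} & ab & b^{2} \\ a^{2} & ab & a^{2} & ab \\ ab & b^{2} & ab & b^{2} \end{bmatrix}$$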
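The equivalence of forms (1) and (2) can be checked numerically. The sketch below is a minimal pure-Python check, not part of the paper: it assumes d = c/b, builds E as the outer product of the row scale factors (a, b, a, b), and uses an arbitrary 4x4 test block `X`. The helper names `matmul` and `transpose` are hypothetical.

```python
import math


def matmul(p, q):
    """Plain matrix product of two 4x4 lists of lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


a = 0.5
b = math.sqrt(0.5) * math.cos(math.pi / 8)
c = math.sqrt(0.5) * math.cos(3 * math.pi / 8)
d = c / b  # assumed definition of d, roughly 0.414

# Matrix of equation (1).
C_full = [[a, a, a, a], [b, c, -c, -b], [a, -a, -a, a], [c, -b, b, -c]]
# Core matrix of equation (2), with the scale factors pulled out.
C_core = [[1, 1, 1, 1], [1, d, -d, -1], [1, -1, -1, 1], [d, -1, 1, -d]]
scale = [a, b, a, b]
E = [[scale[i] * scale[j] for j in range(4)] for i in range(4)]

X = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]  # arbitrary test block

Y_direct = matmul(matmul(C_full, X), transpose(C_full))
W = matmul(matmul(C_core, X), transpose(C_core))
Y_factored = [[W[i][j] * E[i][j] for j in range(4)] for i in range(4)]  # element-wise scaling

max_diff = max(abs(Y_direct[i][j] - Y_factored[i][j])
               for i in range(4) for j in range(4))
print(max_diff < 1e-9)
```

The element-wise product is the only place E appears, which is why the scaling can later be moved out of the transform stage.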

In equation (2), E is a matrix of scaling factors, and the symbol ⊗ means that each component of C X C^T is multiplied by the corresponding coefficient in E. To simplify the hardware implementation of the transform, the constant d is approximated by 0.5 and the constant b by √(2/5). The final forward transform expression becomes [3], []:

$$Y = (C_f X C_f^{T}) \otimes E_f \qquad (3)$$

$$C_f = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}, \qquad E_f = \begin{bmatrix} a^{2} & ab/2 & a^{2} & ab/2 \\ ab/2 & b^{2}/4 & ab/2 & b^{2}/4 \\ a^{2} & ab/2 & a^{2} & ab/2 \\ ab/2 & b^{2}/4 & ab/2 & b^{2}/4 \end{bmatrix}$$

The scaling matrix E_f can therefore be incorporated into the quantization process, and C_f X C_f^T becomes the core of the 2-D integer forward transform. This can be computed by using two 1-D transforms. The first 1-D transform is applied to the rows of the incoming residue. The second 1-D transform is then applied to the columns. This is what is popularly known as the butterfly algorithm. Since the 4x4 transform matrix has only 1, -1, 2 and -2 as coefficients, its butterfly is as shown below:

Fig. Butterfly diagram of the 4x4 integer DCT.

2.2 4x4 Hadamard Transform

The other kind of transform is the Hadamard Transform (HT). It is applied to the luminance DC terms in 16x16 intra prediction mode. The Hadamard transform is defined by equation (4):

$$Y = H_f X H_f^{T} \qquad (4)$$

$$H_f = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}$$

The Hadamard transform matrix is very similar to the forward transform matrix, the only difference being that the 2 in the forward transform is replaced with 1 in the Hadamard transform. The butterfly of the Hadamard transform is as shown below:

Fig. Butterfly diagram of the 4x4 integer DCT.

3. Implementation of 4x4 DCT Transform

There are several papers discussing the VLSI implementation of the 2-D integer transform for H.264. The implementation of a fast 2-D transform can be classified into two categories: the row/column decomposition (1-D) approach and the direct 2-D approach. Though the direct 2-D approach requires more resources, it is implemented here for comparison with the other architectures. The 2-D transform is also implemented with full adders only. Finally, the butterfly is implemented.

3.1 Direct 2-D Multiplication

This is implemented by using normal matrix multiplication with a Finite State Machine (FSM). The FSM consists of the state INITIALIZATION, then MULT1, which performs the first matrix multiplication of equation (), which is C*X, and finally the state MULT2, which performs the second multiplication. All the other architectures are implemented using similar states.

3.2 2-D Multiplication with Adders

Unlike performing multiplication directly, this architecture is implemented by replacing multiplications (*) with full adders. This is accomplished with the help of the concatenation operator in VHDL.

3.3 Implementation of the 2-D Transform Using the 1-D Approach

This is accomplished by first performing the butterfly algorithm on the rows. The transpose of the resulting output is taken and the second butterfly algorithm is applied, so that this second butterfly is performed on the columns. A final transpose is taken to obtain the required output. The block diagram of this stage is as shown below:

Fig. Block of the Integer DCT using the 1-D approach.
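The three architectures of Section 3 can be mirrored in a small software model to confirm that they compute the same core transform C_f X C_f^T. The Python sketch below is only an illustration under stated assumptions: the function names are hypothetical, the test block `X` is arbitrary, and the add-only variant uses `value + value` for the factor 2, which may differ from how the paper's VHDL realizes it with the concatenation operator.

```python
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def core_direct(x):
    """Direct 2-D form: two full matrix multiplications, CF * X * CF^T."""
    return matmul(matmul(CF, x), transpose(CF))


def times_coeff(value, coeff):
    """Scale by a coefficient from {1, -1, 2, -2} using only add and negate."""
    scaled = value + value if abs(coeff) == 2 else value
    return -scaled if coeff < 0 else scaled


def matmul_adders(coeffs, q):
    """Product coeffs * q where coeffs holds only +-1 and +-2 (no multiplier)."""
    return [[sum(times_coeff(q[k][j], coeffs[i][k]) for k in range(4))
             for j in range(4)] for i in range(4)]


def core_adders(x):
    cx = matmul_adders(CF, x)  # CF * X
    # (CF * X) * CF^T equals the transpose of CF * (CF * X)^T.
    return transpose(matmul_adders(CF, transpose(cx)))


def butterfly_1d(v):
    """4-point butterfly: one add/sub stage, then a stage with shifts for x2."""
    s0, s1 = v[0] + v[3], v[1] + v[2]
    d0, d1 = v[0] - v[3], v[1] - v[2]
    return [s0 + s1, (d0 << 1) + d1, s0 - s1, d0 - (d1 << 1)]


def core_butterfly(x):
    rows_done = [butterfly_1d(row) for row in x]                    # rows
    cols_done = [butterfly_1d(row) for row in transpose(rows_done)]  # columns
    return transpose(cols_done)                                      # final transpose


X = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]  # arbitrary residual block
print(core_direct(X) == core_adders(X) == core_butterfly(X))
```

The butterfly variant follows the row pass, transpose, column pass, transpose sequence described in Section 3.3, and each 1-D pass needs eight additions or subtractions and two shifts.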

4. Results and Discussions

The implementation is done both with VHDL and Matlab, as explained below. Firstly, the Residue values are generated in Matlab and written to a dumper (a text file). The Residue values are then read from the dumper into Matlab and VHDL, and the algorithms are executed. The results from the Transform block are again dumped to two separate text files and, lastly, compared to be sure that the results are the same. The whole process, from generating the Residue values to comparing the outputs, is depicted in Fig. 7. This process was repeated for different sets of data, to confirm that the algorithms were working correctly and that no bits were lost, especially in VHDL.

Fig. Block diagram of the synthesis and simulation processes.

Finally, synthesis was done to view the Utilization Analysis and the Maximum Frequency supported by each architecture. The results are presented in Table 1. The 2-D direct multiplication does the operation with multipliers, using 0 Slice LUTs and 45 Slice Registers. The architecture also has a maximum operating frequency of 5 MHz. This confirms the reasons why multipliers are discouraged in the implementation of the Integer DCT: they use more resources and operate at low frequencies. The 1-D, just as expected, uses fewer resources (66 Slice LUTs and 730 Slice Registers) and has a maximum frequency of 00 MHz. The 2-D with adders, on the other hand, achieves the same maximum frequency as the 1-D method (00 MHz), using the least resources (794 Slice LUTs and 4 Slice Registers).

Table 1. Synthesis Report of the Three Different Architectures

Architecture                      Slice LUTs   Slice Registers   Maximum Operating Frequency
Direct 2-D Multiplication         0            45                5 MHz
2-D Multiplication With Adders    794          4                 00 MHz
The 1-D Approach                  66           730               00 MHz

The use of the 2-D integer DCT is generally not advisable because of the hardware resources it consumes. But it can be seen from the table above that a 2-D integer DCT implemented with full adders can still be used in H.264/AVC encoders.

5. Conclusion

This paper has given the Maximum Frequency and Resource usage improvements for the 2-D Integer DCT in the H.264 encoder. The 2-D with adders and the 1-D architectures achieved a higher maximum frequency (about 33.3 %) than the 2-D with multipliers. Even though the two architectures have relatively higher frequencies, the 2-D with adders uses fewer hardware resources. With this maximum frequency and low hardware utilization, the 2-D architecture with adders can also be used in systems where the 1-D architecture is used, especially in real-time applications such as mobile communication and video broadcasting. Even though the 2-D implementation is not always used, the one implemented here can still be used in low-frequency systems such as video compression for storage devices like DVDs, and digital signal processing.

6. Acknowledgement

We would like to thank Mr. Muhammed Aslam for his constant support during this project, and the entire staff of YONGATEK, Teknopark Istanbul, for providing a suitable environment for this research.

7. References

[1] Meihua Gu et al., Hardware Prototyping for Various Transforms in H.264 High Profile, Journal of Information & Computational Science, 0, pp. 9 8.
[2] Thomas Wiegand and Gary J. Sullivan, The H.264/MPEG4 Advanced Video Coding Standard and its Applications, IEEE Signal Processing Magazine, August 2006, pp. 34-43.
[3] H. S. Malvar, A. Hallapuro, M. Karczewicz and L. Kerofsky, Low Complexity Transform and Quantisation in H.264/AVC, IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, July 2003, pp. 560-576.
[4] I. E. G. Richardson, H.264 and MPEG-4 Video Compression, John Wiley and Sons, West Sussex, UK, 2003.
[5] Meihua Gu et al., Hardware Prototyping for Various Transforms in H.264 High Profile, Journal of Information & Computational Science, 0, pp. 9 8.
[6] H. S. Malvar, A. Hallapuro, M. Karczewicz and L. Kerofsky, Low Complexity Transform and Quantisation in H.264/AVC, IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, July 2003, pp. 560-576.
[7] Meihua Gu et al., Hardware Prototyping for Various Transforms in H.264 High Profile, Journal of Information & Computational Science, 0, pp. 9 8.
[8] H. Kalva and J. B. Lee, The VC-1 and H.264 Video Compression Standards for Broadband Video Services, Springer, New York, USA, 2008.
[9] Charles S. Lubobya, Mqele M. Dlodlo, Gerhard De Jager and Keith L. Ferguson, Optimization of 4x4 Integer DCT in H.264/AVC Encoder, Council for Scientific and Industrial Research.
[10] Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification, ITU-T Rec. H.264 and ISO/IEC 14496-10 AVC, 2003.
[11] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next Generation Multimedia, New York: Wiley, 2003.
[12] Xuan-Tu Tran and Van-Huan Tran, An Efficient Architecture of Forward Transforms and Quantization for H.264/AVC Codecs, Journal on Electronics and Communications, Vol., No., April-June, 0, pp. - 9.