High Performance Integer DCT Architectures For HEVC


V. Sruthi, V. Rekha, Subitha, Sugitha, S. Jeya Anusuya M.E., V. Satheesh Kumar M.E.
Dept. of Electronics and Communication Engineering, T.J.S Engineering College, Peruvoyal

Abstract- The proposed system presents a VLSI architecture for the integer Discrete Cosine Transform (integer DCT), which is used in real-time High Efficiency Video Coding (HEVC) applications. It has an N-point 1D Integer DCT architecture, which includes a signed configurable carry save adder tree based multiplier unit. So, the depth of the architecture falls within the bounds of O(log2 N). The proposed 1D architecture is used to perform one N-point Integer DCT or multiple smaller Integer DCTs in parallel. The proposed 1D architecture is also used to design 2D folded and parallel designs. The performance results show that the proposed architecture gives better performance compared with existing architectures using CMOS TSMC libraries. The proposed 2D parallel Integer DCT achieves an improvement in worst path delay compared with the odd-even decomposition based architecture.

Keywords- Integer DCT, HEVC, 1D DCT Architecture, 2D DCT Architecture

I. INTRODUCTION

Digital signal processors (DSPs) are very important for the real-time processing of real-world digitized data. They perform the high-speed numeric calculations used in many applications, from basic consumer electronics to sophisticated industrial instrumentation. A discrete transform changes the representation of a signal from one domain to another in order to reduce the complexity of a particular digital signal processing application. The discrete cosine transform (DCT) is a very powerful transformation used in image compression. The circuit complexity of the DCT is greater than that of the integer DCT because the DCT is floating point and the integer DCT is fixed point.

Multimedia communication typically involves the transfer of a large amount of data. Therefore, compression of video, audio, and image data is essential for a cost-efficient use of existing communication channels and storage media. The DCT helps separate the image into parts of differing importance with respect to the image's visual quality. The DCT chip presented here will form a part of one such image compression system. The system is based on a 2-D block cosine transform coding scheme, where each block of the image is of size 8x8. There are two main computational tasks involved. The first consists of computing the 2-D DCT on blocks of size 8x8, while the second consists of quantizing the transform coefficients using scalar quantizers. We present an implementation of a chip that computes the DCT of an 8x8 element block.

The DCT has many applications, such as filtering, teleconferencing, high-definition television (HDTV), speech coding, image coding, data compression, and more. All of these use the DCT algorithm for compression and/or filtering purposes. The DCT has energy packing capabilities and also approaches the statistically optimal transform in decorrelating a signal. It was first implemented with discrete components at the board level. This was followed by its implementation using general purpose DSP chips. Image compression boards and multiprocessor workstations based on the DCT have also been developed by industry.

In our project, the algorithm is used for image compression. With a high speed and low power design, it is well suited to handheld devices. Such a device draws power from its battery, and a battery carries limited power, so low power consumption is important. Therefore, the chip must be composed of low power components. Otherwise, the device will be forced offline due to insufficient power supply. Furthermore, a high-speed algorithm is necessary to meet the demands of current software and operating systems. The performance of the chip is optimized and specified for image compression purposes.

II. LITERATURE SURVEY

High Performance Multiplierless DCT Architecture for HEVC, Wenjun Zhao, Takao Onoye, and Tian Song

There are numerous video compression formats for storage or transmission of digital video content. High Efficiency Video Coding (HEVC) is a video compression standard, a successor to H.264/MPEG-4 Advanced Video Coding (AVC). That paper proposes an efficient architecture for the computation of the 4, 8, 16, and 32 point DCT used in the HEVC standard. The architecture uses the Canonical Signed Digit (CSD) representation and the Common Subexpression Elimination (CSE) technique to perform the multiplication with shift-add operations.

A Reconfigurable Multi-Transform VLSI Architecture Supporting Video Codec Design, Kanwen Wang, Jialin Chen, Wei Cao, Ying Wang, Lingli Wang

The proposed system targets the real-time processing of 1080P HD video and can support both forward and inverse transforms of MPEG using a multi-transform VLSI architecture. A multiple constant multiplication algorithm with fusing strategies is provided to generate the constant multipliers in the matrix calculation blocks.

Multi-mode Parallel and Folded VLSI Architectures for 1D-Fast Fourier Transform, Mohamed Asan Basiri M and Noor Mahammad Sk

This paper proposes efficient FFT VLSI architectures using folded/parallel implementation. The folded FFT architecture requires fewer cycles to complete the operation than single/multi-path delay commutator architectures. An N-point FFT is implemented by using one N/2-point FFT without much extra hardware. Both of the proposed architectures are implemented for more than one radix.

III. EXISTING SYSTEM

In all the existing architectures, the add-shift network based multiplier is used. So, the delay of the multiplier is based on the number of adders used in the add-shift network.

Disadvantages of Existing System:
Multiplication requirement is more.
More delay.
High power.

IV. PROPOSED SYSTEM

The proposed system uses configurable carry save addition. In the proposed architecture, a configurable carry save adder (CSA) tree based multiplier is used, built from a series of multiplexers that select the values for the carry save addition based multiplication. The maximum number of values to be added in the configurable carry save addition based N-point Integer DCT is log2 N = log2 32 = 5.

V. WORKING PRINCIPLE

DCT Algorithm

The mathematical representations of the 2-D forward DCT and the 2-D inverse DCT (IDCT) are given below.

Forward DCT:

$$F(u,v) = \frac{2}{N}\, C(u)\, C(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y) \cos\left[\frac{(2x+1)u\pi}{2N}\right] \cos\left[\frac{(2y+1)v\pi}{2N}\right]$$

Inverse DCT:

$$f(x,y) = \frac{2}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} C(u)\, C(v)\, F(u,v) \cos\left[\frac{(2x+1)u\pi}{2N}\right] \cos\left[\frac{(2y+1)v\pi}{2N}\right]$$

Where:
C(u), C(v) = 1/sqrt(2) for u, v = 0
C(u), C(v) = 1 for u, v = 1 through N-1
N = 4, 8, or 16

In the design, N = 8. F(u,v) is called the (u,v)th transform coefficient. The above formula shows that the 2-D DCT can be computed by applying the 1-D DCT to each of the columns of the matrix separately and then applying the 1-D DCT to each of the rows separately. This is the separability property of the 2-D DCT. All the 2-D DCT processors developed so far have made use of this property. In this report, we present the design of the 2-D DCT function under a VLSI architecture for image processing. The design layout is presented at the cell-block level and does not show the entire chip design in great detail.

2-D DCT Architecture:

The two dimensional (2-D) Discrete Cosine Transform (DCT) forms the cornerstone of many image processing standards such as JPEG and MPEG. Many proposed solutions are based on a row-column decomposition, which allows the 2-D DCT to be implemented by two one dimensional (1-D) DCTs separated by a transposition memory.
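The separability property described above can be checked numerically. The following is a minimal sketch in plain Python, assuming floating-point arithmetic and the normalization of the forward DCT formula given earlier; the function names (`dct_1d`, `dct_2d_separable`, `dct_2d_direct`) are illustrative and do not come from the paper, and the in-memory `transpose` only stands in for the transposition memory.

```python
import math

def dct_1d(samples):
    """Orthonormal 1-D DCT-II of a list of numbers."""
    n = len(samples)
    out = []
    for u in range(n):
        # sqrt(2/N) * C(u), with C(0) = 1/sqrt(2) and C(u) = 1 otherwise
        scale = math.sqrt((1.0 if u == 0 else 2.0) / n)
        acc = sum(s * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                  for x, s in enumerate(samples))
        out.append(scale * acc)
    return out

def transpose(matrix):
    """Stand-in for the transposition memory between the two 1-D passes."""
    return [list(col) for col in zip(*matrix)]

def dct_2d_separable(block):
    """2-D DCT as a 1-D DCT on every column, then a 1-D DCT on every row."""
    column_pass = [dct_1d(col) for col in transpose(block)]  # indexed [y][u]
    return [dct_1d(row) for row in transpose(column_pass)]   # indexed [u][v]

def dct_2d_direct(block):
    """Direct evaluation of the double-sum forward DCT formula."""
    n = len(block)
    c = lambda k: 1.0 / math.sqrt(2.0) if k == 0 else 1.0
    result = []
    for u in range(n):
        row = []
        for v in range(n):
            acc = 0.0
            for x in range(n):
                for y in range(n):
                    acc += (block[x][y]
                            * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                            * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            row.append((2.0 / n) * c(u) * c(v) * acc)
        result.append(row)
    return result
```

The separable form needs 2N one-dimensional transforms of length N instead of an N x N double sum per coefficient, which is why hardware designs favor the row-column decomposition.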

1-D DCT Architecture:

The derivation of the 1-D DCT architecture can be more easily explained by examining the 1-D DCT in matrix form:

$$[Y] = [C] \cdot [X], \qquad C(k) = \cos\left(\frac{k\pi}{16}\right)$$

As multipliers are m times more complex than adders, the aim is to reduce the number of multiplications at the expense of additions. The sparse matrix approach achieves this by manipulating the terms in the input matrix.

The circuit consists of 8 multipliers and 8 adders and 8 subtracters connected in a regular matrix of cells. Bit serial logical adder and subtracter cells have been used and the array multipliers have been implemented. The bit serial multiplier is pipelined every two cells. The serial logical adder and subtracter equations are derived by using a K-map. The equations are:

Sum = A xor B xor Cin
Carry = A.B + A.Cin + B.Cin
Difference = A xor B xor Bin
Borrow = A'.B + A'.Bin + B.Bin

Consider two unsigned binary numbers X and Y that are M and N bits wide respectively. X and Y in a binary representation are:

$$X = \sum_{i=0}^{M-1} x_i 2^i, \qquad Y = \sum_{j=0}^{N-1} y_j 2^j, \qquad x_i, y_j \in \{0, 1\}$$

The multiplication operation is then defined as follows:

$$Z = X \times Y = \sum_{k=0}^{M+N-1} z_k 2^k = \left(\sum_{i=0}^{M-1} x_i 2^i\right)\left(\sum_{j=0}^{N-1} y_j 2^j\right) = \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} x_i y_j 2^{i+j}$$

The multiplicand is consecutively multiplied with every bit of the multiplier, resulting in a number of partial products. These intermediate results are added after the proper shifting has been applied. The array multiplier is built directly from this binary multiplication algorithm. It consists of numerous AND gates and full adders. This type of multiplier requires M-bit (multiplicand) x N-bit (multiplier) AND gates and full adders.

The transpose component is an 8x8 array of shift registers. It receives the output from the first 1-D DCT (row) and transposes the rows into the columns of the second 1-D DCT input. The shift registers store the bits. Metal connections then link each register to the input of the next shifter, so that the bits are shifted out arranged in column format.

VI. FLOWCHART

Fig. Flow chart of 1D DCT Architecture

Fig. Flow chart of 2D-DCT Architecture

VII. PROPOSED DCT ARCHITECTURE

The proposed block architecture is used for the N-point 1D Integer DCT. In the N-point 1D Integer DCT, the coefficient matrix is of size N x N. The input signal sample values are multiplied with the coefficients, which forms the matrix-vector multiplier. In all the existing architectures, the add-shift network based multiplier is used. So, the delay of the multiplier is based on the number of adders used in the add-shift network. In the proposed architecture, a configurable carry save adder (CSA) tree based multiplier is used. Fig. (a) shows the series of multiplexers used for configurable carry save addition based multiplication in the proposed architecture.

Fig. Proposed Architecture

The maximum number of values to be added in the configurable carry save addition based N-point Integer DCT is log2 N = log2 32 = 5. For example, the multiplication of the coefficient 87 with the input signal sample value x is equal to 87x = 2^6 x + 2^4 x + 2^2 x + 2^1 x + 2^0 x. The minimum number of values to be added is 1, which occurs when the coefficient is a single power of two, so that the remaining four inputs are zero. So, the corresponding left-shifted (power of two) input signal values are sent as the input of the series of multiplexers used in Fig. (a), which is named a Cell. The maximum possible number of Cells used to obtain one multiplication result is 5. Therefore, five Cells are used in Fig. (b). So, the maximum possible number of levels of the configurable carry save adder (CSA) tree for these five values is 3. The Sum and Carry from the final carry save adder are added.

Fig. The proposed block architecture (Block) used for the N-point 1D Integer DCT with (a) series of multiplexers used for configurable carry save addition based multiplication (Cell), (b) configurable carry save adder tree based multiplication unit, (c) series of multiplexers used to find the resultant sign bits for the multiplication.

Fig. VLSI architecture for the proposed N-point 1D Integer DCT

The overall architecture of the proposed N-point 1D Integer DCT takes its inputs from N Blocks, as shown in Fig. Therefore, log2 N = 5 levels of signed fixed point adders are used. The critical path depth of the signed adder tree used in the proposed N-point Integer DCT architecture is therefore (log2 N) T(add). Here, T(add) represents the critical path depth of the signed adder.
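The carry save adder tree multiplication described above can be modeled behaviorally. The following is a minimal sketch, assuming arbitrary-precision Python integers in place of fixed-width hardware words; the function names are hypothetical, and the simple grouping of three operands per level is one possible tree arrangement, not necessarily the one used in the paper's figure.

```python
def shifted_terms(coeff, x):
    """One left-shifted copy of x for every set bit of a non-negative coefficient."""
    return [x << bit for bit in range(coeff.bit_length()) if (coeff >> bit) & 1]

def carry_save_add(a, b, c):
    """3:2 compressor: returns (sum, carry) such that a + b + c == sum + carry."""
    partial_sum = a ^ b ^ c
    carry = ((a & b) | (b & c) | (a & c)) << 1
    return partial_sum, carry

def csa_tree_multiply(coeff, x):
    """Multiply x by a constant coefficient using a carry save adder tree.

    Returns (product, levels), where levels is the number of carry save
    adder levels used before the final carry-propagate addition.
    """
    sign = -1 if coeff < 0 else 1          # sign is handled separately from the magnitude
    terms = shifted_terms(abs(coeff), x)
    levels = 0
    while len(terms) > 2:
        reduced = []
        i = 0
        while i + 3 <= len(terms):         # compress each group of three operands into two
            reduced.extend(carry_save_add(terms[i], terms[i + 1], terms[i + 2]))
            i += 3
        reduced.extend(terms[i:])          # leftover operands pass to the next level
        terms = reduced
        levels += 1
    return sign * sum(terms), levels       # final addition of Sum and Carry
```

For the coefficient 87 the five shifted inputs are reduced in three carry save levels, while a power-of-two coefficient such as 64 needs no carry save level at all, since only one shifted input is non-zero.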
The proposed 32-point 1D architecture is used to perform one 32-point or two 16-point or four 8-point or eight 4-point or sixteen 2-point Integer DCTs in parallel. The Integer DCT output is {ous; ou}. Fig. shows the Buffer architecture, in which a number of *-Buffers are used. The *-Buffer inputs are the outputs from the column of multiplexers with select line se. Here, se takes a distinct value for each of the 2, 4, 8, 16, and 32-point Integer DCTs. Each *-Buffer is made up of a number of registers and multiplexers with a common select line. Each *-Buffer has its own select line en. The output from Fig. can be stored in one particular *-Buffer by using the corresponding select line. The *-Buffer architecture is shown in Fig. The outputs of the i-th *-Buffer are b2, b4, b8, b16, and b32, which are the results of the 2, 4, 8, 16, and 32-point Integer DCTs respectively. Here, en takes one value to maintain the values stored in the buffer and the other value when a new value is obtained.
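The idea of running several smaller Integer DCTs side by side on one wide input can be sketched as follows. This is only a behavioral illustration under stated assumptions: the 4-point core matrix is the one defined by the HEVC standard rather than taken from this paper, scaling and rounding stages are omitted, and the names `HEVC_DCT4`, `integer_dct`, and `parallel_integer_dcts` are hypothetical. A Python list comprehension stands in for hardware units that would operate concurrently.

```python
# 4-point HEVC core transform matrix (assumed from the HEVC standard).
HEVC_DCT4 = [
    [64,  64,  64,  64],
    [83,  36, -36, -83],
    [64, -64, -64,  64],
    [36, -83,  83, -36],
]

def integer_dct(matrix, samples):
    """Matrix-vector product: one unscaled N-point Integer DCT."""
    return [sum(c * s for c, s in zip(row, samples)) for row in matrix]

def parallel_integer_dcts(matrix, samples):
    """Split a wide input into N-sample groups and transform each group.

    With a 32-sample input and the 4-point matrix this models
    eight 4-point Integer DCTs performed in parallel.
    """
    n = len(matrix)
    if len(samples) % n != 0:
        raise ValueError("input length must be a multiple of the transform size")
    return [integer_dct(matrix, samples[i:i + n])
            for i in range(0, len(samples), n)]
```

A constant input produces energy only in the first (DC) output of each group, which is a quick sanity check on the coefficient matrix.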

Fig. Proposed DCT

Fig. VLSI architecture for the Buffer

Fig. Architecture of the *-Buffer

FORWARD TRANSFORM (DCT):

First Stage of Forward Transform:

The first stage of the forward transform consists of multiplication of the input with D. The input considered for the first stage is a matrix with only the DC element. The output of the multiplication with D will be a matrix with first column elements. Consequently, the scaling required after the first stage of the forward transform for the output to fit within the available bits is S T = -(B-+9).

Second Stage of Forward Transform:

The second stage of the forward transform consists of multiplication of the result of the first transform stage with D. The input into the second stage of the forward transform is the output from the first stage, which is a matrix with all elements in the first row. All other elements will be zero. The output of the multiplication will be a matrix with only a DC value. This implies that the scaling required after the second stage of the transform, in order for the output to fit within the available bits, is S T = -(-B).

VIII. TECHNOLOGY USED

The multiplier unit used in the latest N-point Integer DCT architectures is in the form of an add-shift network, whereas in the proposed architecture a signed configurable carry save adder tree is used. Therefore, the depth of the architecture falls within the bounds of O(log2 N). The proposed 1D architecture is used to perform one N-point or multiple N/2, N/4, ...-point Integer DCTs in parallel. The performance results show that the proposed architecture gives better performance compared with existing architectures using CMOS TSMC libraries.

ModelSim software is used to simulate this model. For IC development there are three processes in the Xilinx software:
a. Check syntax
b. Pin assignment
c. Implementation

Check syntax is used to check whether the design has any errors. After finishing this process, the input and output pins are allocated by using pin assignment. The final process is implementation, in which the design is mapped onto the assigned pins. The code is then converted into a bit file, and the bit file is downloaded to a Spartan FPGA (PQ208 package) and verified.

IX. ADVANTAGES

Better compression performance.
Computation performance is good.

X. APPLICATIONS

Used in the health department.
Human welfare.
It is used to monitor industrial radiation levels.

XI. ACKNOWLEDGEMENT

We would like to thank all those who gave us the possibility to propose this project. Special gratitude goes to our Project Guide Mr. V. Satheesh Kumar, Assistant Professor, and Project Coordinator Mrs. S. Jeya Anusuya, Associate Professor, whose suggestions and encouragement helped us to complete this project.

XII. RESULTS AND CONCLUSION

The multipliers are at least twice as fast as the conventional design and consume half of the power. This is achieved by ignoring the zeros in the multiplying constant and the insignificant parts of the answer. The circuit is thereby further reduced and consumes less power. The reduction requires more stages because of the scarcity of resources. This can be countered by performing multiple stages in one period. The multiplier operation takes one period, and one or two adder operations are performed in one period. Then there is less power consumption without compromising the speed.

The 1D DCT and the Transpose are finished and the simulation is shown in the snapshot. The area of the 1D DCT chip is .8mm x .mm. The total delay is 8.ns. The Transpose delay is 9ns. Its area is .mm x .mm. The results are the same as the calculation. We did not have time to construct the 2D DCT. But it is simple: it only needs two 1D DCTs connected to the transpose. The performance results show that the proposed architecture gives a good improvement compared with existing architectures. The snapshot gives a clear elaboration of the application.

Device Utilization Summary, N-Point DCT

Fig. 8 Simulation Snapshot of N-POINT DCT

REFERENCES

[1] Mohamed Asan Basiri M and Noor Mahammad Sk, Multi-mode Parallel and Folded VLSI Architectures for 1D-Fast Fourier Transform, Integration, the VLSI Journal, Elsevier, Sept.
[2] Fei Liang, Xiulian Peng, and Jizheng Xu, A Lightweight HEVC Encoder for Image Coding, IEEE International Conference on Visual Communications and Image Processing (VCIP), Nov.
[3] Pramod Kumar Meher, Sang Yoon Park, Basant Kumar Mohanty, Khoon Seong Lim, and Chuohao Yeo, Efficient Integer DCT Architectures for HEVC, IEEE Transactions on Circuits and Systems for Video Technology, Jan.
[4] Pai-Tse Chiang and Tian Sheuan Chang, A Reconfigurable Inverse Transform Architecture Design for HEVC Decoder, IEEE International Symposium on Circuits and Systems (ISCAS), May.
[5] Honggang Qi, Qingming Huang, and Wen Gao, A Low-Cost Very Large Scale Integration Architecture for Multi Standard Inverse Transform, IEEE Transactions on Circuits and Systems - II, Express Briefs, July.
[6] Khan Wahid, Muhammad Martuza, Mousumi Das, and Carl McCrosky, Resource Shared Architecture of Multiple Transforms for Multiple Video Codecs, IEEE International Canadian Conference on Electrical and Computer Engineering (CCECE), May.
[7] Kanwen Wang, Jialin Chen, Wei Cao, Ying Wang, Lingli Wang, and Jiarong Tong, A Reconfigurable Multi-Transform VLSI Architecture Supporting Video Codec Design, IEEE Transactions on Circuits and Systems - II, Express Briefs, July.
[8] Yao Ziyou, He Weifeng, Hong Liang, He Guanghui, and Mao Zhigang, Area and Throughput Efficient IDCT/IDST Architecture for HEVC Standard, IEEE International Symposium on Circuits and Systems (ISCAS), June.
[9] Hong Liang, He Weifeng, Zhu Hui, and Mao Zhigang, A Cost Effective 2-D Adaptive Block Size IDCT Architecture for HEVC Standard, IEEE International Midwest Symposium on Circuits and Systems (MWSCAS), Aug.
[10] Wenjun Zhao, Takao Onoye, and Tian Song, High-Performance Multiplierless Transform Architecture for HEVC, IEEE International Symposium on Circuits and Systems, May.
[11] Mohamed Asan Basiri M and Noor Mahammad Sk, An Efficient VLSI Architecture for Discrete Hadamard Transform, IEEE International VLSI Design Conference, Jan.
[12] Ricardo Gonzalez, Benjamin M. Gordon, and Mark A. Horowitz, Supply and Threshold Voltage Scaling for Low Power CMOS, IEEE Journal of Solid State Circuits, no. 8, Aug.