DESIGN AND IMPLEMENTATION OF MAC UNIT FOR DSP APPLICATIONS USING VERILOG HDL

Amit Kumar 1, Nidhi Verma 2
amitjaiswalec162icfai@gmail.com 1, verma.nidhi17@gmail.com 2
1 PG Scholar, VLSI, Bhagwant University Ajmer, Sikar Road, Ajmer, Rajasthan, India.
2 Assistant Professor, Bhagwant University Ajmer, Sikar Road, Ajmer, Rajasthan, India.

Abstract: In this paper, we propose a new architecture of multiplier-and-accumulator (MAC) for high-speed, low-power arithmetic. With the rapid advances in multimedia and communication systems, high-capacity signal processing is in demand, so high-speed MACs are essential to improve the performance of signal processing systems. Multiplication occurs frequently in finite impulse response filters, fast Fourier transforms, discrete cosine transforms, convolution, and other important DSP and multimedia kernels. The objective of a good MAC is to provide a physically compact, fast, and low-power chip. The proposed Spurious Power Suppression Technique (SPST) separates the target design into two parts, the most significant part (MSP) and the least significant part (LSP), and turns off the MSP when it does not affect the computation results, in order to save power. The MAC is designed by equipping a modified Booth encoder, controlled by a detection unit using an AND gate, with the SPST. The modified Booth encoder reduces the number of partial products by a factor of 2. The SPST adder avoids unwanted additions and thus minimizes the switching power dissipation.

Keywords: Booth encoder, computer arithmetic, digital signal processing, spurious power suppression technique, low power.

I. INTRODUCTION

One of the challenges in designing ICs for portable electronic devices is lowering the power consumption so as to prolong the operating time on the limited energy supplied by batteries. Owing to the vigorous development of the wireless infrastructure and of personal electronic devices such as video mobile phones, mobile TV sets, and PDAs, multimedia and DSP applications have been adopted in wireless environments. The increasing demand for high-speed signal processing has motivated researchers to seek faster processors. The multiplier and the multiplier-and-accumulator (MAC) [1] are building blocks of the processor and have a great impact on its speed. The MAC is a necessary element of digital signal and image/audio processing systems, where it is used for filtering, convolution, and inner products; hence a high-speed MAC is crucial for real-time processing applications. Many researchers have attempted to design MACs for high computational performance and low power consumption. A high-throughput MAC is always a key factor in achieving high performance in real-time digital signal processing applications.

Since the multiplier has the longest delay among the basic operations in a digital system, the critical path is limited by the multiplier. A multiplier basically consists of three operational steps: the Booth encoder, the partial product reduction network (Wallace tree), and the final adder. For high-speed multiplication, the Modified Booth Algorithm (MBA) [4] is most commonly used, in which the partial products are generated from the multiplicand (X) and the multiplier (Y). Booth multiplication allows for smaller, faster multiplication circuits by recoding signed numbers in 2's complement form, which is also the standard technique in chip design, and it provides a substantial improvement by reducing the number of partial products. The number of partial products can be reduced further by using a higher-radix (4, 8, 16, 32) Booth encoder, which improves performance at the cost of increased complexity.

II. OVERVIEW OF MAC

In this section, the basic MAC operation is introduced. A multiplier can be divided into three operational steps. The first is radix-2 Booth encoding, in which a partial product is generated from the multiplicand (X) and the multiplier (Y). The second is the adder array, or partial product compression, which adds all partial products and converts them into the form of sum and carry. The last is the final addition, in which the multiplication result is produced by adding the sum and the carry. If the process of accumulating the multiplied results is included, a MAC consists of four steps, as shown explicitly in Fig. 1.

A CSA structure is proposed to reduce the critical path and improve the output rate. It uses the MBA algorithm based on the 1's complement number system. A modified array structure for the sign bits is used to increase the density of the operands. A carry look-ahead adder (CLA) is inserted in the CSA tree to reduce the number of bits in the final adder. In addition, in order to increase the output rate by optimizing the pipeline efficiency, intermediate calculation results are accumulated in the form of sum and carry instead of as final adder outputs.

A general hardware architecture of this MAC is shown in Fig. 2. It executes the multiplication by multiplying the input multiplier and the multiplicand. The product is then added to the previous multiplication result in the accumulation step.

Fig. 2. Hardware architecture of general MAC.
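The accumulation in sum-and-carry form described in Section II can be illustrated with a small behavioural model. The following Python sketch is only an illustration under stated assumptions: the function names `csa`, `to_signed`, and `mac_carry_save`, and the 40-bit accumulator width, are hypothetical choices and are not taken from the paper. It keeps the running result as a (sum, carry) pair, compresses each new product into that pair with a 3:2 carry-save adder, and performs a single carry-propagating addition at the end.

```python
def csa(a, b, c, mask):
    """3:2 carry-save adder: compress three words into a sum word and a carry word."""
    s = (a ^ b ^ c) & mask                               # bitwise sum without carries
    cy = (((a & b) | (b & c) | (a & c)) << 1) & mask     # majority bits, shifted left
    return s, cy


def to_signed(v, width):
    """Interpret a width-bit pattern as a two's-complement integer."""
    return v - (1 << width) if v >> (width - 1) else v


def mac_carry_save(pairs, width=40):
    """Accumulate x*y for each (x, y), holding the running value as (sum, carry).

    width is an assumed accumulator width, wide enough for the test data.
    """
    mask = (1 << width) - 1
    s, cy = 0, 0
    for x, y in pairs:
        p = (x * y) & mask            # product wrapped to the accumulator width
        s, cy = csa(s, cy, p, mask)   # no carry propagation inside the loop
    return to_signed((s + cy) & mask, width)   # single final addition


print(mac_carry_save([(3, 4), (-5, 6), (7, -8)]))   # 12 - 30 - 56 = -74
```

The only carry-propagating addition is the last line of `mac_carry_save`; in hardware terms this corresponds to postponing the final adder until the accumulated result is actually needed.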
Figure 1: Basic Arithmetic Steps of Multiplication and Accumulation

In this paper, a new architecture for a high-speed MAC is proposed. In this MAC, the computations of multiplication and accumulation are combined and a hybrid-type CSA structure is used.

Modified Booth Encoder

In order to achieve high-speed multiplication, multiplication algorithms using parallel counters, such as the modified Booth algorithm, have been proposed, and some multipliers based on these algorithms have been implemented for practical use. This type of multiplier operates much faster than an array multiplier for longer operands because its computation time is proportional to the logarithm of the word length of the operands.

Fig. 3. Modified Booth Encoder
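The logarithmic growth mentioned above can be made concrete by counting how many levels of 3:2 carry-save adders are needed to reduce a given number of partial-product rows to two. The sketch below is a simplified counting model, not a description of the circuit in this paper; the function name `csa_levels` is hypothetical, and the model assumes a Wallace-style reduction in which every full group of three rows is compressed into two rows at each level.

```python
def csa_levels(rows):
    """Count 3:2 compression levels needed to bring `rows` rows down to two."""
    levels = 0
    while rows > 2:
        rows -= rows // 3    # each full group of three rows becomes two rows
        levels += 1
    return levels


for n in (8, 16, 32, 64):
    # n rows without Booth recoding, n/2 rows with radix-4 Booth recoding;
    # a linear chain of carry-save adders would need n - 2 levels instead
    print(n, csa_levels(n), csa_levels(n // 2), n - 2)
```

Under this model, doubling the word length adds only about two levels to the tree, whereas the linear chain grows in proportion to the word length; halving the number of rows by radix-4 Booth recoding removes a further two levels.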

Booth multiplication is a technique that allows for smaller, faster multiplication circuits by recoding the numbers that are multiplied. Radix-4 Booth recoding halves the number of partial products. The basic idea is that, instead of shifting and adding for every column of the multiplier term and multiplying by 1 or 0, we take only every second column and multiply by ±1, ±2, or 0 to obtain the same result.

To Booth recode the multiplier term, we consider the bits in blocks of three, such that each block overlaps the previous block by one bit. Grouping starts from the LSB, and the first block uses only two bits of the multiplier, a zero being assumed to the right of the LSB. Figure 3 shows the grouping of bits from the multiplier term for use in modified Booth encoding.

Fig. 3.1 Grouping of bits from the multiplier term

Each block is decoded to generate the correct partial product. The encoding of the multiplier Y using the modified Booth algorithm generates five signed digits: -2, -1, 0, +1, +2. Each encoded digit performs a certain operation on the multiplicand X, as illustrated in Table 1, which reduces the number of partial products by roughly one half. For multiplication of 2's complement numbers, the two-bit encoding using this algorithm scans a triplet of bits: the multiplier is divided into groups of two bits, and the algorithm is applied to each group.

Figure 4 shows a computing example of Booth multiplying the two hexadecimal numbers 2AC9 and 006A. The shaded region denotes that the numbers in this part of the Booth multiplication are all zero, so this part of the computation can be neglected. Saving those computations can significantly reduce the power consumption caused by transient signals.
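The recoding can be checked numerically on the 2AC9 × 006A example. The following Python sketch assumes the standard radix-4 Booth digit formula, digit j = b(2j-1) + b(2j) - 2·b(2j+1) with b(-1) = 0; the function names `booth_radix4_digits` and `booth_multiply` are hypothetical and the code is a behavioural illustration rather than the encoder circuit of this paper.

```python
def booth_radix4_digits(y, width=16):
    """Recode a width-bit two's-complement multiplier into width/2 digits in -2..+2."""
    y &= (1 << width) - 1
    # bit(-1) is the implicit zero to the right of the LSB
    bit = lambda i: (y >> i) & 1 if i >= 0 else 0
    return [bit(2 * j - 1) + bit(2 * j) - 2 * bit(2 * j + 1)
            for j in range(width // 2)]


def booth_multiply(x, y, width=16):
    """Multiply by summing one weighted partial product per Booth digit."""
    digits = booth_radix4_digits(y, width)
    partial_products = [d * x * 4 ** j for j, d in enumerate(digits)]
    return sum(partial_products), digits


product, digits = booth_multiply(0x2AC9, 0x006A)
print(digits)                       # [-2, -1, -1, 2, 0, 0, 0, 0]
print(product == 0x2AC9 * 0x006A)   # True
```

Only four of the eight digits are non-zero for the multiplier 006A, so the four most significant partial products are all zero. This is the kind of region that can be neglected to avoid spurious transitions.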
According to the analysis of the multiplication shown in Figure 4, we propose the SPST-equipped modified Booth encoder, which is controlled by a detection unit. The detection unit takes one of the two operands as its input and decides whether the Booth encoder would perform redundant computations. As shown in Figure 9, the latches can freeze the inputs of MUX-4 to MUX-7, or only those of MUX-6 to MUX-7, when PP4 to PP7 or PP6 to PP7, respectively, are zero, in order to reduce the transition power dissipation. Figure 10 shows the Booth partial product generation circuit, which consists of AND/OR/EX-OR logic.

III. Partial Product Generator

For the partial product generation, we adopt the radix-4 modified Booth algorithm to reduce the number of partial products by roughly one half.

Fig. 4. Booth partial product selector logic
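A bit-level view of a partial product selector built from AND/OR/EX-OR operations is sketched below. This is a commonly used encoder/selector formulation and is assumed here for illustration; it is not claimed to be the exact gate netlist of Fig. 4 or Figure 10. The signal names `neg`, `one`, and `two`, the function names, and the two guard bits are assumptions of the sketch.

```python
def booth_encode(b_hi, b_mid, b_lo):
    """Encoder: triplet (b[2j+1], b[2j], b[2j-1]) -> select signals (neg, one, two)."""
    one = b_mid ^ b_lo                    # digit magnitude is 1
    two = (b_hi ^ b_mid) & (one ^ 1)      # digit magnitude is 2 (triplets 011, 100)
    neg = b_hi                            # digit is negative (or the "-0" triplet 111)
    return neg, one, two


def select_row(x, neg, one, two, width=16):
    """Selector: build one partial-product row bit by bit."""
    w = width + 2                         # two guard bits so that +/-2x always fits
    xs = x & ((1 << w) - 1)               # sign-extended multiplicand as raw bits
    row = 0
    for i in range(w):
        x_i = (xs >> i) & 1
        x_prev = (xs >> (i - 1)) & 1 if i > 0 else 0   # left shift by one gives 2x
        row |= (((x_i & one) | (x_prev & two)) ^ neg) << i
    return row


def row_value(row, neg, width=16):
    """Add the neg bit at the LSB (completing the negation) and read as signed."""
    w = width + 2
    v = (row + neg) & ((1 << w) - 1)
    return v - (1 << w) if v >> (w - 1) else v


neg, one, two = booth_encode(1, 0, 1)             # triplet 101 encodes the digit -1
print(row_value(select_row(7, neg, one, two), neg))   # -7
```

The EX-OR with `neg` inverts the selected row and the extra `neg` bit added at the LSB completes the two's complement negation. Note that the triplet 111 produces an all-ones row that sums to zero, which is a source of unnecessary switching.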

The first multiplication step generates from A and X a set of bits whose weighted sum is the product P. For unsigned multiplication, the weight of the most significant bit of P is positive, while in 2's complement it is negative. The partial products are generated by ANDing a and b, which are 4-bit vectors, as shown in the figure. With a 4-bit multiplier and a 4-bit multiplicand we obtain sixteen partial-product bits arranged as four 4-bit rows: the first row is stored in q and, similarly, the second, third, and fourth rows are stored in the 4-bit vectors n, x, and y.

The second multiplication step reduces the partial products from the preceding step to two numbers while preserving the weighted sum. The sought-after product P is the sum of those two numbers, which are added during the third step. The "Wallace tree" synthesis follows Dadda's algorithm, which ensures the minimum number of counters. If, in addition, the reduction is required to take place as late as possible (or as soon as possible), then the solution is unique. The two binary numbers to be added during the third step may also be seen as one number in CSA notation (2 bits per digit).

Fig. 5. Booth Encoder

Fig. 6. Booth Decoder

III. Existing System

NR4SD- Encoding Scheme

Fig. 7. Block Diagram of the NR4SD- Encoding Scheme at the (a) Digit and (b) Word Level.

The operation of the HA* cell used in this scheme is summarized by a pair of Boolean equations. The final step of the scheme is to calculate the value of the NR4SD- digit.
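Since the Boolean equations of the HA* cell are not reproduced here, the following Python sketch fills them in under an explicit assumption: the HA is taken as an ordinary half adder, and the HA* is taken to output a positively weighted carry (OR of its inputs) and a negatively weighted sum (EX-OR of its inputs), so that b + c = 2·carry - sum. These equations, the function name `nr4sd_minus_digits`, and the variable names are assumptions consistent with the digit set of the NR4SD- form, not statements taken from this text. The most significant digit is formed in MB style from the two top bits and the incoming carry.

```python
def nr4sd_minus_digits(b, width=8):
    """Recode a width-bit two's-complement value into width/2 radix-4 digits.

    Digits 0..k-2 lie in {-2, -1, 0, +1}; the most significant digit is MB encoded.
    """
    b &= (1 << width) - 1
    bits = [(b >> i) & 1 for i in range(width)]
    k = width // 2
    digits, c = [], 0
    for j in range(k - 1):
        n_plus = bits[2 * j] ^ c            # HA sum (weight +1)
        c_mid = bits[2 * j] & c             # HA carry
        n_minus = bits[2 * j + 1] ^ c_mid   # assumed HA* sum (weight -2)
        c = bits[2 * j + 1] | c_mid         # assumed HA* carry into the next digit
        digits.append(-2 * n_minus + n_plus)
    # most significant digit in MB form, using the incoming carry as the low bit
    digits.append(-2 * bits[2 * k - 1] + bits[2 * k - 2] + c)
    return digits


print(nr4sd_minus_digits(0x6A))   # [-2, -1, -1, 2]
```

Each of the lower digits is described by two bits (`n_minus`, `n_plus`), which is what makes this form attractive for storing pre-encoded coefficients.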

Table 2 shows how the NR4SD- digits are formed, together with the corresponding NR4SD- encoding signals. The least and the most significant bits of each partial product are treated as special cases. Note that for certain operand widths the number of resulting partial products changes, and the most significant MB digit is then formed based on sign extension of the initial 2's complement number. After the partial products are generated, they are added, properly weighted, through a Carry-Save Adder (CSA) tree. Finally, the carry-save output of the Wallace CSA tree is fed to a fast Carry Look-Ahead (CLA) adder to form the final result Z = X · Y.

NR4SD+ Encoding Scheme

Fig. 8. Block Diagram of the NR4SD+ Encoding Scheme at the (a) Digit and (b) Word Level.

As in the NR4SD- scheme, the final step is to calculate the value of the digit. Table 3 shows how the NR4SD+ digits are formed, together with the corresponding NR4SD+ encoding signals.

Fig. 9. System Architecture of the NR4SD Multipliers

In the pre-encoded MB multiplier scheme, the coefficient B is encoded off-line according to the conventional MB form (Table 1). The resulting encoding signals of B are stored in a ROM. The circled part of Fig. 3, which contains the ROM with coefficients in 2's complement form and the MB encoding circuit, is now totally replaced by the ROM; the MB encoding blocks of Fig. 3 are omitted. The new ROM stores the encoding signals of B and feeds them into the partial product generators (PPj generators, PPG) on each clock cycle. To decrease switching activity, the value 1 of s_j in the last entry of Table 1 is replaced by 0, and the relation that generates the sign s_j is modified accordingly.
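The off-line MB pre-encoding with the modified sign can be sketched as follows. The relation for s_j is not reproduced in this text, so the sketch assumes one that matches the description: the sign equals the top bit of the triplet except for the all-ones triplet, where it is forced to 0, i.e. s_j = b(2j+1) AND NOT (b(2j) AND b(2j-1)). This relation, the function names `mb_preencode` and `mb_value`, and the tuple layout of a ROM word are assumptions made for illustration.

```python
def mb_preencode(b, width=8):
    """Off-line MB encoding of a coefficient: one (s, two, one) triple per digit."""
    b &= (1 << width) - 1
    bit = lambda i: (b >> i) & 1 if i >= 0 else 0
    rom_word = []
    for j in range(width // 2):
        hi, mid, lo = bit(2 * j + 1), bit(2 * j), bit(2 * j - 1)
        one = mid ^ lo
        two = (hi ^ mid) & (one ^ 1)
        s = hi & ((mid & lo) ^ 1)     # assumed relation: sign is 0 for triplet 111
        rom_word.append((s, two, one))
    return rom_word


def mb_value(rom_word):
    """Decode a stored word back to the coefficient value (for checking only)."""
    return sum((-1) ** s * (2 * two + one) * 4 ** j
               for j, (s, two, one) in enumerate(rom_word))


word = mb_preencode(-1)               # coefficient 1111 1111
print(word)                           # [(1, 0, 1), (0, 0, 0), (0, 0, 0), (0, 0, 0)]
print(mb_value(word))                 # -1
```

With this relation, a zero digit always carries a zero sign, so a zero partial product is not generated as an inverted all-ones row. Each digit occupies three stored bits, which is the source of the wider ROM in the pre-encoded MB scheme.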

However, the ROM width is increased. Each digit requires three encoding bits (s, two, and one; see Table 1) to be stored in the ROM. Since the n-bit coefficient B has n/2 digits when encoded in MB form and each digit needs three bits, the ROM width requirement is 3n/2 bits per coefficient. Thus, the width and the overall size of the ROM are increased by 50% compared to the ROM of the conventional scheme.

The system architecture for the pre-encoded NR4SD multipliers is presented in Fig. 6. Two bits per digit are now stored in the ROM: n^-_{2j+1} and n^+_{2j} (Table 2) for the NR4SD- form, or n^+_{2j+1} and n^-_{2j} (Table 3) for the NR4SD+ form. In this way, we reduce the memory requirement to n+1 bits per coefficient, while the corresponding memory required for the pre-encoded MB scheme is 3n/2 bits per coefficient. Thus, the amount of stored bits is equal to that of the conventional MB design, except for the most significant digit, which needs an extra bit as it is MB encoded. Compared to the pre-encoded MB multiplier, where the MB encoding blocks are omitted, the pre-encoded NR4SD multipliers need extra hardware to generate the signals of (6) and (8) for the NR4SD- and NR4SD+ forms, respectively.

Each partial product of the pre-encoded NR4SD- and NR4SD+ multipliers is implemented based on Fig. 4c and 4d, respectively, except for PP_{k-1}, which corresponds to the most significant digit. As this digit is in MB form, we use the PPG of Fig. 4b, applying the change mentioned in Section 4.2 for the s_j bit. The partial products, properly weighted, and the correction term (COR) of (11) are fed into a CSA tree. The input carry c_{in,j} of (11) is calculated as c_{in,j} = two^-_j OR one^-_j for the NR4SD- pre-encoded multiplier and as c_{in,j} = one^-_j for the NR4SD+ pre-encoded multiplier, based on Tables 2 and 3. The carry-save output of the CSA tree is finally summed using a fast CLA adder.

IV. PROPOSED SPST

Besides the explanations presented in our former studies, this paper provides further illustrations of the proposed SPST, as described in the following sections.
The SPST uses a detection logic circuit to detect the effective data range of arithmetic units, e.g., adders or multipliers. When a portion of the data does not affect the final computing results, the data controlling circuits of the SPST latch this portion to avoid useless data transitions inside the arithmetic units. In addition, a data asserting control realized with registers further filters out the useless spurious signals of the arithmetic unit every time the latched portion is turned on. This asserting control brings evident power reduction. Figure 5 shows the design of a low-power adder/subtractor with SPST.

Fig. 10. Spurious transition cases in multimedia/DSP processing

A_MSP = A[15:8]; B_MSP = B[15:8];
A_and = A[15] · A[14] · ... · A[8]; B_and = B[15] · B[14] · ... · B[8];

The adder/subtractor is divided into two parts, the most significant part (MSP) and the least significant part (LSP). The MSP of the original adder/subtractor is modified to include detection logic circuits, data controlling circuits, sign extension circuits, and logic for calculating the carry-in and carry-out signals. The most important part of this study is the design of the control signal asserting circuits, denoted as asserting circuits in Figure 5. Although this asserting circuit brings evident power reduction, it may induce additional delay.

There are two implementing approaches for the control signal assertion circuits. The first approach uses registers, as illustrated in Figure 6. The three output signals of the detection logic are close, Carr_ctrl, and sign. These signals are given a certain amount of delay before they assert. The delay must be longer than the data transient period and shorter than the earliest required time of all the inputs; this filters out the glitch signals while keeping the computation results correct. The restriction that the delay be greater than the data transient period, which prevents the registers from latching wrong control values, usually decreases the overall speed of the applied designs.

The delay contributed by the detection-logic unit depends on its decision. When the detection-logic unit turns on the MSP, the data latches must first be opened to let the data in. Hence, the delay caused by the detection-logic unit contributes to the delay of the whole combinational circuitry, i.e., the 16-bit adder/subtractor in this design example. When the detection-logic unit keeps its decision, no matter whether the last decision was to turn the MSP on or off, the delay of the detection logic is negligible because the path of the combinational circuitry (the 16-bit adder/subtractor in this design example) remains the same. From this analysis, the total delay is affected only when the detection-logic unit turns on the MSP. Nevertheless, the detection-logic unit should be a speed-oriented design.
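The MSP/LSP split can be illustrated with a behavioural model of a 16-bit adder. The sketch below is a simplified stand-in and not the circuit of this paper: it assumes that the MSP may be skipped whenever the upper byte of each operand is all zeros or all ones, and it models the compensation of the upper result byte as a small correction derived from the two all-ones flags and the LSP carry. The function name `spst_add`, the all-zero flags, and the compensation rule are assumptions of the sketch; only `close` follows a signal name mentioned in the text, and its exact semantics here are also assumed.

```python
MASK8, MASK16 = 0xFF, 0xFFFF


def spst_add(a, b):
    """Add two 16-bit two's-complement words; return (result, msp_skipped)."""
    a &= MASK16
    b &= MASK16
    a_msp, b_msp = a >> 8, b >> 8
    a_and, b_and = a_msp == 0xFF, b_msp == 0xFF    # upper byte all ones
    a_nor, b_nor = a_msp == 0x00, b_msp == 0x00    # upper byte all zeros (assumed flag)
    lsp = (a & MASK8) + (b & MASK8)                # LSP adder always operates
    carry = lsp >> 8
    close = (a_and or a_nor) and (b_and or b_nor)  # MSP carries no information
    if close:
        # MSP adder inputs stay latched; the upper byte is compensated directly
        msp = (-int(a_and) - int(b_and) + carry) & MASK8
    else:
        msp = (a_msp + b_msp + carry) & MASK8      # full MSP addition
    return (msp << 8) | (lsp & MASK8), close


print(spst_add(5, 7))          # (12, True)
print(spst_add(0x1234, 1))     # (4661, False)
```

In this model the MSP adder is exercised only when `close` is false, which is the behavioural counterpart of latching the MSP inputs to suppress spurious transitions for small-magnitude operands.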
When the SPST is applied to combinational circuitry, we should first determine the longest transitions of the cross sections of interest in each combinational circuit, which is a timing characteristic and also depends on the adopted technology. The longest transitions can be obtained by analyzing the timing differences between the earliest-arriving and the latest-arriving signals of the cross sections of the combinational circuitry. Then, a delay generator similar to the delay line used in a DLL is used.

Fig. 11. Low-power adder/subtractor design example adopting the proposed SPST.

When the detection-logic unit turns off the MSP: at this moment, the outputs of the MSP are directly compensated by the sign extension (SE) unit; therefore, the time saved by skipping the computations in the MSP circuits cancels out the delay caused by the detection-logic unit.

When the detection-logic unit turns on the MSP: the MSP circuits must wait for the notification of the detection-logic unit to turn on the data latches and let the data in.

Fig. 12. SPST Modified Booth encoder

V. Results

Simulation Results of MAC:

Fig. 13. Simulation Waveform of MAC

Fig. 14. Schematic with Basic Inputs and Output

CONCLUSIONS

In this project, we propose a high-speed, low-power multiplier and accumulator (MAC) adopting the new SPST implementing approach. This MAC is designed by equipping a modified Booth encoder, controlled by a detection unit using an AND gate, with the Spurious Power Suppression Technique (SPST). The modified Booth encoder reduces the number of partial products by a factor of 2. The SPST adder avoids unwanted additions and thus minimizes the switching power dissipation. The SPST MAC implementation with AND gates has extremely high flexibility in adjusting the data asserting time, which facilitates the robustness of the SPST. The design can attain a 30% speed improvement and a 22% power reduction in the modified Booth encoder. The design can be verified using ModelSim and Xilinx tools using Verilog.

Future Scope: The proposed system can be implemented using a Dadda multiplier, which will reduce the delay. The process of Dadda multiplication is as follows. The entire 16 × 16 multiplication requires six stages. The first stage is always the partial products stage, which is obtained by simple multiplication of the multiplicand with the multiplier. The number of rows (height) present at this stage is 16. The number of rows is then reduced in such a way that the final stage contains only two rows. For this, Dadda introduces a sequence of intermediate matrix heights that provides the minimum number of reduction stages for a given size of multiplier. This sequence is determined by working back from the final two-row matrix, limiting the height of each intermediate matrix to the largest integer that is no more than 1.5 times the height of its successor.

REFERENCES

[1] T. Stockhammer, M. Hannuksela, and T. Wiegand, "H.264/AVC in wireless environments," IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 657-673, Jul. 2003.
[2] R. Schafer, T. Wiegand, and H. Schwarz, "The emerging H.264/AVC standard," EBU Technical Review, Jan. 2003. [Online]. Available: http://www.ebu.ch/trev_293-schaefer.pdf
[3] A. Bellaouar and M. I. Elmasry, Low-Power Digital VLSI Design: Circuits and Systems. Norwell, MA: Kluwer, 1995.
[4] A. P. Chandrakasan and R. W. Brodersen, "Minimizing power consumption in digital CMOS circuits," Proc. IEEE, vol. 83, no. 4, pp. 498-523, Apr. 1995.
[5] K. K. Parhi, "Approaches to low-power implementations of DSP systems," IEEE Trans. Circuits Syst. I, Fundam. Theory Appl., vol. 48, no. 10, pp. 1214-1224, Oct. 2001.
[6] K. Choi, R. Soma, and M. Pedram, "Dynamic voltage and frequency scaling based on workload decomposition," in Proc. IEEE Int. Symp. Low Power Electron. Des., 2004, pp. 174-179.
[7] J. Choi, J. Jeon, and K. Choi, "Power minimization of functional units by partially guarded computation," in Proc. IEEE Int. Symp. Low Power Electron. Des., 2000, pp. 131-136.
[8] O. Chen, R. Sheen, and S. Wang, "A low-power adder operating on effective dynamic data ranges," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 10, no. 4, pp. 435-453, Aug. 2002.
[9] O. Chen, S. Wang, and Y. W. Wu, "Minimization of switching activities of partial products for designing low-power multipliers," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 11, no. 3, pp. 418-433, Jun. 2003.
[10] L. Benini, G. D. Micheli, A. Macii, E. Macii, M. Poncino, and R. Scarsi, "Glitch power minimization by selective gate freezing," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 8, no. 3, pp. 287-298, Jun. 2000.
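As a supplementary note to the Future Scope discussion of the Dadda multiplier, the sequence of intermediate matrix heights can be generated by working back from the final two-row matrix, each height being the largest integer not exceeding 1.5 times its successor. The Python sketch below follows that rule directly; the function name `dadda_heights` is a hypothetical choice made for illustration.

```python
def dadda_heights(rows):
    """Target heights of the successive reduction stages for `rows` initial rows."""
    heights = [2]                                  # the final two-row matrix
    while heights[-1] * 3 // 2 < rows:
        heights.append(heights[-1] * 3 // 2)       # floor(1.5 * successor height)
    return heights[::-1]                           # order in which they are reached


print(dadda_heights(16))        # [13, 9, 6, 4, 3, 2]
print(len(dadda_heights(16)))   # 6
```

For 16 initial rows the heights are 13, 9, 6, 4, 3, and 2, so six reduction stages are needed, which agrees with the six stages quoted for the 16 × 16 multiplication.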