Proceedings of International Conference on Emerging Trends in Engineering & Technology (ICETET), 29th - 30th September, 2014, Warangal, Telangana, India (SF0EC024). ISSN (online): 2349-0020

A Novel High Performance 64-bit MAC Unit with Modified Wallace Tree Multiplier

¹CH Murthy and ²T. Santhosh Kumar
¹,²Department of ECE, MLR Institute of Technology, Dundigal, Hyderabad, India
¹sriram.t1984@gmail.com and ²santoshkumar.tula@gmail.com

ABSTRACT

A novel high-performance 64-bit Multiplier-and-Accumulator (MAC) unit is implemented in this paper using a Modified Wallace Tree Multiplier and a Carry Save Adder. The MAC unit performs important arithmetic operations in many digital signal processing (DSP) applications. This 64-bit MAC unit is suitable for use in 64-bit DSP processors. The design is coded in Verilog-HDL. The power dissipation of the entire MAC unit is 182.312 mW.

Key words: Modified Wallace Multiplier, Carry Save Adder, Multiplier and Accumulator (MAC), Digital Signal Processor (DSP), Verilog-HDL.

I. INTRODUCTION

The multiply-accumulate (MAC) operation is a common step in many Digital Signal Processing (DSP) applications involving multiplications and/or accumulations. Modern computers have a dedicated high-performance MAC unit consisting of a multiplier, an adder and an accumulator register that stores the result [1]. The MAC unit is used in particular for high-performance digital signal processing systems, where the DSP applications include digital filtering, convolution, and the computation of inner products. The MAC unit is separate from the CPU, so multiplications and additions are performed independently, which reduces the CPU load. Fast Fourier Transform (FFT) operations are also performed by the MAC unit, because the FFT requires addition and multiplication operations [2]. A MAC unit consists of a multiplier, an adder and an accumulator that holds the sum of the previous successive products.
The MAC unit obtains its inputs from a memory location such as RAM and passes them to the multiplier. The MAC unit is used in DSP applications that use the discrete cosine transform (DCT) or discrete wavelet transform (DWT). Because multiplication is accomplished by repeated addition, the speed of the multiplication and addition arithmetic determines the execution speed and performance of the entire calculation. The functionality of the MAC unit enables high-speed filtering and other processing typical of DSP applications. In particular, DSP-based applications such as optical communication systems require extremely fast processing of huge amounts of digital data.

The MAC unit designed in this paper consists of a 64-bit Modified Wallace Tree Multiplier, a 128-bit Carry Save Adder (CSA) and a 129-bit accumulator.

This paper is divided into five sections. The first section introduces the MAC unit. The second section describes the detailed operation of the MAC unit. The third section deals with the basic building blocks of the MAC unit. The fourth section gives the results and a comparison of various MAC units. Finally, the conclusion is given in the fifth section.

II. MAC UNIT

The Multiplier-Accumulator (MAC) operation is the key operation not only in DSP applications but also in multimedia information processing and various other applications. As mentioned above, the MAC unit consists of a multiplier, an adder and a register/accumulator.
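The multiplier, adder and accumulator arrangement described above can be sketched as a small behavioral model. This is only an illustration in Python, not the Verilog-HDL design of the paper: the class name `Mac64`, the methods `step` and `clear`, the treatment of operands as unsigned values, and the wrap-around of the accumulator at 129 bits are all assumptions made for the sketch.

```python
OPERAND_BITS = 64                 # operand width stated in the paper
ACC_BITS = 129                    # accumulator width stated in the paper
ACC_MASK = (1 << ACC_BITS) - 1


class Mac64:
    """Behavioral model of a 64-bit multiply-accumulate unit (hypothetical names)."""

    def __init__(self):
        self.acc = 0              # accumulator register, cleared at start

    def step(self, a, b):
        """Multiply two unsigned 64-bit operands and add the product to the accumulator."""
        for operand in (a, b):
            if not 0 <= operand < (1 << OPERAND_BITS):
                raise ValueError("operands must be unsigned 64-bit values")
        product = a * b           # fits in 128 bits
        # Assumption: the accumulator wraps around at 129 bits on overflow.
        self.acc = (self.acc + product) & ACC_MASK
        return self.acc

    def clear(self):
        """Reset the accumulator to zero."""
        self.acc = 0


mac = Mac64()
for a, b in [(3, 4), (5, 6)]:
    mac.step(a, b)
print(mac.acc)                    # 3*4 + 5*6 = 42
```

The model makes the widths concrete: one 128-bit product added to a 128-bit running sum needs at most 129 bits, which matches the one extra carry bit of the accumulator.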
In this paper, we use a 64-bit modified Wallace multiplier. The MAC unit takes its inputs from a memory location such as RAM and passes them to the multiplier block, which is very useful in a 64-bit digital signal processor. The inputs fed from the memory location are 64 bits wide. When the inputs are given to the multiplier, it computes the product of the two 64-bit inputs, so its output is 128 bits wide. The multiplier output is given as the input to the carry save adder (CSA), which performs the addition.

The function of the MAC unit is given by the following equation:

$$Y = \sum_{i=0}^{63} A_i \times B_i \qquad (1)$$

where $A_i$ and $B_i$ are the two 64-bit input operands, $Y$ is the output of the MAC unit, and $i$ is the summation index, which runs from 0 to 63. The equation expresses the summation of the products.

The Carry Save Adder (CSA) produces a 129-bit output, since one bit is for the carry (128 bits + 1 bit). The output of the CSA is then given to the accumulator register. The accumulator is designed as a Parallel In Parallel Out (PIPO) register [3], in which the input bits are loaded in parallel and the output is taken in parallel. A PIPO register is used because the CSA produces its output in parallel form and the word is wide. The output of the accumulator register is taken out or fed back as one of the inputs to the CSA. Figure 1 shows the basic architecture of the MAC unit.

A Wallace tree is an efficient hardware implementation of a digital circuit that multiplies two integers. The Wallace tree has three steps. First, multiply (that is, AND) each bit of one of the arguments by each bit of the other. Second, reduce the number of partial products to two by layers of full and half adders. Finally, group the wires into two numbers and add them with a conventional adder. The benefit of the Wallace tree is that it has only a few layers, which reduces the propagation delay. However, as the number of bits to multiply increases, the number of adders (full adders and half adders) increases, which increases the propagation delay [4].
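The three Wallace tree steps can be illustrated with a bit-level software model. The sketch below is an assumption-laden simplification rather than the circuit of the paper: the function names `full_adder` and `wallace_multiply` are hypothetical, operands are unsigned, and each layer applies full adders to groups of three bits within a column while passing leftover bits through unchanged, so no half adders appear.

```python
def full_adder(a, b, c):
    """Add three bits of equal weight; return (sum bit, carry bit)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def wallace_multiply(x, y, n):
    """Multiply two unsigned n-bit integers with a column-wise carry-save reduction."""
    # Step 1: AND every bit of x with every bit of y; cols[w] holds the bits of weight 2**w.
    cols = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            cols[i + j].append(((x >> i) & 1) & ((y >> j) & 1))

    # Step 2: apply layers of full adders until no column holds more than two bits.
    while any(len(bits) > 2 for bits in cols):
        nxt = [[] for _ in range(2 * n)]
        for w, bits in enumerate(cols):
            k = 0
            while len(bits) - k >= 3:
                s, c = full_adder(bits[k], bits[k + 1], bits[k + 2])
                nxt[w].append(s)
                if w + 1 < 2 * n:          # a carry out of the top column is always 0
                    nxt[w + 1].append(c)
                k += 3
            nxt[w].extend(bits[k:])        # one or two leftover bits pass through
        cols = nxt

    # Step 3: group the remaining bits into two numbers and add them conventionally.
    row0 = sum(bits[0] << w for w, bits in enumerate(cols) if len(bits) > 0)
    row1 = sum(bits[1] << w for w, bits in enumerate(cols) if len(bits) > 1)
    return row0 + row1


print(wallace_multiply(13, 11, 4))         # 143
```

Each full adder replaces three bits with two, so the total number of bits falls in every layer and the reduction terminates. The final `row0 + row1` stands in for the conventional adder of the third step.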
In this paper, a Modified Wallace Tree Multiplier is proposed in which the number of half adders is reduced. The circuit complexity therefore falls, and the power dissipation also decreases at a reduced propagation delay. The Modified Wallace Tree Multiplier has three stages to achieve the desired functionality. First, the N x N product matrix is formed and rearranged into the shape of an inverted pyramid. Second, the rearranged product matrix is divided into non-overlapping groups of three rows. Within a group, single-bit and two-bit columns are passed to the next stage, and each three-bit column is given to a full adder circuit [5]. The number of rows $R$ in each stage of the reduction phase is calculated as

$$R_{i+1} = 2\left\lfloor \frac{R_i}{3} \right\rfloor + R_i \bmod 3 \qquad (2)$$

If $R_i \bmod 3 = 0$, then

$$R_{i+1} = 2\left\lfloor \frac{R_i}{3} \right\rfloor \qquad (3)$$

A half adder is used only when the number of rows actually formed in a stage of the second phase does not match the number of rows calculated for that stage from the equations above [6].

Figure 1: Block Diagram of MAC Unit.

III. MODIFIED WALLACE MULTIPLIER AND CARRY SAVE ADDER

The final product of the second stage has a height of two bits and is passed on to the third stage. In the third stage, the output of the second stage is given to the carry propagation adder to generate the final output.
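The row-count recurrence of equations (2) and (3) is easy to evaluate numerically. The following Python sketch, with the hypothetical function name `reduction_rows`, lists the number of rows after each reduction stage until two rows remain. Equation (3) needs no separate branch, because the remainder term is simply zero when the row count is divisible by three.

```python
def reduction_rows(n_rows):
    """Return the row count before reduction and after each stage, down to two rows."""
    if n_rows < 1:
        raise ValueError("n_rows must be a positive integer")
    rows = [n_rows]
    while rows[-1] > 2:
        r = rows[-1]
        rows.append(2 * (r // 3) + r % 3)   # equation (2); reduces to (3) when r % 3 == 0
    return rows


stages_64 = reduction_rows(64)
print(stages_64)             # [64, 43, 29, 20, 14, 10, 7, 5, 4, 3, 2]
print(len(stages_64) - 1)    # 10 reduction stages
print(reduction_rows(10))    # [10, 7, 5, 4, 3, 2]
```

For the 64 partial-product rows of a 64-bit multiplier the recurrence gives ten reduction stages, which agrees with the stage count reported for the 64-bit design.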
Thus the 64-bit Modified Wallace Tree Multiplier is constructed, and the total number of stages in the second phase is ten. The number of rows in each of the ten stages was calculated from the equation, and the use of half adders was restricted to the tenth stage. The total number of half adders used in the second phase is eight, and the total number of full adders used in the second phase is slightly higher than in the conventional Wallace Tree Multiplier [7]. Since the 64-bit Modified Wallace Tree Multiplier is very difficult to represent, a typical 10-bit by 10-bit reduction example is shown in Figure 2. The Modified Wallace Tree shows better performance when a CSA is used in the final stage instead of a Ripple Carry Adder (RCA) [8].

Figure 2: A 10-bit Modified Wallace Tree Multiplier.

The Carry Save Adder (CSA) is a type of digital adder used to compute the sum of three or more binary numbers. The CSA gives less propagation delay, and the glitching problem of the RCA is also avoided. Here, we compute the sum of two 128-bit binary numbers, so 128 half adders are required at the first stage instead of 128 full adders, since only the bits of two binary numbers are added [9]. If P and Q are two 128-bit numbers, the first stage produces the partial sum $S_i$ and the carry $C_i$, where

$$S_i = P_i \oplus Q_i$$

$$C_i = P_i \cdot Q_i$$

Since the 128-bit CSA is very difficult to represent, a typical example of an 8-bit CSA is shown in Figure 3. The CSA produces all of its output values in parallel, so the computation time is reduced compared with the RCA. A Parallel In Parallel Out (PIPO) register is used in the accumulator stage [10].

Figure 3: A Typical 8-bit Carry Save Adder.

IV. RESULTS

The design is developed in Verilog-HDL and synthesized using Xilinx ISE 13.2. In previous work, different MAC units were developed using different combinations of multipliers and adders.
The multipliers compared are (i) the Modified Booth algorithm, (ii) the Dadda multiplier and (iii) the Wallace multiplier. The adders used are (i) the Carry Save Adder and (ii) the Carry Select Adder. The area, power dissipation and delay comparison of the different MAC units is shown as a graph in Figure 4 and in tabular form in Table 1. Figures 4 and 5 show the cell area and total power dissipation of the different MAC units. These figures indicate that the MAC units built with the Wallace Tree Multiplier require less area, dissipate less power and have less delay than the MAC units built with either the Modified Booth or the Dadda multiplier. Hence a 64-bit MAC unit is designed using the Modified Wallace Multiplier and the Carry Save Adder.

The simulation waveform of the 64-bit MAC unit is shown in Figure 6. The simulation result shows a delay of one clock cycle. This is because the computation takes some time, since the multiplier is 64 bits wide and the adder is 128 bits wide.

Table 1: Area and Delay Comparison of MAC Units

Figure 4: Graph Comparison of Cell Area in µm²

Figure 5: Comparison of Total Power Dissipation in mW
Table 2: Power Dissipation of Different MAC Units

Figure 6: Simulation Waveform of 64-bit MAC Unit

V. CONCLUSIONS

A high-performance 64-bit MAC unit is designed and implemented using the Modified Wallace Tree Multiplier and Carry Save Adder. Compared with the MAC units developed earlier using other combinations of multipliers and adders, the design based on the Modified Wallace Tree Multiplier offers high performance with less area, less power dissipation and less propagation delay, which further increases the overall speed of the MAC unit. The MAC unit is designed in Verilog-HDL and synthesized using Xilinx ISE 13.2.

REFERENCES

[1] C. S. Wallace, "A Suggestion for a Fast Multiplier," IEEE Transactions on Electronic Computers, Vol. 13, Feb 1967.
[2] C. S. Wallace, "A Suggestion for a Fast Multiplier," IEEE Transactions on Electronic Computers, Vol. EC-13, Feb 1964.
[3] Young-Ho Seo and Dong-Wook Kim, "New VLSI Architecture of Parallel Multiplier-Accumulator Based on Radix-2 Algorithm," IEEE Transactions on Very Large Scale Integration Systems, Vol. 18, Feb 2010.
[4] Shanthala S, Dr. Cyril Prasanna Raj, Dr. S Y Kulkarni, "Design and Implementation of Pipelined MAC," IEEE Int. Conf. on Emerging Trends in Eng. & Tech., ICETET-09.
[5] Kiat Seng Yeo and Kaushik Roy, Low Voltage Low Power VLSI Sub Systems.
[6] L. Dadda, "Some Schemes for Parallel Multipliers," Alta Frequenza, Vol. 34, 1965.
[7] L. Dadda, "On Parallel Digital Multiplier," Alta Frequenza, Vol. 45, pp. 574-580, 1976.
[8] V. G. Oklobdzija, "High-Speed VLSI Arithmetic Units: Adders and Multipliers," A. Chandrakasan, IEEE Press, 2000.
[9] W. J. Townsend, E. E. Swartzlander Jr. and Dr. D. A. Abraham, "A Comparison of Dadda and Wallace Multiplier," 2003.
[10] Fabrizio Lamberti and Nikos Andrikos, "Reducing the Time of Computation in Two's Complement Multipliers," Vol. 60, Feb 2011.