DESIGN AND IMPLEMENTATION OF 128-BIT MAC UNIT USING ANALOG CADENCE TOOLS

Mohammad Anwar Khan 1, Mrs. T. Subha Sri Lakshmi 2
1 M. Tech (VLSI-SD) Student, ECE Dept., CVR College of Engineering, Hyderabad, Telangana, India
2 Assistant Professor, ECE Dept., CVR College of Engineering, Hyderabad, Telangana, India

Abstract: The importance of low-power devices is increasing steadily because of changing trends, packaging style, cooling cost, portable systems, and reliability. Low-power VLSI design plays a vital role in every design, and it is becoming more important in the design of high-performance and portable devices. In this paper, a low-power, high-speed, and area-efficient 128-bit Multiplier Accumulator (MAC) unit is designed. The MAC unit is the basic building block in digital signal processing systems. To improve the performance and power consumption of the MAC unit, a Vedic multiplier and a Kogge Stone adder are used, as they consume very little power and area while offering high performance. The Vedic multiplier block is designed using the Urdhva Tiryakbhyam Sutra of Vedic mathematics, which is considered one of the easiest methods of multiplication. To compare the performance and power consumption of the MAC unit, the multiplier and adder blocks are replaced with different multipliers and adders. The schematics, symbols, and test benches of the MAC unit are designed and simulated using 45 nm technology libraries in the Analog Cadence Virtuoso tool, and the layout is designed using the Cadence ASSURA tool.

Keywords: Baugh Wooley Multiplier, Carry Skip Adder, Kogge Stone Adder, Multiplier Accumulator Unit, Ripple Carry Adder, Vedic Multiplier, Wallace Tree Multiplier.

I. INTRODUCTION

The demand for portable low-power devices has increased in recent years. Low power is the major consideration when designing integrated circuits for applications such as signal processing or digital signal processing systems. The main functions of the Multiplier Accumulator (MAC) unit are multiplication and addition.
It is considered the basic module in the processing unit. The MAC unit consists of a multiplier, an adder, and an accumulator. It must be fast and should consume little power [1]. The performance of the MAC unit can be increased by optimizing the basic building blocks used in its design. The proposed work focuses on the speed, power consumption, and area of the MAC unit. The project relies mainly on the Vedic multiplier and the Kogge Stone adder because of their high speed, low area, and low power consumption.

The rest of this paper is organized as follows. Section II describes the multiplier architecture, Section III describes the adder architecture, Section IV describes the MAC unit, and Section V presents the results, simulation, and conclusion.

II. MULTIPLIER ARCHITECTURE

Multipliers play an important role today; they are widely used in digital signal processing and in various other applications of advanced technology. Researchers have tried to design and implement multipliers that meet at least one of the following design targets, or a combination of them: high speed, low power consumption, regularity of layout, and small area. The basic multiplier uses the shift-and-add algorithm. In parallel multiplication, the partial products are added to obtain the result, and the number of partial products to be added is the main parameter that determines the performance of the multiplier. The multiplication algorithm for an N-bit multiplicand and an N-bit multiplier is shown below:

$$
\begin{array}{rcll}
Y &=& Y_{n-1}\,Y_{n-2} \cdots Y_2\,Y_1\,Y_0 & \text{(multiplicand)} \\
X &=& X_{n-1}\,X_{n-2} \cdots X_2\,X_1\,X_0 & \text{(multiplier)}
\end{array}
$$

Generally:

$$
\begin{array}{rl}
 & Y_{n-1}\;\;Y_{n-2}\;\cdots\;Y_2\;\;Y_1\;\;Y_0 \\
\times & X_{n-1}\;\;X_{n-2}\;\cdots\;X_2\;\;X_1\;\;X_0 \\
\hline
 & Y_{n-1}X_0\;\;Y_{n-2}X_0\;\;Y_{n-3}X_0\;\cdots\;Y_1X_0\;\;Y_0X_0 \\
 & Y_{n-1}X_1\;\;Y_{n-2}X_1\;\;Y_{n-3}X_1\;\cdots\;Y_1X_1\;\;Y_0X_1 \quad \text{(shifted left by 1)} \\
 & Y_{n-1}X_2\;\;Y_{n-2}X_2\;\;Y_{n-3}X_2\;\cdots\;Y_1X_2\;\;Y_0X_2 \quad \text{(shifted left by 2)} \\
 & \vdots \\
 & Y_{n-1}X_{n-2}\;\;Y_{n-2}X_{n-2}\;\;Y_{n-3}X_{n-2}\;\cdots\;Y_1X_{n-2}\;\;Y_0X_{n-2} \quad \text{(shifted left by } n-2\text{)} \\
 & Y_{n-1}X_{n-1}\;\;Y_{n-2}X_{n-1}\;\;Y_{n-3}X_{n-1}\;\cdots\;Y_1X_{n-1}\;\;Y_0X_{n-1} \quad \text{(shifted left by } n-1\text{)} \\
\hline
 & P_{2n-1}\;\;P_{2n-2}\;\;P_{2n-3}\;\cdots\;P_2\;\;P_1\;\;P_0
\end{array}
$$

Figure 1: Vedic Mathematics Multiplication Algorithm

A. Vedic Multiplier

Vedic mathematics was originally written in Sanskrit in the Vedas.
This ancient system of Vedic mathematics was rediscovered by Sri Bharati Krishna Tirthaji (1884-1960), who translated the Sanskrit texts of the Vedas and identified 16 Sutras [4, 5, 6]. This multiplier uses the Urdhva Tiryakbhyam Sutra, which allows almost all numeric computations to be carried out easily and quickly. Urdhva Tiryakbhyam means "vertically and crosswise". It can be used for all types of multiplication.
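
The vertical-and-crosswise idea can be illustrated in software. The following is a minimal behavioural sketch in Python, not the schematic used in this paper. The function names `to_bits` and `urdhva_multiply` and the `width` parameter are hypothetical, introduced only for illustration. Each output column k sums the bit products whose indices add up to k, and the column carry is passed to the next column.

```python
def to_bits(value, width):
    """Return the `width` least significant bits of `value`, LSB first."""
    return [(value >> i) & 1 for i in range(width)]


def urdhva_multiply(a, b, width):
    """Multiply two unsigned `width`-bit integers column by column.

    Illustrative model only (names are assumptions, not from the paper).
    Column k collects every crosswise product a_i AND b_j with i + j == k;
    column 0 is the purely vertical product a_0 AND b_0.
    The carry out of each column is added into the next column.
    """
    a_bits = to_bits(a, width)
    b_bits = to_bits(b, width)
    result = 0
    carry = 0
    for k in range(2 * width - 1):
        column = carry
        # i runs over the indices for which j = k - i is a valid bit index
        for i in range(max(0, k - width + 1), min(k, width - 1) + 1):
            column += a_bits[i] & b_bits[k - i]
        result |= (column & 1) << k   # low bit of the column is product bit k
        carry = column >> 1           # remaining bits move to the next column
    result |= carry << (2 * width - 1)  # final carry is the top product bit
    return result
```

For example, `urdhva_multiply(13, 11, 4)` forms seven columns from the 4-bit operands and combines them into the 8-bit product.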

Figure 2: Vedic Multiplier 2x2 using Urdhva Tiryakbhyam Sutra

The design of the 2x2 block is shown in Figure 2. It is the basic block of the Vedic multiplier, and it is reused when designing the 4x4 Vedic multiplier (VM) block. Because the binary multiplication of two single bits is an AND operation, each pair of bits is multiplied using a 2-input AND gate [4]. The AND operation is first performed vertically on the least significant bits, which gives the LSB of the result. The AND operation is then performed on the crosswise bits, and the two results are added using a half adder. The carry output of the half adder is added to the AND of the most significant bits to obtain the upper bits of the final output.

B. Wallace Tree Multiplier

The Wallace tree multiplier is an efficient multiplier for two integers. It is a digital circuit that works in three steps. In the first step, each bit of one operand is multiplied (ANDed) with every bit of the other operand to obtain the partial products. In the second step, the partial products are reduced with half adders and full adders: three bits of the same weight are given as input to a full adder, which replaces them with a sum bit and a carry bit, and if only two bits remain they are given to a half adder. If only one bit is left, it is passed unchanged to the next layer. In the third step, once the partial products have been reduced to two layers, the two layers are grouped column by column and added with an adder to get the final output. In circuit terms, the inputs are fed to AND gates, which produce $n^2$ partial product bits and form a tree-like structure. These bits are then fed to the half adders and full adders to reduce the partial products. This process continues until the partial products are completely reduced, and the output is then taken from the half adders and full adders.
The benefit of the Wallace tree multiplier is that it reduces the number of layers in a multiplication and therefore the propagation delay, which increases multiplication speed and performance.

Figure 3: The 4x4 bit Wallace Tree Multiplier
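
The reduction steps of the Wallace tree described above can be sketched as a behavioural Python model. This is only an illustration under stated assumptions, not the schematic used in this paper. The names `reduce_layer` and `wallace_multiply` are hypothetical, and the final two-row addition is done with ordinary integer addition in place of a hardware adder.

```python
def reduce_layer(columns):
    """Apply one layer of full adders and half adders.

    columns[k] is the list of bits of weight 2**k. Three bits of one
    column go to a full adder, a remaining pair goes to a half adder,
    and a single leftover bit is passed on unchanged.
    """
    nxt = [[] for _ in range(len(columns) + 1)]  # room for carries
    for k, col in enumerate(columns):
        bits = list(col)
        while len(bits) >= 3:
            x, y, z = bits.pop(), bits.pop(), bits.pop()
            nxt[k].append(x ^ y ^ z)                         # full adder sum
            nxt[k + 1].append((x & y) | (y & z) | (x & z))   # full adder carry
        if len(bits) == 2:
            x, y = bits
            nxt[k].append(x ^ y)       # half adder sum
            nxt[k + 1].append(x & y)   # half adder carry
        else:
            nxt[k].extend(bits)        # zero or one bit: pass through
    return nxt


def wallace_multiply(a, b, width):
    """Multiply two unsigned `width`-bit integers (illustrative model)."""
    # Step 1: AND every bit of a with every bit of b (width**2 bits).
    columns = [[] for _ in range(2 * width)]
    for i in range(width):
        for j in range(width):
            columns[i + j].append((a >> i) & (b >> j) & 1)
    # Step 2: reduce until every column holds at most two bits.
    while any(len(col) > 2 for col in columns):
        columns = reduce_layer(columns)
    # Step 3: add the two remaining rows (assumption: plain integer add).
    row0 = sum(col[0] << k for k, col in enumerate(columns) if len(col) >= 1)
    row1 = sum(col[1] << k for k, col in enumerate(columns) if len(col) == 2)
    return row0 + row1
```

Each full adder removes one bit from the tree, so the loop terminates, and every adder preserves the weighted sum of its inputs, so the two final rows add up to the product.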

The Wallace tree multiplier can be designed up to 32 bits; beyond that, the complexity of the circuit increases. Increasing the bit size increases the chip area, delay, and power consumption of the multiplier.

C. Baugh Wooley Multiplier

The main problem faced when designing multipliers is size. A multiplier usually has a complex, irregular structure that consumes a large chip area and makes it difficult for the designer to wire the internal connections. Because the internal wiring is complex, the circuit also takes more time to perform any mathematical or logical operation. To overcome this problem, the Baugh Wooley multiplier has a linear (array) form and is built from AND gates, half adders, and full adders, which decreases the complexity of the circuit and improves the output results. The algorithm is used for both signed and unsigned binary multiplication. According to this algorithm, the bits of inputs A and B are first ANDed together to create the AND terms. These terms are then sent to an array of half adders and full adders, in which the carry-out of each adder is connected to the adder of the next most significant bit at each level of addition. The Baugh Wooley multiplier can also be used for the multiplication of negative operands.

Figure 4: Baugh Wooley Multiplier 4x4 Bit.

The partial products are adjusted in such a way that the negative sign moves to the last step, which in turn maximizes the regularity of the multiplication array.

III. ADDER ARCHITECTURE

Adders are the basic building blocks of complex arithmetic circuits. They are widely used in central processing units (CPU), arithmetic logic units (ALU), and floating point units, and for address generation for cache or memory access in digital signal processing. Designing adders that combine fast addition and low power with [10] low area consumption is still a challenging issue.

A. Kogge Stone Adder

The parallel prefix adder is faster than the ripple carry adder.
For binary addition, the parallel prefix adder is considered the most efficient [10]. Its regular structure and fast performance make it particularly attractive for VLSI implementation. The Kogge Stone adder (KSA) is a parallel prefix adder. It is considered the fastest and is widely used in industry for high-performance arithmetic units.

Figure 5: Kogge Stone Adder for 4-Bit.

The Kogge Stone adder employs the three-stage structure of the carry look-ahead adder. In the KSA, the carries are obtained quickly by computing them in parallel. It is often desirable to use an adder with a good trade-off between timing and area efficiency. This carry computation method speeds up the overall operation significantly and reduces the area.

a. Pre-Processing
This step computes the generate and propagate signals corresponding to each pair of bits in A and B. These signals are given by the logic equations below.

$$P_i = A_i \oplus B_i$$
$$G_i = A_i \cdot B_i$$

b. Carry Look-Ahead Network
This step computes the carry corresponding to each bit. It uses the group propagate and generate signals as intermediate signals, given by the logic equations below.

$$P_{i:j} = P_{i:k+1} \cdot P_{k:j}$$
$$G_{i:j} = G_{i:k+1} + (P_{i:k+1} \cdot G_{k:j})$$

c. Post-Processing
This step computes the sum bits. The logic equation is given below.

$$S_i = P_i \oplus C_{i-1}$$

B. Ripple Carry Adder

The ripple carry adder (RCA) is composed of a chain of n full adders, where n is the length of the input operands. It is useful for additions in which each operand has multiple bits. Each full adder takes three inputs (A, B, and the carry-in C) and produces two outputs, a Sum and a Carry_out.

$$\text{Sum} = A \oplus B \oplus C$$
$$\text{Carry\_out} = AB + BC + CA$$

Figure 6: 4-Bit Ripple Carry Adder.
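
The three Kogge Stone steps given in Section III-A (pre-processing, carry look-ahead network, post-processing) can be sketched as a behavioural Python model. This is a minimal illustration, assuming unsigned operands and a carry-in of zero. The function name `kogge_stone_add` and the `width` parameter are hypothetical and do not come from the paper.

```python
def kogge_stone_add(a, b, width):
    """Add two unsigned `width`-bit integers with a Kogge Stone prefix network.

    Returns (sum, carry_out). Illustrative model: carry-in is assumed 0.
    """
    # a. Pre-processing: bitwise propagate and generate signals.
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]

    # b. Carry look-ahead network: after the stage with distance d,
    #    group_g[i] and group_p[i] cover bits i down to max(0, i - 2*d + 1).
    group_g, group_p = list(g), list(p)
    d = 1
    while d < width:
        new_g, new_p = list(group_g), list(group_p)
        for i in range(d, width):
            new_g[i] = group_g[i] | (group_p[i] & group_g[i - d])
            new_p[i] = group_p[i] & group_p[i - d]
        group_g, group_p = new_g, new_p
        d *= 2
    # group_g[i] is now the carry out of bit i, i.e. C_i.

    # c. Post-processing: S_i = P_i xor C_(i-1), with C_(-1) = 0.
    total = 0
    for i in range(width):
        carry_in = group_g[i - 1] if i > 0 else 0
        total |= (p[i] ^ carry_in) << i
    return total, group_g[width - 1]
```

The loop over `d` runs once per doubling of the span, so the number of prefix stages grows with the logarithm of the word length, whereas the carry of a ripple carry adder passes through every bit position in turn.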

In this paper, a 128-bit ripple carry adder is used in the design of the 128 x 128-bit Vedic multiplier and the MAC unit.

C. Carry Skip Adder

The carry skip adder is an alternative to carry look-ahead adder schemes. It was first proposed by Kilburn et al. to accelerate carry propagation [7, 12]. It improves performance by adding group carry bypass paths to the ripple path, so that a carry can bypass the ripple path of a group when the group propagate signal is high. Figure 7 shows the basic structure of an N-bit carry skip adder with K bits per group.

Figure 7: Basic Structure of Carry Skip Adder

This scheme is faster than the ripple carry adder, at the cost of a small overhead for the bypass circuitry that shortens the carry propagation delay. However, the delay still depends linearly on the number of bits N.

IV. MAC UNIT

The MAC unit consists of a multiplier, an adder, and an accumulator. It is the basic and most frequently used component in DSP systems, where it performs convolution and accelerates FFT or FIR computations [5, 6]. A high-speed MAC unit requires high-speed multiplier and adder circuits. Figure 8 shows the architecture of the MAC unit. The inputs of the MAC are fetched from memory blocks and passed to the multiplier, which performs the multiplication and gives the result to an adder. The adder accumulates the result and stores it in a memory location. This whole operation has to be completed in one clock cycle.

Figure 8: Block Diagram for Multiplier Accumulator Unit.

The proposed design of the MAC unit consists of one 128 x 128-bit Vedic multiplier, which performs the multiplication. Its 256-bit output is given as input to the 256-bit Kogge Stone adder [9]. The adder performs the addition, and its result is fed into the accumulator [10], which consists of an adder and a memory unit, with the output of the memory unit fed back to the adder circuit. Finally, the output is taken from the memory unit. This entire process should complete in a single clock cycle. Because of the feedback, each new result is added to the current output to produce the next output.

To compare power and area consumption, 32-bit MAC units are also designed by replacing the multiplier and adder circuits with the Baugh Wooley multiplier [7], the Wallace tree multiplier [8], the ripple carry adder, and the carry skip adder [11], so that the most efficient and highest-performance MAC unit can be identified.

V. RESULT & SIMULATION

The proposed 128-bit and 32-bit MAC units using the Vedic multiplier and Kogge Stone adder have been successfully tested and synthesized in Cadence Virtuoso tools using 45 nm technology with a supply voltage of 1.0 V. The power consumption of the proposed 128-bit and 32-bit MAC units is calculated for all input conditions, and the worst-case power consumption is noted.

Figure 9: Vedic Multiplier 32x32 Bit
Figure 10: Baugh Wooley Multiplier 32x32 Bit
Figure 11: Wallace Tree Multiplier 32x32 Bit
Figure 12: 32 Bit MAC Unit Using Vedic Multiplier
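
The multiply, add, and feedback behaviour described in Section IV can be summarized with a small behavioural model in Python. This is a sketch under stated assumptions, not the circuit simulated here. The class name `MacUnit`, its `step` method, and the wrap-around of the accumulator at twice the operand width are all assumptions made for illustration, since overflow handling is not specified in this paper. One call to `step` stands for one clock cycle.

```python
class MacUnit:
    """Behavioural model of a multiplier accumulator (illustrative only)."""

    def __init__(self, operand_bits=128):
        self.operand_bits = operand_bits
        self.acc_bits = 2 * operand_bits   # 256 bits for 128-bit operands
        self.accumulator = 0               # contents of the memory unit

    def step(self, x, y):
        """One clock cycle: multiply, add to the fed-back value, store."""
        operand_mask = (1 << self.operand_bits) - 1
        product = (x & operand_mask) * (y & operand_mask)
        acc_mask = (1 << self.acc_bits) - 1
        # Assumption: the accumulator wraps around on overflow.
        self.accumulator = (self.accumulator + product) & acc_mask
        return self.accumulator


# Usage: accumulate a three-term sum of products, as in an FIR filter tap loop.
mac = MacUnit(operand_bits=8)
for x, y in [(1, 2), (3, 4), (5, 6)]:
    latest = mac.step(x, y)
```

After the three steps, the accumulator holds 1*2 + 3*4 + 5*6.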

Figure 13: 32 Bit MAC Unit Using Wallace Tree Multiplier
Figure 14: 32 Bit MAC Unit Using Baugh Wooley Multiplier
Figure 15: 128 Bit MAC Unit Using Vedic Multiplier
Figure 16: Simulation Result for 32x32 Bit Vedic Multiplier
Figure 17: Simulation Result for 32x32 Bit Wallace Tree Multiplier

Figure 18: Simulation Result for 32 Bit MAC Unit Using Vedic Multiplier
Figure 19: Simulation Result for 32 Bit MAC Unit Using Wallace Tree Multiplier
Figure 20: Simulation Result for 32 Bit MAC Unit Using Baugh Wooley Multiplier
Figure 21: Simulation Result for 128 Bit MAC Unit Using Vedic Multiplier

The power consumption is compared with that of two other 32-bit MAC units that use different multipliers and adders [1, 3]. The comparison is shown in Table 1.

Table 1: Comparison between Three MAC Units

Parameter            MAC Unit using        MAC Unit using             MAC Unit using
                     Vedic Multiplier      Baugh Wooley Multiplier    Wallace Tree Multiplier
Area                 Less                  Medium                     Very Large
Power Dissipation    Less                  More                       More than BWM
Complexity           Easy                  Complex                    Very Complex
Speed                Fast                  Very Slow                  Faster than BWM

Table 1 shows that the proposed MAC unit using the Vedic multiplier is more efficient in power consumption, size, and complexity than the MAC units using the other multipliers and adders.

CONCLUSION

The proposed design of the MAC unit using the Vedic multiplier and Kogge Stone adder gives high efficiency and performance. In the proposed design, the dynamic power, area, complexity, and power dissipation are low. This type of MAC unit implementation can be extended to a higher number of bits because it consumes less area and power. It can be used for real-time digital signal processing such as audio signal processing, video/image processing, large-capacity data processing systems, real-time surveillance, and radar communication.

ACKNOWLEDGMENT

It is my great pleasure to convey my gratitude to my principal, Dr. Nayanathara K S, and to Prof. P. Viswanath, Head of the ECE Department, CVRCE, for arranging the necessary facilities for executing the project in college. My sincere gratitude goes to my guide, Mrs. T. Subha Sri Lakshmi, Assistant Professor, ECE Dept., CVRCE, whose guidance and valuable suggestions have been indispensable to the successful completion of my project. Special thanks go to the project coordinator, Mrs. K. A. Jyotsna, Associate Professor, ECE Dept., CVRCE, for assessing seminars and for her inspiration, moral support, and valuable suggestions on my project.

REFERENCES

[1] R. K. Bathija, R. S. Meera, and S. Sarkar, "Low-Power High-Speed 16x16 Bit Multiplier using Vedic Mathematics," International Journal of Computer Applications, vol. 59, no. 6, pp. 41-44, December 2012.
[2] Gitika Bhatia, Karanbir Singh Bhatia, Shashank Srivastava, and Pradeep Kumar, "Design and Implementation of MAC Unit Based on Vedic Square and Its Application," IEEE UP Section Conference on Electrical Computer and Electronics, 2015.
[3] V. K. Karthik and Y. Govardhan, "Design and Multiply and Accumulate Unit using Vedic Multiplier Technique," IJSCR, vol. IV, pp. 756-760, June 2013.
[4] Jagadguru Swami Sri Bharati Krsna Tirthaji, Vedic Mathematics or Sixteen Simple Sutras from the Vedas, Motilal Banarsidass, Varanasi (India), 1986.
[5] A. P. Nicholas, K. R. Williams, and J. Pickles, Application of Urdhava Sutra, Spiritual Study Group, Roorkee (India), 1984.
[6] Jagadguru Swami Sri Bharati Krsna Tirthji Maharaja, Vedic Mathematics, Motilal Banarsidas, Varanasi, India, 1986.
[7] Dakupati Ravi Sankar and Shaik Ashraf Ali, "Design of Wallace Tree Multiplier by Sklansky Adder," International Journal of Engineering Research and Applications.
[8] Pramodini Mohanty and Rashmi Ranjan, "An Efficient Baugh Wooley Architecture for Both Signed and Unsigned Multiplication," International Journal of Computer Science & Engineering Technology, vol. 3, no. 4, April 2012.
[9] Indrayani Patle and Akansha Bhargav, "Implementation of Baugh Wooley Multiplier Based on Soft Core Processor," IOSRJEN, vol. 3, issue 10, pp. 01-07, October 2013.
[10] B. Tapasvi and K. Bala Sinduri, "Implementation of 64-Bit Kogge Stone Carry Select Adder with ZFC for Efficient Area," IEEE, 2015.
[11] Ugur Cini and Olcay Kurt, "A MAC Unit with Double Carry-Save Scheme Suitable for 6-Input LUT Based Reconfigurable Systems," International Conference on Design & Technology of Integrated Systems, 2015.
[12] Aniruddha Kanhe, Shishir Kumar Das, and Ankit Kumar Singh, "Design and Implementation of Low-Power Multiplier Using Vedic Multiplication Technique," International Journal of Computer Science and Communication (IJCSC), vol. 3, no. 1, pp. 131-132, January-June 2012.
[13] Prabha S. Kasliwal, B. P. Patil, and D. K. Gautam, "Performance Evaluation of Squaring Operation by Vedic Mathematics," IETE Journal of Research, 57(1), Jan-Feb 2011.