An Efficient Design of Low Power Speculative Han-Carlson Adder Using Concurrent Subtraction

S. Sangeetha
II ME - VLSI Design, Akshaya College of Engineering and Technology, Coimbatore, India

S. Kamatchi
Assistant Professor, Department of ECE, Akshaya College of Engineering and Technology, Coimbatore, India

Abstract
Binary addition is one of the most essential arithmetic functions in VLSI systems. Adders are widely used in integrated circuits, and they are needed not only for addition but also for subtraction. This paper presents a speculative Han-Carlson adder using concurrent subtraction. The proposed adder uses speculation: the exact arithmetic function is replaced with an approximate one that gives the correct result most of the time, but not always. The approximate adder is augmented with an error detection network that asserts an error signal when the approximation fails. The design is based on a parallel-prefix topology and uses a subtraction process. The aim of the proposed technique is to reduce power. Simulation is performed using ModelSim SE 6.3f and the Xilinx ISE 8.1i tool.

Keywords
Addition, arithmetic, speculative adder, power.

I. INTRODUCTION
Adders are basic functional units in computer arithmetic. Binary adders are extensively used in microprocessors for addition and subtraction as well as for floating-point multiplication and division. Adders are therefore fundamental components, and improving their performance is one of the main challenges in digital design. High-speed adders are based on well-established parallel-prefix architectures [1], [2], including Brent-Kung [3], Kogge-Stone [4], Sklansky [5], Han-Carlson [6], Ladner-Fischer [7] and Knowles [8]. These standard architectures have comparatively high power consumption. Better performance can be achieved by using approximation circuits that have recently been proposed in the literature [9].
The proposed adder uses speculation: the exact arithmetic function is replaced with an approximate one that gives the correct result most of the time, but not always. The approximate adder is augmented with an error detection network that asserts an error signal when the approximation fails. In this case (misprediction), another clock cycle is needed to obtain the correct result with the help of a correction stage.

The paper is organized as follows. Section II reviews the literature. Section III describes the existing speculative HCA in detail. Section IV describes the proposed speculative HCA using concurrent subtraction. Section V shows the simulation results. The conclusion is given in Section VI and future work in Section VII.

II. LITERATURE REVIEW

A. Kogge-Stone Adder
The Kogge-Stone Adder (KSA) is a parallel-prefix form of the carry look-ahead adder. It was introduced by P. M. Kogge and H. S. Stone [4] in 1973. Fig. 1 shows the 16-bit KSA. It is a fast adder design, as it generates the carry signals in $O(\log_2 n)$ time, and it has the best performance in VLSI implementations. The work is mainly focused on computation time. The KSA takes more area to implement than the Brent-Kung adder but has a lower fan-out; wiring congestion is often a problem.

Fig. 1. 16-bit Kogge-Stone Adder

B. Brent-Kung Adder
The Brent-Kung Adder (BKA) was introduced by R. P. Brent and H. T. Kung [3] in 1982. The BKA has the greatest logic depth and the minimum area, and it avoids an explosion of wires. The work is mainly focused on area optimization. Fig. 2 shows the 16-bit BKA. Buffers are used to reduce the fan-out but, in practice, they are usually omitted.

Fig. 2. 16-bit Brent-Kung Adder

ISSN: 2394-2584 http://www.internationaljournalssrg.org Page 1

C. Ladner-Fischer Adder
The Ladner-Fischer Adder (LFA) was introduced by R. E. Ladner and M. J. Fischer [7] in 1980. The Ladner-Fischer prefix tree is an arrangement that sits between the Brent-Kung and Sklansky prefix trees. The LFA has the lowest logic depth but a large fan-out. Fig. 3 shows the 16-bit LFA.

Fig. 3. 16-bit Ladner-Fischer Adder

D. Non-Speculative Han-Carlson Adder
The non-speculative Han-Carlson Adder (HCA) was introduced by T. Han and D. A. Carlson [6] in 1987. In this work a 16-bit non-speculative HCA is presented. It consists of three stages: pre-processing, prefix-processing and post-processing. Fig. 4 shows the 16-bit non-speculative HCA.

1) Pre-Processing
In the pre-processing stage A and B are the inputs. The propagate and generate signals are given by (1) and (2), where $i = 0, 1, \ldots, n-1$ and $n$ is the number of bits.

$$ P_i = A_i \oplus B_i \tag{1} $$
$$ G_i = A_i \cdot B_i \tag{2} $$

2) Prefix-Processing
In the prefix-processing stage the block propagate and generate signals are computed by (3) and (4):

$$ G_{[i:k]} = G_{[i:j]} + \left( G_{[l:k]} \cdot P_{[i:j]} \right) \tag{3} $$
$$ P_{[i:k]} = P_{[i:j]} \cdot P_{[l:k]} \tag{4} $$

where $i \ge l \ge j \ge k$. The carry $C_i$ is then obtained as:

$$ C_i = G_{[i:0]} + P_{[i:0]} \cdot C_{-1} \tag{5} $$

where $C_{-1}$ is the input carry of the n-bit adder. In the following, for simplicity, we assume that $C_{-1} = 0$, so that (5) simplifies to:

$$ C_i = G_{[i:0]} \tag{6} $$

3) Post-Processing
This is the final stage, in which the sum bits $S_i$ are computed by (7).

$$ S_i = P_i \oplus C_{i-1} \tag{7} $$

Fig. 4. 16-bit Non-Speculative Han-Carlson Adder

The Han-Carlson trees are a family of networks between the KSA and the BKA. The HCA can be viewed as a sparse version of the KSA. The scheme differs from the Kogge-Stone method in that it performs carry-merge operations on even bits and generate/propagate operations on odd bits. In the last stage, the odd bits recombine with the carry signals of the even bits to produce the true carry bits.
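As a rough illustration of (1) to (7), the following Python sketch models a Han-Carlson style prefix network at bit level with $C_{-1} = 0$. The names `combine` and `han_carlson_add`, the list-based signal representation and the exact arrangement of the levels are assumptions made for illustration (one common way of wiring the Han-Carlson levels for n a power of two), not a transcription of Fig. 4.

```python
def combine(hi, lo):
    """Prefix operator of (3), (4): merge block `hi` with the adjacent lower block `lo`."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def han_carlson_add(a, b, n=16):
    """Return (sum, carry_out) of two n-bit operands, assuming C_{-1} = 0."""
    # Pre-processing, equations (1) and (2).
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    gp = list(zip(g, p))

    # First level: every odd bit merges with its even neighbour.
    for i in range(1, n, 2):
        gp[i] = combine(gp[i], gp[i - 1])

    # Kogge-Stone style levels, applied to the odd bits only.
    d = 2
    while d < n:
        prev = list(gp)  # all merges of one level read the previous level
        for i in range(1, n, 2):
            if i - d >= 0:
                gp[i] = combine(prev[i], prev[i - d])
        d *= 2

    # Last level: even bits pick up the carry of the odd bit below them.
    for i in range(2, n, 2):
        gp[i] = combine(gp[i], gp[i - 1])

    carries = [G for G, _ in gp]  # equation (6): C_i = G[i:0]

    # Post-processing, equation (7).
    total = 0
    for i in range(n):
        c_in = carries[i - 1] if i > 0 else 0
        total |= (p[i] ^ c_in) << i
    return total, carries[n - 1]
```

For n = 16 the sketch uses five prefix levels (one initial level, three Kogge-Stone style levels and one final level). As a usage example, `han_carlson_add(0xFFFF, 1)` returns `(0, 1)`, i.e. a zero sum with a carry out.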
The HCA has five stages, of which the central three resemble the Kogge-Stone arrangement. The advantage of this adder is that it uses far fewer cells and shorter wires than the KSA, so complexity is reduced at the cost of an additional stage in the carry-merge path. The main limitation of this adder is its increased power consumption.

E. Approximation Circuits
Approximation circuits were introduced by S.-L. Lu in 2004. The main concept [9] is that, by using certain approximation circuits, the microprocessor's clock frequency can be increased. The approximation is based on replacing the entire logic function by a simple circuit that is able to mimic the
function of the main circuit and also provides the predicted results. The work mostly focuses on reducing delay: approximate adders with shorter carry chains were used in microprocessors, so that the speed can be increased. The speed-up was assessed by analyzing operations in microprocessors and microarchitectures in which the adder circuits provide the approximate computation. The delay was found to be reduced to 50%, and 65% accuracy was attained for 32-bit addition using 4-bit carry and inputs. The handshaking overhead at the end of execution of each stage is a major drawback.

F. Speculative Carry Select Addition
The Speculative Carry Select Adder [15] was developed by K. Du, P. Varman, and K. Mohanram in 2012. The basic idea is to slice the chain of propagate signals in an addition into blocks of the same size. Specifically, the input bits of the addends are segmented into blocks, and the carry bits between blocks are selectively truncated to 0. All outputs of a block, instead of each individual output, are speculated together, which alleviates the area overhead problem. The speculative adder produces correct results in a single cycle in most cases, and error recovery provides correct results in an extra cycle. The extra cycle during error recovery and the increased latency are some of its limitations.

III. EXISTING SPECULATIVE HCA
The speculative Han-Carlson Adder (HCA) was introduced by Darjn Esposito, Davide De Caro, Ettore Napoli, Nicola Petra and Antonio Giuseppe Maria Strollo in 2015. In this section the 16-bit speculative HCA [19] is presented. This adder consists of five stages: pre-processing, prefix-processing, error detection, error correction and post-processing. Fig. 5 shows the 16-bit speculative HCA.

A. Pre-Processing
In the pre-processing stage the propagate $P_i$ and generate $G_i$ signals are computed as in (1), (2).

B. Speculative Prefix-Processing
The speculative prefix-processing stage is one of the main differences compared with the standard prefix adders recalled in the previous section [1]-[8]. Instead of calculating all the $P_{[i:0]}$ and $G_{[i:0]}$ required in (5) to obtain the exact carry values, only a subset of block propagate and generate signals is calculated; in the post-processing stage, approximate carry values are obtained from this subset. The output of the speculative prefix-processing stage is also used in the error detection and error correction stages discussed below.

The fundamental assumption behind the speculative prefix-processing stage is that carry signals propagate for no more than K bits, with $K < n$ and $K = O(\log_2 n)$.

Fig. 5. Han-Carlson speculative prefix-processing stage

The last Kogge-Stone row of the n = 16 bit graph is pruned, resulting in the speculative prefix-processing stage with K = 8 = n/2 shown in Fig. 5. In general, $K = n/2^{P}$, where P is the number of pruned levels. The length of the propagate chains is K = 8 only for i = 9, 11, 13, 15, while for i = 10, 12, 14 the propagate chain length is K + 1 = 9. The computed propagate and generate signals for the speculative Han-Carlson architecture are:

$$
\begin{cases}
(G, P)_{[i:0]} & \text{for } i \le K \\
(G, P)_{[i:i-K+1]} & \text{for } i > K,\ i \text{ odd} \\
(G, P)_{[i:i-K]} & \text{for } i > K,\ i \text{ even}
\end{cases} \tag{8}
$$

C. Post-Processing
In the post-processing stage we first compute the approximate carries $C_i^*$ and then use them to obtain the approximate sum bits $S_i^*$ as follows:

$$ S_i^* = P_i \oplus C_{i-1}^* \tag{9} $$

Similarly to (6), the approximate carries are obtained as the generate signals available in the last level of the prefix-processing stage. We have:

$$
C_i^* =
\begin{cases}
G_{[i:0]} & \text{for } i \le K \\
G_{[i:i-K+1]} & \text{for } i > K,\ i \text{ odd} \\
G_{[i:i-K]} & \text{for } i > K,\ i \text{ even}
\end{cases} \tag{10}
$$

D. Error Detection
The conditions in which at least one of the approximate carries is wrong (misprediction) are signaled by the error detection stage.
In case of misprediction, an error signal is asserted by the error detection stage and the output of the post-processing stage is discarded. The error correction stage gives the correct sum in the next clock period.
The error condition for carry $C_i$ can be obtained from (6), (10) as:

$$
e_i =
\begin{cases}
0 & \text{for } i \le K \\
P_{[i:i-K+1]} \cdot G_{[i-K:0]} & \text{for } i > K,\ i \text{ odd} \\
P_{[i:i-K]} \cdot G_{[i-K-1:0]} & \text{for } i > K,\ i \text{ even}
\end{cases} \tag{11}
$$

Thus the error signal can be written as:

$$ E_{HC} = \sum_{\substack{i = K+1 \\ i \text{ odd}}}^{n-1} P_{[i:i-K+1]} \cdot G_{[i-K:i-K-1]} \tag{12} $$

Let us consider, as an example, the prefix-processing stage in Fig. 5. The error signal (12) is given by:

$$ E_{HC} = P_{[9:2]} \cdot G_{[1:0]} + P_{[11:4]} \cdot G_{[3:2]} + \cdots + P_{[15:8]} \cdot G_{[7:6]} \tag{13} $$

E. Error Correction
The error correction stage calculates the exact carry signals (6), to be used in case of misprediction. It is composed of the levels of the prefix-processing stage that were pruned to obtain the speculative adder. Fig. 6 shows the error correction stage of the speculative HCA.

Fig. 6. Error correction and detection stages for the 16-bit speculative Han-Carlson Adder of Fig. 5.

F. Post-Processing
The approximate carries are already available at the output of the prefix-processing stage. The post-processing, according to (7), is the same as in the non-speculative HCA and consists of n XOR gates.

IV. PROPOSED SPECULATIVE HCA USING CONCURRENT SUBTRACTION
In this work we propose a 16-bit speculative Han-Carlson Adder (HCA) using concurrent subtraction. The operation of the speculative HCA [19] has been described in the previous section. Concurrent subtraction means that the parallel-prefix operation is combined with a subtraction carried out in parallel, in order to reduce the power of the speculative HCA when performing binary addition. Concurrent subtraction is therefore essential in approximation circuits; the sum and carry are computed in parallel to minimize power.

A. Pre-Processing Subtraction
In the pre-processing subtraction stage, the maximum-size operand, that is, a single 16-bit word of ones, is subtracted from the inputs A and B. The subtracted results are then taken as the inputs of the pre-processing stage, which computes the propagate $P_i$ and generate $G_i$ signals as in (1), (2).

B. Speculative Prefix-Processing
In the speculative prefix-processing stage the block propagate and generate signals are computed as in (8).

C. Post-Processing
In the post-processing stage the sum is computed as in (9), and the 16th bit is the carry.

1) Post-Processing Subtraction
In the post-processing subtraction stage, the maximum-size operands, that is, two 16-bit words of ones, are added. The result is then subtracted from the post-processing sum and carry.

D. Error Detection
In the error detection stage the error signal is computed as in (13).

E. Error Correction
The error correction stage calculates the correct carry signals (6), to be used in case of misprediction. It is composed of the levels of the prefix-processing stage that compute the propagate and generate signals as in (3), (4).

F. Post-Processing
The approximate carries are already available at the output of the prefix-processing stage. The post-processing, according to (7), is the same as in the non-speculative HCA and consists of n XOR gates.
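The speculative stages that both the existing and the proposed adder rely on, namely (8) to (13) together with the exact carries of (6), can be sketched behaviourally in Python as below. This is a minimal sketch under stated assumptions: the names `block_gp` and `speculative_hca_add` are hypothetical, block signals are obtained by folding the operator of (3), (4) bit by bit rather than by the pruned prefix tree of Fig. 5, K is assumed even, and the pre- and post-processing subtraction steps of the proposed design are not modelled.

```python
def block_gp(g, p, hi, lo):
    """Block (G, P) over bits hi..lo, obtained by folding the operator of (3), (4)."""
    G, P = g[hi], p[hi]
    for i in range(hi - 1, lo - 1, -1):
        G, P = G | (P & g[i]), P & p[i]
    return G, P


def speculative_hca_add(a, b, n=16, k=8):
    """Return (sum, carry_out, error) for an n-bit addition; k is assumed even."""
    # Pre-processing, equations (1) and (2).
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]

    def low_end(i):
        # Lowest bit of the block that ends at bit i, as in (8).
        if i <= k:
            return 0
        return i - k + 1 if i % 2 else i - k

    # Approximate carries, equation (10).
    approx = [block_gp(g, p, i, low_end(i))[0] for i in range(n)]

    # Error signal, equation (12): OR of the terms for odd i > k.
    error = 0
    for i in range(k + 1, n, 2):
        chain = block_gp(g, p, i, i - k + 1)[1]    # P[i:i-k+1]
        gen = block_gp(g, p, i - k, i - k - 1)[0]  # G[i-k:i-k-1]
        error |= chain & gen

    # Error correction: exact carries of (6), used only on misprediction.
    exact = [block_gp(g, p, i, 0)[0] for i in range(n)]
    carries = exact if error else approx

    # Post-processing, equations (7) and (9).
    total = 0
    for i in range(n):
        c_in = carries[i - 1] if i > 0 else 0
        total |= (p[i] ^ c_in) << i
    return total, carries[n - 1], error
```

For instance, `speculative_hca_add(1, 1)` returns `(2, 0, 0)` because no carry chain is long enough to raise the error signal, while `speculative_hca_add(0xFFFF, 1)` returns `(0, 1, 1)`: the carry ripples across all 16 bits, the term $P_{[9:2]} \cdot G_{[1:0]}$ of (13) is asserted, and the exact carries are used instead. The sketch models the extra clock cycle only as the selection of the exact carries.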
V. SIMULATION RESULTS
The 16-bit speculative HCA and the speculative HCA using concurrent subtraction were simulated using ModelSim SE 6.3f (Fig. 7 to Fig. 10), and power (Fig. 11 and Fig. 12) was analyzed using Xilinx ISE 8.1i, as shown in Table I.

Fig. 7. Simulation result of 16-bit speculative Han-Carlson Adder without error

Fig. 8. Simulation result of 16-bit speculative Han-Carlson Adder with error

Fig. 9. Simulation result of 16-bit speculative Han-Carlson Adder using concurrent subtraction without error

Fig. 10. Simulation result of 16-bit speculative Han-Carlson Adder using concurrent subtraction with error

Fig. 11. Power summary of 16-bit speculative Han-Carlson Adder

Fig. 12. Power summary of 16-bit speculative Han-Carlson Adder using concurrent subtraction

TABLE I. COMPARISON OF SPECULATIVE HCA AND SPECULATIVE HCA USING CONCURRENT SUBTRACTION
Adder                                                  Dynamic Power (mW)   Total Power (mW)
16-bit speculative HCA                                 162.23               264
16-bit speculative HCA using concurrent subtraction    57.24                159

VI. CONCLUSION
In this paper the power of the 16-bit speculative Han-Carlson Adder and of the speculative Han-Carlson Adder using concurrent subtraction was compared using the Xilinx ISE 8.1i tool. The main advantage of the proposed speculative HCA using concurrent subtraction is its reduction in power: in Table I the dynamic power falls from 162.23 mW to 57.24 mW (about 65%) and the total power from 264 mW to 159 mW (about 40%).

VII. FUTURE WORK
Future work includes designing the speculative HCA with new techniques that further reduce the power of binary addition.

References
[1] I. Koren, Computer Arithmetic Algorithms. Natick, MA, USA: A K Peters, 2002.
[2] R. Zimmermann, "Binary adder architectures for cell-based VLSI and their synthesis," Ph.D. thesis, Swiss Federal Institute of Technology (ETH) Zurich, Zurich, Switzerland, 1998, Hartung-Gorre Verlag.
[3] R. P. Brent and H. T. Kung, "A regular layout for parallel adders," IEEE Trans. Comput., vol. C-31, no. 3, pp. 260-264, Mar. 1982.
[4] P. M. Kogge and H. S. Stone, "A parallel algorithm for the efficient solution of a general class of recurrence equations," IEEE Trans. Comput., vol. C-22, no. 8, pp. 786-793, Aug. 1973.
[5] J. Sklansky, "Conditional-sum addition logic," IRE Trans. Electron. Comput., vol. EC-9, pp. 226-231, Jun. 1960.
[6] T. Han and D. A. Carlson, "Fast area-efficient VLSI adders," in Proc. IEEE 8th Symp. Comput. Arith. (ARITH), May 18-21, 1987, pp. 49-56.
[7] R. E. Ladner and M. J. Fischer, "Parallel prefix computation," J. ACM, vol. 27, no. 4, pp. 831-838, Oct. 1980.
[8] S. Knowles, "A family of adders," in Proc. 14th IEEE Symp. Comput. Arith., Vail, CO, USA, Jun. 2001, pp. 277-281.
[9] S.-L. Lu, "Speeding up processing with approximation circuits," Computer, vol. 37, no. 3, pp. 67-73, Mar. 2004.
[10] T. Liu and S.-L. Lu, "Performance improvement with circuit-level speculation," in Proc. 33rd Annu. IEEE/ACM Int. Symp. Microarchit. (MICRO-33), 2000, pp. 348-355.
[11] N. Zhu, W.-L. Goh, and K.-S. Yeo, "An enhanced low-power high-speed adder for error-tolerant application," in Proc. 2009 12th Int. Symp. Integr. Circuits (ISIC '09), Dec. 14-16, 2009, pp. 69-72.
[12] S. M. Nowick, "Design of a low-latency asynchronous adder using speculative completion," IEE Proc. Comput. Digit. Tech., vol. 143, no. 5, pp. 301-307, Sep. 1996.
[13] A. K. Verma, P. Brisk, and P. Ienne, "Variable latency speculative addition: A new paradigm for arithmetic circuit design," in Proc. Design, Autom., Test Eur. (DATE '08), Mar. 2008, pp. 1250-1255.
[14] A. Cilardo, "A new speculative addition architecture suitable for two's complement operations," in Proc. Design, Autom., Test Eur. Conf. Exhib. (DATE '09), Apr. 2009, pp. 664-669.
[15] K. Du, P. Varman, and K. Mohanram, "High performance reliable variable latency carry select addition," in Proc. Design, Autom., Test Eur. Conf. Exhib. (DATE '12), Mar. 2012, pp. 1257-1262.
[16] S. K. Mathew, R. K. Krishnamurthy, M. A. Anders, R. Rios, K. R. Mistry, and K. Soumyanath, "Sub-500-ps 64-b ALUs in 0.18-μm SOI/bulk CMOS: Design and scaling trends," IEEE J. Solid-State Circuits, vol. 36, no. 11, pp. 1636-1646, Nov. 2001.
[17] B. Parhami, Computer Arithmetic: Algorithms and Hardware Design. New York: Oxford Univ. Press, 2000.
[18] A. Tyagi, "A reduced-area scheme for carry-select adders," IEEE Trans. Comput., vol. 42, no. 10, pp. 1163-1170, Oct. 1993.
[19] D. Esposito, D. De Caro, E. Napoli, N. Petra, and A. G. M. Strollo, "Variable latency speculative Han-Carlson adder," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 62, no. 5, pp. 1353-1361, May 2015.