An FPGA Logic Cell and Carry Chain Configurable as a 6:2 or 7:2 Compressor

HADI PARANDEH-AFSHAR, PHILIP BRISK, and PAOLO IENNE
École Polytechnique Fédérale de Lausanne (EPFL)

To improve FPGA performance for arithmetic circuits that are dominated by multi-input addition operations, an FPGA logic block is proposed that can be configured as a 6:2 or 7:2 compressor. Compressors have been used successfully in the past to realize parallel multipliers in VLSI technology; however, the peculiar structure of FPGA logic blocks, coupled with the high cost of the routing network relative to ASIC technology, renders compressors ineffective when mapped onto the general logic of an FPGA. On the other hand, current FPGA logic cells have already been enhanced with carry chains to improve arithmetic functionality, for example, to realize fast ternary carry-propagate addition. The contribution of this article is a new FPGA logic cell that is specialized to help realize efficient compressor trees on FPGAs. The new FPGA logic cell has two variants that can respectively be configured as a 6:2 or a 7:2 compressor using additional carry chains that, coupled with lookup tables, provide the necessary functionality. Experiments show that the use of these modified logic cells significantly reduces the delay of compressor trees synthesized on FPGAs compared to state-of-the-art synthesis techniques, with a moderate increase in area and power consumption.

Categories and Subject Descriptors: B.7.1 [Integrated Circuits]: Types and Design Styles: Gate Arrays; G.1.0 [Numerical Analysis]: General: Computer Arithmetic

General Terms: Algorithms, Performance

Additional Key Words and Phrases: FPGA, carry chain, compressor tree, 6:2 compressor, 7:2 compressor

ACM Reference Format:
Parandeh-Afshar, H., Brisk, P., and Ienne, P. 2009. An FPGA logic cell and carry chain configurable as a 6:2 or 7:2 compressor. ACM Trans. Reconfig. Techn. Syst. 2, 3, Article 19 (September 2009), 42 pages. DOI = /

P. Brisk is currently affiliated with the Department of Computer Science and Engineering in the Bourns College of Engineering at the University of California, Riverside.

Authors' addresses: H. Parandeh-Afshar, hadi.parandehafshar@epfl.ch; P. Brisk, Philip.brisk@gmail.com; P. Ienne, paolo.ienne@epfl.ch.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY USA, fax +1 (212) , or permissions@acm.org.
© 2009 ACM /2009/09-ART19 $10.00 DOI: /

19: 2 H. Parandeh-Afshar et al.

1. INTRODUCTION

Due to their inherent reconfigurability, FPGAs are a viable hardware platform for low-volume markets, where vendors cannot justify the design, testing, and verification costs of an ASIC. Although an FPGA implementation of a circuit typically outperforms traditional software, a noticeable performance gap between FPGAs and ASICs remains [Kuon and Rose 2007]. One important area that is ripe for improvement is arithmetic-dominated circuits; in particular, due to the peculiar logic cell structure and carry chains in modern FPGAs, addition- and multiplication-dominated circuits cannot take advantage of the carry-save representation.

One of the fundamental results in computer arithmetic is that addition scales well when the number of inputs increases beyond 2; this was first observed by Wallace [1964] in the context of parallel multiplier design. The key is not to use trees of traditional carry-propagate adders, that is, circuits that produce the sum of two (signed) binary integers; instead, the integers are aggregated together using a circuit called a compressor tree. Numerous methods for compressor tree generation have been published since their introduction in the early 1960s [Wallace 1964; Dadda 1965; Swartzlander 1973; Stenzel et al. 1977; Weinberger 1981; Santoro and Horowitz 1988; Song and De Micheli 1991; Fadavi-Ardekani 1993; Oklobdzija and Villeger 1995; Stelling and Oklobdzija 1996; Stelling et al. 1998; Kwon et al. 2002; Um and Kim 2002; Mora Mora et al. 2006; Verma and Ienne 2007a], mostly in the context of parallel multiplication; more generally, these circuits can also sum k > 2 integers.

The architecture of modern FPGAs is generally not well suited to compressor trees. The logic clusters of the Altera Stratix II-IV and Xilinx Virtex-5 FPGAs can be configured to implement ternary (3-input) addition using fast carry chains [Cherepacha and Lewis 1996; Hauck et al. 2000; Frederick and Somani 2006].
The primary advantage of the carry chains is that the carry bits are propagated directly from one cell to its adjacent neighbor, thereby avoiding the overhead of the routing network. This design point favors the use of ternary adder trees rather than compressor trees. Parandeh-Afshar et al. [2008b, 2008c] showed that compressor trees can be synthesized on FPGAs using a circuit called a Generalized Parallel Counter (GPC) [Stenzel et al. 1977]. This GPC mapping approach yields compressor trees whose delay is significantly lower than that of ternary adder trees, despite the latter's use of the carry chains; however, it requires noticeably more logic cells.

This article, an extension of prior work by Parandeh-Afshar et al. [2008a], introduces and evaluates a new logic cell, based on the Altera Adaptive Logic Module (ALM), that has an additional carry chain, which allows it to be configured as a 6:2 or 7:2 compressor; these compressors belong to a well-known class of circuits that have been used successfully to synthesize ASIC multipliers in the past [Weinberger 1981; Song and De Micheli 1991; Oklobdzija and Villeger 1995]. By combining the strengths of GPC mapping with the use of 6:2 or 7:2 compressors where possible, faster compressor trees can be realized on the FPGA. Additionally, we compare the power consumption of compressor trees mapped onto the proposed logic cells with that of compressor trees synthesized using ternary adder trees and GPC mapping.

Fig. 1. (a) A ripple-carry adder; (b) a carry-save adder.

The article is organized as follows. Section 2 introduces a collection of arithmetic primitives (counters, compressors, compressor trees) that are required to understand the remaining sections of the article. Section 3 summarizes related work in the field of FPGA architecture and mapping, focusing specifically on features designed for enhanced arithmetic performance. Section 4 presents the new logic cell, and Section 5 describes the approach that we used to map circuits onto FPGAs containing the new cell. Our experimental platform, methodology, and results are presented in Sections 6-8. Section 9 concludes the article.

2. ARITHMETIC AND FPGA PRIMITIVES

2.1 Full and Half Adders

At the bit level, a half-adder (HA) is a 2-input, 2-output circuit that computes the sum of two bits and outputs the result as an unsigned binary integer. A full-adder (FA) computes a similar sum for 3 input bits. The lower-order output bit is called a sum, and the higher-order output bit is called a carry. In the case of an FA, one of the inputs is called a carry-in bit and the high-order output is called a carry-out. Many arithmetic circuits, including adders and multipliers, are composed primarily of HAs and FAs.

2.2 Ripple-Carry and Carry-Save Adders

A Carry-Propagate Adder (CPA) is a circuit that adds two binary integers; if the integers are signed, two's complement form is assumed. Numerous architectures for carry-propagate adders have been proposed in the past. In modern CMOS technologies, the differences in critical path delay among the different adder architectures are generally insignificant for small bitwidths, that is, 8 bits or fewer.
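As a quick illustration of Section 2.1, the following behavioral sketch in Python (not a gate-level or FPGA description; the function names are our own) models the HA and FA as bit counters: the sum bit is the XOR of the inputs, and the carry is their AND (HA) or their majority (FA). An exhaustive check confirms that each cell outputs the number of its inputs that are set to one, as an unsigned binary integer.

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """2:2 counter: returns (sum, carry) such that a + b == sum + 2 * carry."""
    return a ^ b, a & b


def full_adder(a: int, b: int, c_in: int) -> tuple[int, int]:
    """3:2 counter: returns (sum, carry_out) such that a + b + c_in == sum + 2 * carry_out."""
    s = a ^ b ^ c_in                           # low-order (sum) bit
    c_out = (a & b) | (a & c_in) | (b & c_in)  # high-order (carry) bit: majority
    return s, c_out


def check_cells() -> bool:
    """Exhaustively confirm that both cells output the count of their 1-inputs."""
    for a in (0, 1):
        for b in (0, 1):
            s, c = half_adder(a, b)
            if s + 2 * c != a + b:
                return False
            for c_in in (0, 1):
                s, c = full_adder(a, b, c_in)
                if s + 2 * c != a + b + c_in:
                    return False
    return True


print(check_cells())  # True
```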
The most straightforward CPA architecture is the Ripple-Carry Adder (RCA), which generally has the smallest area but the highest delay of the alternatives. Figure 1(a) shows a 4-bit RCA constructed from FA cells; the carry-in of the least significant FA is 0, so an HA can be used instead of an FA. As shown in Figure 1(a), an RCA is a 1-dimensional array of FAs, where the carry-out of each FA is connected directly to the carry-in of the next; thus, the worst-case critical path passes through all of the FAs in the design. If an RCA adds two k-bit numbers, its critical path delay is O(k). Many faster, but larger, alternative adders have been designed, most with a critical path delay of O(log k).
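The ripple structure of Figure 1(a) can be sketched behaviorally in Python; this is an illustrative model with names of our own choosing, operating on unsigned integers. The loop makes the O(k) dependence explicit: the carry produced at position i is consumed at position i + 1.

```python
def full_adder(a: int, b: int, c_in: int) -> tuple[int, int]:
    """Returns (sum, carry_out) for three input bits."""
    return a ^ b ^ c_in, (a & b) | (a & c_in) | (b & c_in)


def ripple_carry_add(x: int, y: int, k: int) -> int:
    """Add two k-bit unsigned integers with a chain of k full adders.

    The carry-out of position i is the carry-in of position i + 1, so the
    worst-case path visits every cell: the delay grows as O(k).
    """
    carry = 0  # carry-in of the least significant cell is 0, so it acts as an HA
    result = 0
    for i in range(k):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result | (carry << k)  # the final carry-out becomes bit k


print(ripple_carry_add(0b1011, 0b0110, 4))  # 17
```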

Fig. 2. Two implementations of a 4-bit ternary adder using (a) an adder tree, i.e., two RCAs; and (b) a compressor tree, i.e., a CSA followed by an RCA. The compressor tree implementation eliminates the delay of one full adder (FA) from the critical path.

A Carry-Save Adder (CSA), shown in Figure 1(b), breaks the carry chain; in fact, it is a 1-dimensional array of disconnected FAs. CSAs are generally used in conjunction with CPAs in order to perform efficient n-input addition for n > 2.

2.3 Adder and Compressor Trees

Suppose that we want to compute the sum of n > 2 binary integers. One approach is to use an Adder Tree, that is, a tree of CPAs; the alternative is to build a tree of carry-save adders instead, using a CPA only at the end. Figure 2 shows an example where three four-bit binary integers are added. In Figure 2(a), two RCAs are used; in Figure 2(b), a CSA is followed by an RCA. Let $d_{FA}$ and $d_{HA}$ be the respective delays of full and half adders. The critical path delay of the circuit in Figure 2(a) is $4d_{FA} + 2d_{HA}$, while the critical path delay of the circuit in Figure 2(b) is $3d_{FA} + 2d_{HA}$, an overall savings of $d_{FA}$ compared to Figure 2(a). This savings occurs because using the CSA instead of the first RCA eliminates one bit position from the RCA in Figure 2(b). The idea of using carry-save addition for fast accumulation dates back to the work of Wallace [1964] and Dadda [1965], who designed fast parallel multipliers; however, the fundamental ideas generalize quite elegantly to multi-input addition as well. Formally, let $A_1, A_2, \ldots, A_n$ be a set of binary integers to sum. A Compressor Tree is a circuit that produces two values, sum (S) and carry (C), such that:

$$S + C = \sum_{i=1}^{n} A_i. \qquad (1)$$

A CPA then performs the final addition, S + C.
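The two delay expressions for Figure 2 can be reproduced with a small arrival-time model, sketched below in Python under simplifying assumptions of our own: every HA or FA output becomes valid $d_{HA}$ or $d_{FA}$ after its latest input, wiring is free, and the values D_FA = 10 and D_HA = 1 are arbitrary placeholders chosen only so that the two terms remain distinguishable. Following Figure 1(a), an RCA position with three bits uses an FA, one with two bits uses an HA, and a lone bit is passed through.

```python
D_FA, D_HA = 10, 1  # hypothetical delays; any D_FA != D_HA keeps the terms apart

# A timed bit is a (value, arrival_time) pair; primary inputs arrive at time 0.


def ha(x, y):
    (a, ta), (b, tb) = x, y
    t = max(ta, tb) + D_HA
    return (a ^ b, t), (a & b, t)


def fa(x, y, z):
    (a, ta), (b, tb), (c, tc) = x, y, z
    t = max(ta, tb, tc) + D_FA
    return (a ^ b ^ c, t), ((a & b) | (a & c) | (b & c), t)


def operand(v, k):
    """k-bit operand as {rank: timed bit}."""
    return {r: ((v >> r) & 1, 0) for r in range(k)}


def ripple_add(x, y):
    """RCA on two {rank: timed bit} rows: FA for three bits, HA for two, wire for one."""
    width = max(list(x) + list(y)) + 1
    out, carry = {}, None
    for r in range(width):
        col = [b for b in (x.get(r), y.get(r), carry) if b is not None]
        if len(col) == 3:
            out[r], carry = fa(*col)
        elif len(col) == 2:
            out[r], carry = ha(*col)
        elif col:
            out[r], carry = col[0], None
    if carry is not None:
        out[width] = carry
    return out


def adder_tree(a, b, c, k=4):
    """Figure 2(a): two RCAs in sequence."""
    return ripple_add(ripple_add(operand(a, k), operand(b, k)), operand(c, k))


def compressor_tree(a, b, c, k=4):
    """Figure 2(b): a CSA (k disconnected FAs) followed by one RCA."""
    s, cy = {}, {}
    for r in range(k):
        s[r], cy[r + 1] = fa(*(operand(v, k)[r] for v in (a, b, c)))
    return ripple_add(s, cy)


def value(row):
    return sum(bit << r for r, (bit, _) in row.items())


def delay(row):
    return max(t for _, t in row.values())


print(delay(adder_tree(5, 9, 14)), delay(compressor_tree(5, 9, 14)))  # 42 32
```

With these placeholder values, 42 corresponds to $4d_{FA} + 2d_{HA}$ and 32 to $3d_{FA} + 2d_{HA}$, while both circuits return the same sum.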

Fig. 3. Illustration of the critical path delay through a compressor tree of a multiplier, including that of the final CPA. The critical path typically includes the j most significant bits of the final CPA; the portion of the final CPA that computes the m - j least significant bits can be optimized for area rather than for speed, as long as it does not become critical.

Wallace and Dadda trees are two specific compressor tree architectures; many others have also been proposed [Swartzlander 1973; Stenzel et al. 1977; Weinberger 1981; Santoro and Horowitz 1988; Song and De Micheli 1991; Fadavi-Ardekani 1993; Stelling and Oklobdzija 1996; Stelling et al. 1998; Kwon et al. 2002; Um and Kim 2002; Mora Mora et al. 2006; Verma and Ienne 2007a]. The superiority of compressor trees over adder trees is one of the most fundamental results of digital arithmetic. Intuitively, it may seem that this is because an adder tree pays the penalty of a carry chain at each level; this is, however, a fallacy, as illustrated by Figure 2 in the preceding discussion. In actuality, the benefit of compressor trees arises from their ability to reduce the bitwidth of the final CPA in the case of multi-input addition. Parallel multipliers in ASIC technology, however, are more complicated. In multi-input addition, the number of bits to sum at each position is the same. This is not true in the case of parallel multiplication: after a partial product generation or Booth encoding stage, the number of bits to sum tends to be greater in the middle bit positions. As illustrated conceptually by Figure 3, the lower-order bits of the final CPA are generally not on the critical path, as the bits that arrive at these positions go through fewer layers of logic within the compressor tree. In other words, the arrival time of the bits at the final CPA is nonuniform, unlike the case of multi-input addition.
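To make Equation (1) and the contrast between uniform and tapered column heights concrete, the following Python sketch performs a Wallace-style reduction on bit columns. It is an illustration under our own simplifications (greedy grouping of bits into FAs, an HA for a leftover pair, no delay model), not one of the cited synthesis methods, and the helper names are hypothetical. `addition_columns` produces the equal column heights of multi-input addition, whereas `multiplier_columns` produces the taller middle columns of a partial-product array.

```python
def full_adder(a, b, c):
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def reduce_columns(columns):
    """Wallace-style reduction: columns[r] lists the bits of rank r.

    Each level sends groups of three bits through FAs (3:2 counters) and a
    leftover pair through an HA (2:2 counter), until no column holds more than
    two bits. Returns (S, C, levels) with S + C equal to the weighted bit sum.
    """
    levels = 0
    while max(len(col) for col in columns) > 2:
        nxt = [[] for _ in range(len(columns) + 1)]
        for r, col in enumerate(columns):
            i = 0
            while len(col) - i >= 3:
                s, c = full_adder(*col[i:i + 3])
                nxt[r].append(s)
                nxt[r + 1].append(c)
                i += 3
            if len(col) - i == 2:
                a, b = col[i], col[i + 1]
                nxt[r].append(a ^ b)
                nxt[r + 1].append(a & b)
            elif len(col) - i == 1:
                nxt[r].append(col[i])
        columns, levels = nxt, levels + 1
    s_row = sum(col[0] << r for r, col in enumerate(columns) if len(col) > 0)
    c_row = sum(col[1] << r for r, col in enumerate(columns) if len(col) > 1)
    return s_row, c_row, levels


def addition_columns(values, k):
    """Multi-input addition: every one of the k columns holds len(values) bits."""
    return [[(v >> r) & 1 for v in values] for r in range(k)]


def multiplier_columns(a, b, k):
    """k x k partial products: column r holds a_i AND b_j for all i + j == r."""
    cols = [[] for _ in range(2 * k - 1)]
    for i in range(k):
        for j in range(k):
            cols[i + j].append((a >> i) & (b >> j) & 1)
    return cols


vals = [13, 7, 9, 15, 2, 11]
S, C, levels = reduce_columns(addition_columns(vals, 4))
print(S + C == sum(vals), levels)                     # True 3
print([len(c) for c in multiplier_columns(0, 0, 4)])  # [1, 2, 3, 4, 3, 2, 1]
```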
Based on this observation, Oklobdzija and Villeger [1995] argued that the final CPA of a multiplier should be implemented as a hybrid adder, which uses a small and slow CPA, such as an RCA, for the low-order bits, and a faster adder, such as a carry-select adder, for the higher-order bits. Carry-select adders are particularly useful when the arrival time of bits is nonuniform: each block of a carry-select adder can start adding its bits as soon as they arrive, by precomputing its result for both possible carry-in values. RCAs, in contrast, cannot, as the output bit at position i depends on the carry-out bit computed at position i - 1. That being said, carry-select adders can be constructed from smaller-bitwidth RCAs as building blocks. The work summarized in this section targets ASIC design methodologies; FPGAs, in contrast, possess fast carry chains, whose usage often dictates the types of adders that perform well on specific device families.
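A carry-select adder assembled from smaller RCAs can be sketched in Python as follows; this is a behavioral model only, and the block size of 4 bits is an arbitrary assumption. Each block computes two results from its own inputs alone, and the carry arriving from the previous block merely selects one of them, which is why late-arriving low-order carries do not delay the block's internal additions.

```python
def ripple_carry_add(a_bits, b_bits, c_in):
    """LSB-first ripple-carry addition of equal-length bit lists; returns (sum_bits, carry_out)."""
    out, carry = [], c_in
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)
        carry = (a & b) | (a & carry) | (b & carry)
    return out, carry


def carry_select_add(x, y, k, block=4):
    """k-bit carry-select adder assembled from block-bit RCAs.

    Each block precomputes its result for carry-in 0 and for carry-in 1 from
    its own inputs only; the carry arriving from the previous block merely
    selects one of the two (the 2:1 multiplexers of a hardware carry-select adder).
    """
    x_bits = [(x >> i) & 1 for i in range(k)]
    y_bits = [(y >> i) & 1 for i in range(k)]
    result, carry = [], 0
    for lo in range(0, k, block):
        xa, ya = x_bits[lo:lo + block], y_bits[lo:lo + block]
        sum0, cout0 = ripple_carry_add(xa, ya, 0)  # speculative: carry-in = 0
        sum1, cout1 = ripple_carry_add(xa, ya, 1)  # speculative: carry-in = 1
        result += sum1 if carry else sum0
        carry = cout1 if carry else cout0
    return sum(bit << i for i, bit in enumerate(result)) | (carry << k)


print(carry_select_add(200, 100, 8))  # 300
```

In hardware, the least significant block needs only one RCA, since its carry-in is known to be 0; the sketch computes both results uniformly for simplicity.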

2.4 Parallel Counters

An m:n parallel counter (or single-column counter) is a circuit that takes m input bits, counts the number of input bits that are set to one, and outputs the value as an n-bit unsigned binary integer. The output range is [0, m], so the number of output bits is:

$$n = \left\lceil \log_2 (m + 1) \right\rceil. \qquad (2)$$

In the context of compressor trees, HAs and FAs are 2:2 and 3:2 counters, respectively. Verma and Ienne [2007a], for example, described an integer linear programming formulation for compressor tree design that uses a library of m:n counters, for $2 \le m \le 8$.

Let $B = b_{k-1} b_{k-2} \ldots b_0$ be a k-bit unsigned binary integer, where $b_{k-1}$ is the most significant bit and $b_0$ is the least significant bit. Each bit $b_r$ contributes a value of $b_r 2^r$ to the total value of B, i.e., $b_r$ contributes $2^r$ if it is set, and 0 otherwise. In this context, r is called the rank of $b_r$. When an m:n counter is used to synthesize a compressor tree, all of its inputs have the same rank.

A Generalized Parallel Counter (GPC) is an extension of an m:n counter that can sum bits of multiple ranks [Stenzel et al. 1977]. For example, a (2, 3; 3) GPC can sum up to 2 bits of rank 1 and 3 bits of rank 0; the maximum output value is $2 \cdot 2 + 3 = 7$, so 3 output bits are required. The general form of a GPC is $(k_{t-1}, k_{t-2}, \ldots, k_0; s)$, where $k_r$ is the maximum number of bits of rank r that can be summed, and s is the number of output bits. Similar to an m:n counter, a GPC must satisfy the following property:

$$s = \left\lceil \log_2 \left( 1 + \sum_{r=0}^{t-1} k_r 2^r \right) \right\rceil. \qquad (3)$$

In fact, a sufficiently large m:n counter can implement a GPC (although many other implementations also exist). Each GPC input bit of rank r is connected to $2^r$ inputs of the m:n counter; any unused input bits of the m:n counter are then driven to 0. GPCs map efficiently onto FPGAs [Parandeh-Afshar et al. 2008b, 2008c]. Specifically, if the FPGA has k-input LUTs, then k-input GPCs can be mapped onto the LUTs (one LUT is used per GPC output bit) using one logic level.

2.5 Compressors

Compressors (not to be confused with compressor trees) are arithmetic components, similar in principle to parallel counters, but with two distinct differences: (1) they have explicit carry-in and carry-out bits; and (2) there may be some redundancy among the ranks of the sum and carry-output bits.

Fig. 4. (a) 4:2 compressor I/O diagram; (b) 4:2 compressor architecture; (c) 4-ary adder built from an array of 4:2 compressors followed by an RCA; (d) illustration of the interconnect between consecutive 4:2 compressors: although the array has the appearance of an RCA in Figure 4(c), the carry chain only goes through two compressors.

The 4:2 compressor (also called a 4:2 CSA), illustrated in Figure 4, was introduced by Weinberger [1981]. At first sight, this name may appear to be somewhat of a misnomer: although it has 4 input bits and produces 2 sum output bits ($out_0$ and $out_1$), it also has a carry-in ($c_{in}$) and a carry-out ($c_{out}$) bit (thus, the total numbers of input and output bits are 5 and 3); however, it is not the same circuit as a 5:3 counter. All input bits, including $c_{in}$, have rank 0; the two output bits have ranks 0 and 1, respectively, while $c_{out}$ has rank 1 as well. Thus, the output of the 4:2 compressor is a redundant number; for example, $out_1 = 0$ and $c_{out} = 1$ is equivalent to $out_1 = 1$ and $c_{out} = 0$ in all cases. When k 4:2 compressors are connected in a carry chain, a total of 4k input bits are compressed down to 2k output bits plus one additional carry-out bit; the carry-in bit of the first compressor is set to 0. The primary difference between compressors and counters is the presence of carry bits in the former; it is also important to recognize that a compressor tree can be constructed from compressors, counters, or both.

Figure 4(a) shows the inputs and outputs of the 4:2 compressor labeled with their ranks; Figure 4(b) shows one 4:2 compressor architecture, which is constructed using two 3:2 counters. Figure 4(c) shows a 4-bit 4-input adder, consisting of four 4:2 compressors in a 1-dimensional array followed by a four-bit RCA. At first glance, the array of 4:2 compressors appears to have the same structure as an RCA, as the $c_{out}$ bit of each 4:2 compressor is connected to the $c_{in}$ bit of the subsequent one; however, this is not actually the case, as shown in Figure 4(d): there is no direct path from a carry-in to a carry-out, which prevents the formation of a ripple-carry structure.

Fig. 5. (a)/(b) 6:2/7:2 compressor I/O diagram; (c)/(d) 6:2/7:2 compressor architecture; (e) illustration of the interconnection pattern between consecutive 6:2 compressors (it is the same for 7:2 compressors).

The new FPGA logic cell described in this article has two variants that can respectively be configured as a 6:2 or a 7:2 compressor, which generalize the 4:2 compressor whose use is shown in Figure 4. Figures 5(a) and (b) show the basic I/O structure of the 6:2 and 7:2 compressors. Figures 5(c) and (d) show the circuit-level architecture; the only difference is that a 2:2 counter in the 6:2 compressor is upgraded to a 3:2 counter in the 7:2 compressor, and the 7th input is connected to one of the inputs of the aforementioned 3:2 counter.
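The text describes Figures 5(c) and (d) only in outline, so the Python sketch below is one construction of our own that is consistent with that outline, not necessarily the circuit in the figures. Two FAs reduce the first six inputs; their sum bits meet in a 2:2 counter (6:2 variant) or in a 3:2 counter that also takes the seventh input (7:2 variant); an FA on the three rank-1 carries produces $c_{out,0}$ (rank 1) and $c_{out,1}$ (rank 2); and the two carry-ins enter only the final FA, so no carry-in reaches a carry-out. The chaining function follows the Figure 5(e) interconnect: $c_{out,0}$ of column i drives $c_{in,0}$ of column i + 1, and $c_{out,1}$ drives $c_{in,1}$ of column i + 2.

```python
import random


def fa(a, b, c):
    """3:2 counter: (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def compressor(x, c_in0, c_in1):
    """One column of a 6:2 (len(x) == 6) or 7:2 (len(x) == 7) compressor.

    Satisfies sum(x) + c_in0 + c_in1 == out0 + 2*out1 + 2*c_out0 + 4*c_out1,
    and the carry-outs are computed from x alone (no carry-in to carry-out path).
    """
    s1, k1 = fa(x[0], x[1], x[2])
    s2, k2 = fa(x[3], x[4], x[5])
    if len(x) == 7:
        s3, k3 = fa(s1, s2, x[6])       # 7:2 variant: 3:2 counter takes the 7th input
    else:
        s3, k3 = s1 ^ s2, s1 & s2       # 6:2 variant: 2:2 counter (HA)
    c_out0, c_out1 = fa(k1, k2, k3)     # rank-1 and rank-2 carry-outs
    out0, out1 = fa(s3, c_in0, c_in1)   # the carry-ins enter only here
    return out0, out1, c_out0, c_out1


def compress_rows(values, k):
    """Reduce 6 or 7 k-bit operands with a chain of k compressor columns.

    Returns (row0, row1, top), where top holds the carries that leave the chain.
    """
    c_in0 = [0] * (k + 2)
    c_in1 = [0] * (k + 2)
    row0 = row1 = 0
    for i in range(k):
        x = [(v >> i) & 1 for v in values]
        out0, out1, co0, co1 = compressor(x, c_in0[i], c_in1[i])
        row0 |= out0 << i
        row1 |= out1 << (i + 1)
        c_in0[i + 1] = co0  # to the next column
        c_in1[i + 2] = co1  # skips one column
    top = (c_in0[k] << k) + (c_in1[k] << k) + (c_in1[k + 1] << (k + 1))
    return row0, row1, top


rng = random.Random(1)
ops = [rng.randrange(256) for _ in range(7)]
r0, r1, top = compress_rows(ops, 8)
print(r0 + r1 + top == sum(ops))  # True
```

Apart from the few carries that leave the top of the chain (analogous to the extra carry-out of a 4:2 compressor chain), each column contributes two output bits, matching the compression ratio of 3 for a 6:2 compressor.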

Figure 5(e) shows the interconnect structure. Consider the i-th compressor in sequence. Its rank-1 carry output bit ($c_{out,0}$) connects to carry input $c_{in,0}$ of the (i + 1)-st compressor; also, its rank-2 carry output bit ($c_{out,1}$) connects to carry input $c_{in,1}$ of the (i + 2)-nd compressor. From Figures 5(c) and (d), we can see that there is no direct path from either of the carry-in bits of the 6:2 or 7:2 compressor to either of its carry-out bits; similar in principle to Figure 4(d), this prevents the formation of a ripple-carry chain between compressors.

2.6 Compression Ratio

Let I and O be the numbers of inputs and outputs of a counter, GPC, or compressor; for compressors, I and O do not include the carry-in and carry-out bits. The compression ratio (CR) is defined as CR = I/O. For example, a 6-input, 3-output GPC has CR = 6/3 = 2, while a 6:2 compressor has CR = 6/2 = 3. The CR tends to be higher for compressors than for counters. Figure 6(a) shows compression using 6:3 counters, which produce three output bits per column, while 6:2 compressors, shown in Figure 6(b), produce two output bits per column; the other output bits are propagated down the carry chain.

Fig. 6. (a) Covering a set of columns with 6:3 counters yields 3 bits per column in the output; (b) using 6:2 compressors reduces the number of bits per column to 2. Contiguous columns covered with 6:3 counters can be converted to 6:2 compressors.

3. RELATED WORK

3.1 Compressor Tree Synthesis in ASIC Technology

Compressor trees for partial product accumulation were introduced by Wallace [1964] and Dadda [1965], who built them from CSAs; HAs were used at points where only 2 bits in the same column need to be compressed.
Fadavi-Ardekani [1993] recognized that the bits produced by a compressor tree may arrive at the final adder at different times, and designed a specific adder for this purpose; however, this work assumed that all partial product bits arrive at the compressor tree at the same time. Stelling and Oklobdzija [1996] and Stelling et al. [1998] relaxed this assumption, and developed techniques to build the compressor tree and to design the final adder accordingly. Due to the importance of wire delays in deep-submicron technology, Um and Kim [2002] proposed a two-phase layout-aware compressor tree synthesis technique that strives for a much more regular interconnect topology than the compressor trees produced by the 3-greedy algorithm of Stelling et al. [1998].

Verma and Ienne [2007a] developed an integer linear program (ILP) that could optimally synthesize compressor trees from a library of m:n counters. To bound the runtime of the synthesis procedure, they limited m to the range [2, 8]. Previously, m:n counters, like compressor trees, were built from CSAs or from libraries of smaller m:n counters. Through efficient logic synthesis techniques for arithmetic circuits [Verma and Ienne 2007b], they found that better m:n counters could be constructed from basic gates rather than from smaller counters. The availability of a library of highly optimized counters was important to the success of their ILP formulation; another contributing factor was that the ILP could optimize for the delay profile of any final adder.

GPCs have also been used in the past to build efficient compressor trees for parallel multipliers [Stenzel et al. 1977]. Mora Mora et al. [2006] described a multiplier generation approach for ASICs that implemented GPCs using ROMs, with the restriction that all input columns to the GPC have the same number of bits. The 4:2 compressor [Weinberger 1981] was subsequently used by Santoro and Horowitz [1988] in a parallel multiplier. Over the years, various researchers have proposed the use of larger compressors and counters as well, including Kwon et al. [2002] (5:2, 5:3) and Song and De Micheli [1991] (9:2, 27:5).
A column is a set of bits having the same rank, r, at some level in a compressor tree; all of the inputs to an FA or an HA in a compressor tree belong to the same column. The FA or HA produces two output bits, one of rank r and one of rank r + 1. The delay through the FA or HA to the rank-r output is called the vertical propagation delay, as the delay is confined to one column; the delay to the rank-(r + 1) output is called the horizontal propagation delay, as it passes from one column to the next. Using compressors instead of counters shifts some of the vertical propagation delay into horizontal propagation delay. Thus, the critical path through a compressor tree travels in both the horizontal and vertical directions before arriving at the final CPA. The compressor cells can be designed to minimize the difference between horizontal and vertical delays.

Interestingly enough, a CPA actually has a higher compression ratio than an m:n counter, a GPC, or a compressor. To take advantage of this fact, Oklobdzija and Villeger [1995] advocate the inclusion of CPAs within compressor trees: the vertical propagation delay will dominate; however, at places where the horizontal propagation delay is noncritical, the use of internal CPAs within the compressor tree maximizes the compression ratio. This technique has some notable ramifications for FPGAs: due to the presence of carry chains within logic clusters (see Sections 3.3 and 3.4), horizontal propagation is naturally faster than vertical propagation, which must use the FPGA routing network. This differentiates compressor tree synthesis on FPGAs from the same problem in VLSI. The challenge is that we cannot take advantage of the fast horizontal propagation on the critical paths of the compressor tree without resorting to CPAs. To address this concern, we design and evaluate new logic blocks that can be configured as 6:2 and 7:2 compressors. These logic blocks have a higher compression ratio than m:n counters and GPCs, and employ new carry chains that can exploit fast horizontal propagation.

3.2 FPGA Architecture

This section describes a number of proposals to improve the arithmetic and logical capabilities of FPGA logic cells. The most enduring idea has been the integration of carry chains into FPGA logic cells along with LUTs. Carry chains include fast connections between adjacent logic cells that are used for carry propagation; this eliminates most of the routing delays that would otherwise be present. The Altera Stratix II-IV Adaptive Logic Module (ALM) employs a carry chain based on ripple-carry adders (RCAs). The new logic cells proposed in this work feature a new type of carry chain intended to allow a logic cell, such as the ALM, to be configured as a 6:2 or 7:2 compressor; the ALM will be described in greater detail in Section 3.3. The carry chains used in the configurable logic blocks (CLBs) of the Xilinx Virtex-4/5 include programmable multiplexers and XOR gates to send propagate and generate signals to adjacent CLBs to enable parallel-prefix style addition [Parhami 1999]. Hauck et al. [2000] proposed more complicated carry chains that can implement Brent-Kung, carry-select, and carry-lookahead addition. Different logical constructs were needed for different cells in the chain, making them nonuniform. This creates integration challenges because it is difficult to lay out a regular fabric consisting of irregular cells.
This would require a large manual effort to design each individual cell at the transistor level, and would complicate the layout process for the entire chip. Frederick and Somani [2006] proposed a uniform logic block with carry chains that could efficiently implement a carry-skip adder; a similar bidirectional carry-skip chain was earlier proposed by Cherepacha and Lewis [1996, Figure 6]. Kaviani et al. [1998] and Leijten-Nowak and Van Meerbergen [2003] developed ALU-like blocks that support arithmetic functions such as addition, subtraction, and (partial) multiplication. Distributed Arithmetic (DA) [Mirzaei et al. 2006] is a paradigm for implementing efficient hardware for DSP systems that uses LUTs instead of multipliers. Grover et al. [2002] developed a special DA-oriented LUT structure (DALUT) specifically for multiply-accumulate (MAC) operations. In addition to two 4-input LUTs, their DALUT cell included arrays of XOR gates, bit-level adders and shift accumulators, shift registers, and a CPA to add partial summations and carries. Brisk et al. [2007] reported that DSP/MAC blocks are not good candidates for implementing multioperand addition. The logic cell described here is intended to address this shortcoming.

Most FPGAs are hybrid-reconfigurable, as they embed ASIC components such as multipliers, more complex DSP blocks, and standard I/O interfaces into a reconfigurable fabric [Zuchowski et al. 2002]. Kastner et al. [2002] developed techniques for a compiler to examine a set of applications to identify good candidates for these embedded cores. Their analysis, however, was limited to 2-operation combinations of addition and multiplication, and they did not use compressor trees for multioperand addition. A K-input macro gate [Cong and Huang 2005] is similar to a LUT, but it cannot implement all $2^{2^K}$ logic functions of K inputs; in exchange, it has lower delay and area. Hu et al. [2007] suggested that FPGA cells could benefit from the inclusion of both LUTs and macro gates. Similar to Kastner et al., they developed an automated method to profile a set of applications to find good macro-gate candidates. They did not, however, consider arithmetic-dominated functions or fast carry chains between macro gates. The Field Programmable Counter Array (FPCA) [Brisk et al. 2007; Cevrero et al. 2008] is a programmable IP used to accelerate multi-input addition in FPGAs. The FPCA is similar to an FPGA, but replaces LUTs with m:n counters. In a hybrid FPGA/FPCA, a compressor tree is mapped onto the FPCA, while all other operations are mapped onto the FPGA. As suggested by Kuon and Rose [2007], the cost of routing data to and from the FPCA may limit its performance benefit. The new FPGA cell proposed here is much less ambitious, and exploits carry chains rather than logical structures for effective local routing; furthermore, the I/O interface to the logic cell does not change.

3.3 The Altera Stratix II-IV Adaptive Logic Module (ALM)

The new logic cell proposed in this article is a modified version of the Adaptive Logic Module (ALM) employed in the Altera Stratix II-IV series of FPGAs. Each ALM contains an Adaptive LUT (ALUT).
An ALUT consists of two six-input LUTs (6-LUTs) with four shared inputs and shared configuration bits; in other words, they must implement the same logic function. Additionally, the ALM contains a carry chain that performs efficient ripple-carry addition, and bypassable flip-flops that support either combinational or sequential circuits. The two 6-LUTs are also fracturable, meaning that each can be decomposed into two or more smaller LUTs. The ALM also includes a 7th input bit, but can only implement a selected set of 7-input functions.

The ALM has four operating modes, two of which use the carry chains. In Arithmetic Mode, each 6-LUT is decomposed into two independent 4-LUTs, which perform a small amount of pre-adder logic, followed by the carry chains. Arithmetic mode efficiently implements adders, (sequential) counters, accumulators, parity functions, and comparators. In Shared Arithmetic Mode, the ALM is configured as a 2-bit ternary adder. The fracturable LUTs are configured as a carry-save adder (CSA), that is, a 3:2 compressor, and the carry chain functions as the final adder. Shared arithmetic mode was designed to efficiently implement soft multipliers (as opposed to using DSP blocks) and correlators.
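The division of labor in shared arithmetic mode (LUTs form a CSA, the carry chain forms the final adder) can be sketched behaviorally in Python. The sketch assumes that, per bit position, one small LUT computes the CSA sum bit (3-input XOR) and another computes the CSA carry bit (3-input majority), with each LUT represented by its truth-table mask; it does not model the ALM's actual internal wiring shown in Figure 7, and all names are our own.

```python
def lut_mask(f, n):
    """Truth table of an n-input function as an integer: bit j holds f(bits of j)."""
    return sum(f(*[(j >> t) & 1 for t in range(n)]) << j for j in range(2 ** n))


def lut_eval(mask, *inputs):
    """Read the LUT entry addressed by the input bits (first input = address LSB)."""
    index = sum(bit << t for t, bit in enumerate(inputs))
    return (mask >> index) & 1


XOR3 = lut_mask(lambda a, b, c: a ^ b ^ c, 3)                    # CSA sum bit
MAJ3 = lut_mask(lambda a, b, c: (a & b) | (a & c) | (b & c), 3)  # CSA carry bit


def ternary_add(a, b, c, k):
    """Behavioral model of ternary addition in the style of shared arithmetic mode.

    Per bit position, two 3-LUTs form one FA of a carry-save adder, and a
    ripple carry chain adds the CSA sum bit, the CSA carry bit from the
    previous position, and the chain carry.
    """
    result, chain, csa_carry = 0, 0, 0
    for i in range(k + 2):  # three k-bit operands need at most k + 2 result bits
        ai, bi, ci = (a >> i) & 1, (b >> i) & 1, (c >> i) & 1
        s = lut_eval(XOR3, ai, bi, ci)
        result |= (s ^ csa_carry ^ chain) << i
        chain = (s & csa_carry) | (s & chain) | (csa_carry & chain)
        csa_carry = lut_eval(MAJ3, ai, bi, ci)  # feeds position i + 1
    return result


print(hex(XOR3), hex(MAJ3), ternary_add(7, 5, 6, 3))  # 0x96 0xe8 18
```

Under this reading, one ALM covers two bit positions of the ternary adder, which is consistent with the 2-bit ternary adder described above.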

Fig. 7. The Altera Stratix II/III Adaptive Logic Module (ALM) shown in Shared Arithmetic Mode.

Figure 7 illustrates the ALM configured in shared arithmetic mode. It is important to note that the 6-LUTs in the ALM are decomposed into smaller LUTs of 3 and 4 inputs; only the smaller LUTs are shown in Figure 7. The modification to the ALM proposed in this article is similar to shared arithmetic mode, but implements a 6:2 or 7:2 compressor. As in shared arithmetic mode, the fracturable LUTs are configured as a CSA, but the interconnection of FAs in the carry chain differs from the ripple-carry chain. We chose to provide a second carry chain in addition to the ripple-carry chain; in principle, both carry chains could be merged, but this would introduce multiplexers into the ripple-carry chain. We opted for the second carry chain in order to achieve better performance.

3.4 Synthesizing Compressor Trees on FPGAs

The compressor tree synthesis techniques summarized in Section 3.1 are intended for ASIC design flows. Due to the specific logic and routing architectures of modern high-performance FPGAs, these techniques are not likely to yield favorable results if used in a synthesis flow targeting an FPGA. Since the primary role of carry chains has been to facilitate efficient carry-propagate addition, conventional wisdom held that adder trees would yield better results than compressor trees synthesized on an FPGA. This is not necessarily true. Poldre and Tammemae [1999] synthesized 4:2 compressors onto the four-input LUTs of the Xilinx Virtex FPGAs, exploiting the carry chains to propagate the carry-in/carry-out bits. Parandeh-Afshar et al. [2008b, 2008c] developed a general compressor tree synthesis method that mapped GPCs with 6 inputs and 3 or 4 outputs onto FPGA logic cells built from 6-LUTs.
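The one-LUT-per-output-bit mapping mentioned in Section 2.4 can be sketched in Python by tabulating each GPC output bit as a LUT truth table. The helper names and the input numbering (rank-0 bits first) are our own assumptions; the sketch evaluates Equation (3) exactly via integer bit length, since $\lceil \log_2(1 + M) \rceil$ equals the bit length of M for any integer M ≥ 0. It does not model the ALM restriction that both 6-LUTs implement the same function, nor the carry-chain-based mapping of Parandeh-Afshar et al. [2009].

```python
def gpc_output_bits(ks):
    """Equation (3) for a GPC (k_{t-1}, ..., k_0; s); ks lists k_{t-1} first.

    For an integer M >= 0, ceil(log2(1 + M)) equals M.bit_length(), which
    avoids floating-point rounding.
    """
    max_value = sum(k << r for r, k in enumerate(reversed(ks)))
    return max_value.bit_length()


def gpc_lut_masks(ks):
    """One truth-table mask per GPC output bit (least significant first).

    Inputs are numbered rank by rank, starting with the rank-0 bits (an
    arbitrary choice). With at most six inputs, each mask fits one 6-LUT,
    so the GPC needs a single LUT level.
    """
    ranks = [r for r, k in enumerate(reversed(ks)) for _ in range(k)]
    n, s = len(ranks), gpc_output_bits(ks)
    masks = [0] * s
    for pattern in range(2 ** n):
        value = sum(((pattern >> t) & 1) << ranks[t] for t in range(n))
        for j in range(s):
            masks[j] |= ((value >> j) & 1) << pattern
    return masks


print(gpc_output_bits((2, 3)))  # 3
for ks in [(0, 6), (1, 5), (2, 3)]:
    print(ks, [hex(m) for m in gpc_lut_masks(ks)])
```

For a single-rank GPC such as (0, 6; 3), `gpc_output_bits` reduces to Equation (2) for a 6:3 counter, and the least significant mask is simply the parity of the six inputs.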
Limiting the number of GPC inputs to 6 ensures that at most one layer of LUTs is required to implement each GPC. On an Altera Stratix II, a compressor tree built from GPCs was 27% faster than an adder tree. The GPC mapping, however, increased the ALM count by 47%, on average.

Fig. 8. (a) (0, 6; 3) and (b) (2, 3; 3) GPCs mapped onto two ALMs using shared arithmetic mode.

In principle, a 6-input, k-output GPC could be synthesized on k 6-LUTs, where each 6-LUT computes a single output bit. As the two 6-LUTs in a Stratix II-IV ALM must implement the same function, this would require k ALMs, where only one of the two 6-LUTs available in each ALM is used. Parandeh-Afshar et al. [2009], however, proposed a more efficient mapping that uses LUTs in conjunction with carry chains, reducing the number of ALMs required to ⌈k/2⌉. In many cases, it is possible to map these components onto ALMs using either arithmetic or shared arithmetic mode. Figures 8 and 9, for example, show three GPCs with at most six inputs and three outputs mapped onto two ALMs using shared arithmetic mode. In fact, these are the only three such GPCs that will be used by the GPC mapping heuristic, which is described in Section 5.3. Specifically, these are the only covering GPCs with at most six inputs and three outputs; the definition of a covering GPC will be formalized in Section 5.1. The GPC mapping heuristic only employs covering GPCs; all other GPCs are either redundant or unreasonable, for reasons that will be discussed later.

Fig. 9. (a) A (1, 5; 3) GPC implemented using full and half adders, and (b) mapped onto two ALMs using shared arithmetic mode. The internal signals S0, C0, S1, C1, and C2 in (a) are computed by LUTs in (b). Signals C2 and D are never 1 at the same time, so the carry output of the adder that produces output bit z2 is always 0.

4. NEW FPGA LOGIC CELL AND CARRY CHAIN

Figure 10(a) shows our proposed new FPGA logic cell, which is presented as an extension of the ALM used in Altera's Stratix II-IV line of high-performance FPGAs. The components required for shared arithmetic mode are also shown in this figure. The left-hand side of Figure 10(a) shows four 3-LUTs, which are part of Altera's fracturable 6-LUT architecture. The carry chain on the right-hand side is the traditional carry chain that is used to implement ternary addition, using the four 3-LUTs configured as a carry-save adder. The novel features of the new logic cell are the carry chain in the center (gray background), which can implement a 6:2 or 7:2 compressor, and the two multiplexers shown in gray on the right-hand side of Figure 10(a), which select between the outputs of the two carry chains. As with ternary addition, the new carry chain requires the four 3-LUTs to be configured as a carry-save adder. To implement a 7:2 compressor, three FAs (and a 7th LUT input) are required; to implement a 6:2 compressor, one of the FAs (outlined with a dashed line) becomes a half (two-input) adder, and the 7th input bit is not used.

Fig. 10. (a) Enhanced version of the Shared Arithmetic Mode of the Altera ALM; a new carry chain, shown in gray, allows the ALM to be configured as a 6:2 or 7:2 compressor. Two additional multiplexers are required to select between the two sum outputs of the 6:2 compressor and the ternary adder (already present in the ALM). (b) Pattern of carry propagation for the 6:2 and 7:2 compressor.
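The carry pattern of Figure 10(b) can be illustrated with a behavioral model. The sketch below is an illustration under stated assumptions, not the gate-level structure of Figure 10(a): each cell counts its (up to seven) primary input bits as a 7:3 counter would, sends the rank 1 and rank 2 bits of that count to the cells one and two positions higher (carry-ins c_in,0 and c_in,1 there), and resolves its remaining rank 0 bit together with the two incoming carries into one sum bit and one carry bit. Because the outgoing carries depend only on the cell's own primary inputs, nothing ripples along the chain. The function names and the two padding cells at the top of the chain are hypothetical choices of this sketch.

```python
import random
from typing import List, Tuple


def compressor_chain(columns: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Behavioral model of a chain of k:2 compressors (k <= 7), one cell per column.

    columns[i] holds the primary input bits of rank i. Returns (sums, carries):
    sums[i] has rank i and carries[i] has rank i + 1, so every column leaves
    at most two bits for the next level of the compressor tree.
    """
    cells = [list(c) for c in columns] + [[], []]  # two padding cells absorb the top carry-outs
    n = len(cells)
    co1 = [0] * n  # rank 1 carry-out of cell i -> carry-in c_in,0 of cell i + 1
    co2 = [0] * n  # rank 2 carry-out of cell i -> carry-in c_in,1 of cell i + 2
    sums, carries = [0] * n, [0] * n
    for i, bits in enumerate(cells):
        if len(bits) > 7:
            raise ValueError("a cell accepts at most seven primary input bits")
        count = sum(bits)                          # 0..7, i.e., a 7:3 count
        low, co1[i], co2[i] = count & 1, (count >> 1) & 1, (count >> 2) & 1
        cin0 = co1[i - 1] if i >= 1 else 0
        cin1 = co2[i - 2] if i >= 2 else 0
        t = low + cin0 + cin1                      # 0..3: one sum bit plus one carry bit
        sums[i], carries[i] = t & 1, t >> 1
    return sums, carries


def value(sums: List[int], carries: List[int]) -> int:
    """Weighted value of the two output rows."""
    return sum((s << i) + (c << (i + 1)) for i, (s, c) in enumerate(zip(sums, carries)))


random.seed(1)
cols = [[random.randint(0, 1) for _ in range(6)] for _ in range(8)]  # eight 6:2 cells
s, c = compressor_chain(cols)
print(value(s, c) == sum(sum(col) << i for i, col in enumerate(cols)))  # True
```

The check at the end holds for any input because each cell's count equals its low bit plus twice its rank 1 carry-out plus four times its rank 2 carry-out, and each carry-out is added back exactly one or two columns higher.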

Three carry-in/carry-out bits are also required; they are labeled X, Y, and Z in Figure 10(a). The carry-out labeled X/Y/Z connects to the corresponding carry-in labeled X/Y/Z of the next compressor in the chain. A detailed picture of the carry chains across several logic cells is shown in Figure 10(b). In principle, the FAs used in the two carry chains could be shared; this design choice was illustrated by Parandeh-Afshar et al. [2008a, Figure 5(a)]. Although sharing could slightly reduce area, it requires multiplexers to be inserted into the carry chain, significantly increasing the critical path delay. As our goal is to increase performance, this design point is not ideal, especially since the area of the multiplexers offsets the area savings from sharing FAs.

There are two primary advantages of providing an FPGA logic cell that can be configured as a compressor, compared to synthesizing GPCs on LUTs. The first advantage, which was illustrated in Figure 6, is that a k:2 compressor has a higher compression ratio than a k-input GPC. In some, but certainly not all, cases, this can reduce the number of levels of logic in the compressor tree. The second advantage involves area utilization. Each ALM contains two 6-LUTs with dependent inputs. A GPC with six inputs and three outputs, including a 6:3 counter, requires two ALMs, while only one of our proposed logic cells, which is marginally larger than an ALM, is required to realize a 6:2 compressor. Reducing the number of logic cells, moreover, may allow for a tighter placement of logic cells on the device, which, in turn, reduces wirelength and routing delay; our experiments confirm this hypothesis. By similar reasoning, using 7:2 rather than 6:2 compressors further increases the compression ratio, and may also reduce the number of logic cells required, since each cell can consume an additional bit.
Consider the ith compressor in the chain. Carry-in bits c_in,0 and c_in,1 are driven by the rank 1 carry-out of the (i − 1)st compressor and the rank 2 carry-out of the (i − 2)nd compressor, respectively; likewise, the rank 1 carry-out of the ith compressor drives carry-in c_in,0 of the (i + 1)st compressor, and its rank 2 carry-out drives carry-in c_in,1 of the (i + 2)nd compressor.

When an ALM is configured as a two-bit ternary adder in shared arithmetic mode, six input bits are used, so no modifications to the I/O interface are required to implement a 6:2 compressor. The 7:2 compressor, in contrast, requires an extra input bit. This is not a problem, as the ALM contains eight architecturally visible inputs; either of the two remaining inputs can be used as the seventh input when the ALM is configured as a 7:2 compressor.

5. COMPRESSOR TREE SYNTHESIS ON THE NEW LOGIC CELL

This section describes a mapping heuristic that can synthesize compressor trees targeting the logic cell shown in Figure 10(a). This heuristic is an extension of an earlier one proposed by Parandeh-Afshar et al. [2008b], which targeted the Altera Stratix II FPGA. Compressor trees synthesized using an ASIC design flow produce two outputs that are summed using a CPA. Since ternary CPAs are available in Stratix II for the same delay and area as binary CPAs, the earlier heuristic produces compressor trees with three outputs instead of two, and the rest of the compressor tree is synthesized using GPCs. The number of outputs per GPC was limited to four, ensuring that each GPC can be implemented using at most four 6-LUTs (or fewer, if shared arithmetic mode can be exploited). This section extends the mapping heuristic to include the possibility of configuring the logic cells as 6:2 or 7:2 compressors as well.

5.1 GPC Classification

By convention, we require that a GPC have at least two input bits. For example, (0, 1; 1) and (1, 0; 2) are not GPCs. Some GPCs are considered unreasonable by the heuristic because they can always be replaced with another, more sensible choice. GPCs such as (3, 1; 3) have a single rank 0 input bit, which is always passed directly to the least significant output bit; that is, the value of that input bit alone determines whether the output is odd or even. Such a GPC is considered to be unreasonable. Another class of unreasonable GPCs are those for which the number of input bits is less than or equal to the number of output bits, for example, (2, 1; 3); these GPCs are unreasonable because they do not perform any compression. A third class of unreasonable GPCs are those that have no rank 0 input bits, for example, (2, 0; 3). In this case, the rank 1 input bits could be converted to rank 0 input bits of a smaller counter that produces fewer output bits, for example, (0, 2; 2).

A primitive GPC is one that satisfies the input/output constraints M and N and is reasonable. In theory, the number of primitive GPCs is exponential in M and N; limiting M and N to small constant values ensures tractability. With N output bits, the sum of the input bits, weighted by rank, cannot exceed 2^N − 1; this ensures that the number of primitive GPCs is finite. A covering GPC is a primitive GPC whose functionality cannot be implemented by another primitive GPC. For example, a (2, 3; 3) GPC can implement a (1, 3; 3) GPC by setting one rank 1 input bit to zero.
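The classification rules above can be turned into a small enumerator. This is a sketch under assumptions the text does not state: GPCs are restricted to inputs of two adjacent ranks, written as hypothetical (k1, k0, n) triples; a GPC is kept only if all n outputs can actually be nonzero (its largest weighted sum is at least 2^(n−1)); and "can implement" is read as dominance, meaning A covers B when both have the same number of outputs and A has at least as many inputs of each rank, so surplus inputs can be tied to zero. Under these assumptions, the enumeration yields exactly three covering GPCs with at most six inputs and three outputs: (0, 6; 3), (1, 5; 3), and (2, 3; 3).

```python
from itertools import product
from typing import List, Tuple

Gpc = Tuple[int, int, int]  # (k1, k0, n): k1 rank 1 bits, k0 rank 0 bits, n outputs


def is_reasonable(k1: int, k0: int, n: int) -> bool:
    inputs = k1 + k0
    if inputs < 2:        # by convention, a GPC has at least two input bits
        return False
    if k0 in (0, 1):      # no rank 0 bit, or a single one that is merely passed through
        return False
    if inputs <= n:       # no compression
        return False
    return True


def fits_outputs(k1: int, k0: int, n: int) -> bool:
    top = 2 * k1 + k0     # rank-weighted sum of all inputs set to 1
    # Upper bound from the text; the lower bound is an assumption of this sketch.
    return (1 << (n - 1)) <= top <= (1 << n) - 1


def primitive_gpcs(max_in: int, max_out: int) -> List[Gpc]:
    return [(k1, k0, n)
            for k1, k0 in product(range(max_in + 1), repeat=2)
            for n in range(1, max_out + 1)
            if k1 + k0 <= max_in and is_reasonable(k1, k0, n) and fits_outputs(k1, k0, n)]


def covers(a: Gpc, b: Gpc) -> bool:
    """a can implement b by tying a's surplus inputs to zero (same output count)."""
    return a != b and a[2] == b[2] and a[0] >= b[0] and a[1] >= b[1]


def covering_gpcs(max_in: int, max_out: int) -> List[Gpc]:
    prims = primitive_gpcs(max_in, max_out)
    return [g for g in prims if not any(covers(h, g) for h in prims)]


print(sorted(g for g in covering_gpcs(6, 3) if g[2] == 3))
# [(0, 6, 3), (1, 5, 3), (2, 3, 3)]
```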
For example, there are just three covering GPCs having at most six inputs and three outputs: (0, 6; 3), (1, 5; 3), and (2, 3; 3) (see Figures 8 and 9). All other GPCs satisfying these I/O constraints are either unreasonable, for example, (3, 1; 3), or can be covered by one of the three covering GPCs already listed.

5.2 GPC Library Construction

The mapping heuristic uses a library of GPCs having at most M inputs and N outputs. This library is computed once for each target FPGA and stored in a text file, which is read each time a set of compressor trees is synthesized. First, the primitive GPCs are enumerated and added to the library. Second, the covering GPCs are identified and marked as such. Third, the primitive GPCs are sorted in nonincreasing order of compression ratio, and each set of primitive GPCs having the same compression ratio is sorted in nonincreasing order of the number of inputs. The total ordering of primitive GPCs thus favors a high compression ratio as the first criterion and the number of bits consumed as the second.

Parandeh-Afshar et al. [2008b] used M = 6 and N = 4 to target the Altera Stratix II FPGA. Limiting the number of inputs to M = 6 ensures that only one layer of ALMs is required to implement each counter, regardless of whether the GPC is synthesized on LUTs or uses shared arithmetic mode (Figures 8 and 9). Limiting the number of outputs to N = 4 ensures that at most four ALMs are required for each GPC, under the worst-case assumption that each output bit is computed using a 6-LUT; fewer ALMs are required when shared arithmetic mode can be used [Parandeh-Afshar et al. 2009].

The mapping heuristic, described in the following section, converts chains of consecutive (0, 6; 3) GPCs (6:3 counters) into 6:2 compressors whenever possible. Unfortunately, this approach cannot be used directly for 7:2 compressors, as M = 6 excludes 7-input GPCs from the library. To support 7:2 compressors, a (0, 7; 3) GPC is added to the library, but no other 7-input GPCs are included. Chains of (0, 7; 3) GPCs are converted to 7:2 compressors; when a 7-input GPC is not part of a chain, it is converted to GPCs with at most 6 inputs, as described in the following section.

5.3 Mapping Heuristic

The input to the mapping heuristic is: (1) an ordered array of integers k_i, where the ith integer is the number of bits of rank i to sum, e.g., k_0 bits of rank 0, k_1 bits of rank 1, etc.; (2) a library of GPCs, as described in the preceding section; and (3) a flag called mode, which takes one of three values: ALM, 6:2, or 7:2. If mode = ALM, then we are targeting an FPGA containing traditional ALMs that cannot be configured as 6:2 or 7:2 compressors; if mode = 6:2 or 7:2, then we are targeting an FPGA whose logic cells can be configured as a 6:2 or 7:2 compressor, for example, Figure 10(a).
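The library ordering of Section 5.2 can be sketched as a sort key. The sketch assumes that the heuristic scans the library from the front and takes the first GPC that fits, so the front must hold the highest priority: descending compression ratio (inputs divided by outputs), then descending number of inputs. GPCs are written as hypothetical (counts, n) pairs with counts listed highest rank first; `Fraction` avoids floating-point ties such as 5/3 versus 10/6.

```python
from fractions import Fraction
from typing import List, Tuple

Gpc = Tuple[Tuple[int, ...], int]  # (input counts, highest rank first; number of outputs)


def compression_ratio(gpc: Gpc) -> Fraction:
    counts, n_out = gpc
    return Fraction(sum(counts), n_out)


def priority_order(library: List[Gpc]) -> List[Gpc]:
    """Highest compression ratio first; among equal ratios, more inputs first.

    sorted() is stable, so GPCs that tie on both keys keep their input order.
    """
    return sorted(library, key=lambda g: (-compression_ratio(g), -sum(g[0])))


library = [((1, 4), 3), ((0, 3), 2), ((4, 3), 4), ((2, 3), 3), ((0, 6), 3), ((1, 5), 3)]
for gpc in priority_order(library):
    print(gpc, compression_ratio(gpc))
```

With this order, (4, 3; 4) (ratio 7/4) precedes (1, 4; 3) (ratio 5/3), which matches the preference shown in the Figure 12 discussion.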
The mapping heuristic generates one level of the compressor tree at a time. A subset of the input bits is covered by GPCs and possibly 6:2 or 7:2 compressors. The output bits produced by each GPC are propagated to the next level of the compressor tree, along with the bits from the current level that are not covered. Since the rank of each GPC output bit is known, a new set of columns (array of integers) is generated for each level of the tree. Pseudocode for the mapping heuristic is shown in Figure 11. New levels are generated until at most three rows of bits remain, that is, until each column of the next level has at most three bits. A ternary CPA completes the tree.

Fig. 11. Pseudocode for GPC mapping heuristic [Parandeh-Afshar et al. 2008a] with extensions to exploit 6:2 and 7:2 compressors, where appropriate.

The remainder of this section focuses on the process of producing one level of the compressor tree, that is, how to cover a set of columns with GPCs. The following process is applied until no remaining (primitive) GPC can cover any bits in the current level of the tree. The column having the most noncovered input bits in the current level is always selected; ties are broken arbitrarily. Selecting the column with the largest number of bits tends to favor the use of GPCs with higher compression ratios and a large number of input bits. To find the best GPC for the selected column, the set of primitive GPCs is searched. The first GPC that fits the base column and its following or preceding columns is selected. If mode = 7:2 and the column contains at least seven input bits, then a (0, 7; 3) GPC is always used, and a (0, 6; 3) GPC is always used if the column contains six input bits. If mode = 6:2 or ALM and the column contains at least six input bits, then a (0, 6; 3) GPC is always used. Otherwise, the selected column contains fewer bits than the maximum input bandwidth of the largest GPC in the library; in this case, GPCs that cover bits from columns immediately adjacent to the selected column can be used as well.

A forward search looks for a GPC under the assumption that the bits in the selected column will have rank 0. If the selected column is c, then the forward search will attempt to include bits from columns c + 1, c + 2, etc. A backward search assumes that the bits in the selected column will be of the highest rank in the GPC that covers them. If the selected column is c, then the backward search will attempt to include bits from columns c − 1, c − 2, etc. In both searches, the first GPC that fits the selected column and the additional columns is selected. Of these two GPCs, the one with the higher priority, according to the sorted order, is selected. The forward and backward searches are particularly useful when the distribution of column heights is asymmetric, which occurs quite frequently for constant multipliers, including FIR filters.

Figure 12 illustrates the forward and backward search. In Figure 12(a), a forward search finds a (1, 4; 3) GPC, while the backward search in Figure 12(b) finds a (4, 3; 4) GPC. Since the compression ratios are 5/3 ≈ 1.67 and 7/4 = 1.75, respectively, the GPC found by the backward search is selected.

Fig. 12. Illustration of the forward (a) and backward (b) search using GPC mapping.

After a column and a GPC are selected, the covered bits are removed from the current set of columns, and the output bits produced by the GPC are added to the set of columns for the next level in the tree. Column and GPC selection repeats until either all bits at the current level have been covered or no primitive GPC in the list can cover more than a single bit.

Once the current level is completely covered, the heuristic attempts to replace some GPCs with 6:2 or 7:2 compressors. If mode = 6:2, each contiguous sequence of (0, 6; 3) GPCs is replaced with a contiguous sequence of 6:2 compressors, similar in principle to Figure 6. Note that this transformation reduces the number of bits in the following level; aggregated over several levels, the use of compressors rather than counters can reduce the total number of logic levels in the compressor tree.
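A compact sketch of the covering loop, including the forward and backward searches, follows. It is a simplification, not the pseudocode of Figure 11: it ignores the mode flag and the 6:2/7:2 substitution, it tries columns from tallest to shortest until some GPC fits, and it assumes the library is already in priority order. GPCs are hypothetical (counts, n) pairs with counts listed highest rank first, and all function names are illustrative.

```python
from typing import List, Optional, Tuple

Gpc = Tuple[Tuple[int, ...], int]  # (input counts, highest rank first; number of outputs)


def placement(gpc: Gpc, c: int, direction: str) -> List[Tuple[int, int]]:
    """(column, bits taken) pairs, lowest rank first.

    'forward': column c holds the GPC's rank 0 inputs.
    'backward': column c holds the GPC's highest-rank inputs.
    """
    ranks = list(reversed(gpc[0]))  # ranks[r] = number of rank-r inputs
    low = c if direction == "forward" else c - (len(ranks) - 1)
    return [(low + r, k) for r, k in enumerate(ranks)]


def fits(gpc: Gpc, heights: List[int], c: int, direction: str) -> bool:
    cols = placement(gpc, c, direction)
    if cols[0][0] < 0 or dict(cols).get(c, 0) == 0:
        return False  # off the low end, or no bits taken from the selected column
    return all(k == 0 or (col < len(heights) and heights[col] >= k) for col, k in cols)


def best_gpc(heights: List[int], c: int, library: List[Gpc]) -> Optional[Tuple[int, Gpc, str]]:
    """First fit in each search direction; the earlier library entry wins."""
    found = []
    for direction in ("forward", "backward"):
        for idx, gpc in enumerate(library):
            if fits(gpc, heights, c, direction):
                found.append((idx, gpc, direction))
                break
    return min(found, key=lambda t: t[0]) if found else None


def map_level(heights: List[int], library: List[Gpc]):
    """Cover one level; return (next-level column heights, [(gpc, rank 0 column)])."""
    remaining = list(heights)
    nxt = [0] * (len(heights) + max(g[1] for g in library))
    used = []
    while True:
        choice, col_sel = None, -1
        for c in sorted(range(len(remaining)), key=lambda i: -remaining[i]):
            if remaining[c] < 2:
                break  # covering a single bit gives no compression
            choice = best_gpc(remaining, c, library)
            if choice is not None:
                col_sel = c
                break
        if choice is None:
            break
        _, gpc, direction = choice
        cols = placement(gpc, col_sel, direction)
        for col, k in cols:
            if k:
                remaining[col] -= k
        low = cols[0][0]
        for i in range(gpc[1]):
            nxt[low + i] += 1  # output bit i of the GPC has rank low + i
        used.append((gpc, low))
    for i, r in enumerate(remaining):
        nxt[i] += r  # uncovered bits move to the next level unchanged
    while len(nxt) > 1 and nxt[-1] == 0:
        nxt.pop()
    return nxt, used


def build_tree(heights: List[int], library: List[Gpc], rows: int = 3):
    """Add levels until every column has at most `rows` bits (three for a ternary CPA)."""
    levels = []
    while max(heights) > rows:
        heights, used = map_level(heights, library)
        if not used:
            raise RuntimeError("the library cannot reduce these columns")
        levels.append(used)
    return levels, heights


lib = [((4, 3), 4), ((1, 4), 3), ((0, 3), 2)]  # already in priority order
print(best_gpc([3, 4, 1], 1, lib))  # (0, ((4, 3), 4), 'backward')
```

On the asymmetric columns [3, 4, 1], the forward search from column 1 finds (1, 4; 3) and the backward search finds (4, 3; 4); the backward result wins because it comes earlier in the priority order, mirroring the situation described for Figure 12.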
If mode = 7:2, then each contiguous sequence of (0, 7; 3) GPCs is replaced with a sequence of 7:2 compressors, as was done for 6:2 compressors. Each remaining (0, 7; 3) GPC is replaced by a (0, 6; 3) GPC and one unmapped bit that is propagated to the next level of the tree, because (0, 7; 3) GPCs do not map efficiently onto ALMs. Next, the current level of the compressor tree is mapped onto logic cells. GPCs are mapped onto ALMs, while 6:2 and 7:2 compressors require the logic cell to be configured to use the carry chain shown in Figure 10(a). Additionally, the outputs of the GPCs and compressors from the preceding level of the compressor tree are connected to the inputs of the GPCs and compressors in the current level. The last step is to generate the columns for the next level of the compressor tree.
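The post-pass that turns counters into compressors can be sketched as a scan over the components chosen for one level. The sketch assumes each chosen GPC is recorded as a hypothetical (gpc, column) pair, where column is its rank 0 column, and that a "contiguous sequence" means (0, 6; 3) or (0, 7; 3) counters placed on consecutive columns. The text does not say whether a single isolated counter already counts as a sequence; here the hypothetical parameter `min_chain` (default 2) decides, and a shorter run is kept as a (0, 6; 3) counter in 6:2 mode, or split into a (0, 6; 3) GPC plus one pass-through bit in 7:2 mode.

```python
from typing import List, Tuple

Gpc = Tuple[Tuple[int, ...], int]
SIX_THREE: Gpc = ((0, 6), 3)
SEVEN_THREE: Gpc = ((0, 7), 3)


def substitute_compressors(used: List[Tuple[Gpc, int]], mode: str,
                           min_chain: int = 2) -> List[Tuple[str, int]]:
    """Return (component, column) pairs after replacing counter runs with compressors."""
    target = {"6:2": SIX_THREE, "7:2": SEVEN_THREE}.get(mode)
    if target is None:  # mode == "ALM": no compressors available, keep every GPC
        return [(str(g), col) for g, col in used]
    run_cols = sorted(col for g, col in used if g == target)
    out = [(str(g), col) for g, col in used if g != target]
    i = 0
    while i < len(run_cols):
        j = i
        # Extend the run while columns are consecutive (two counters on the
        # same column simply start separate runs in this sketch).
        while j + 1 < len(run_cols) and run_cols[j + 1] == run_cols[j] + 1:
            j += 1
        run = run_cols[i:j + 1]
        if len(run) >= min_chain:
            out += [(mode + " compressor", col) for col in run]
        elif mode == "7:2":
            out += [(str(SIX_THREE), col) for col in run]       # isolated (0, 7; 3) ...
            out += [("pass-through bit", col) for col in run]   # ... leaves one bit behind
        else:
            out += [(str(SIX_THREE), col) for col in run]       # isolated 6:3 counter stays
        i = j + 1
    return sorted(out, key=lambda t: (t[1], t[0]))


used7 = [(SEVEN_THREE, 0), (SEVEN_THREE, 1), (SEVEN_THREE, 5)]
for component in substitute_compressors(used7, "7:2"):
    print(component)
```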


Modified Booth Multiplier Based Low-Cost FIR Filter Design Shelja Jose, Shereena Mytheen Modified Booth Multiplier Based Low-Cost FIR Filter Design Shelja Jose, Shereena Mytheen Abstract A new low area-cost FIR filter design is proposed using a modified Booth multiplier based on direct form

More information

Architectures and Algorithms for Synthesizable Embedded Programmable Logic Cores

Architectures and Algorithms for Synthesizable Embedded Programmable Logic Cores Architectures and Algorithms for Synthesizable Embedded Programmable Logic Cores Noha Kafafi, Kimberly Bozman, Steven J.E. Wilton Department of Electrical and Computer Engineering University of British

More information

Design of a High Speed FIR Filter on FPGA by Using DA-OBC Algorithm

Design of a High Speed FIR Filter on FPGA by Using DA-OBC Algorithm Design of a High Speed FIR Filter on FPGA by Using DA-OBC Algorithm Vijay Kumar Ch 1, Leelakrishna Muthyala 1, Chitra E 2 1 Research Scholar, VLSI, SRM University, Tamilnadu, India 2 Assistant Professor,

More information

DESIGN OF LOW POWER HIGH SPEED ERROR TOLERANT ADDERS USING FPGA

DESIGN OF LOW POWER HIGH SPEED ERROR TOLERANT ADDERS USING FPGA International Journal of Advanced Research in Engineering and Technology (IJARET) Volume 10, Issue 1, January February 2019, pp. 88 94, Article ID: IJARET_10_01_009 Available online at http://www.iaeme.com/ijaret/issues.asp?jtype=ijaret&vtype=10&itype=1

More information

AN EFFICIENT APPROACH TO MINIMIZE POWER AND AREA IN CARRY SELECT ADDER USING BINARY TO EXCESS ONE CONVERTER

AN EFFICIENT APPROACH TO MINIMIZE POWER AND AREA IN CARRY SELECT ADDER USING BINARY TO EXCESS ONE CONVERTER AN EFFICIENT APPROACH TO MINIMIZE POWER AND AREA IN CARRY SELECT ADDER USING BINARY TO EXCESS ONE CONVERTER K. RAMAMOORTHY 1 T. CHELLADURAI 2 V. MANIKANDAN 3 1 Department of Electronics and Communication

More information

Design and Implementation of High Speed Carry Select Adder Korrapatti Mohammed Ghouse 1 K.Bala. 2

Design and Implementation of High Speed Carry Select Adder Korrapatti Mohammed Ghouse 1 K.Bala. 2 IJSRD - International Journal for Scientific Research & Development Vol. 3, Issue 07, 2015 ISSN (online): 2321-0613 Design and Implementation of High Speed Carry Select Adder Korrapatti Mohammed Ghouse

More information

Reference. Wayne Wolf, FPGA-Based System Design Pearson Education, N Krishna Prakash,, Amrita School of Engineering

Reference. Wayne Wolf, FPGA-Based System Design Pearson Education, N Krishna Prakash,, Amrita School of Engineering FPGA Fabrics Reference Wayne Wolf, FPGA-Based System Design Pearson Education, 2004 CPLD / FPGA CPLD Interconnection of several PLD blocks with Programmable interconnect on a single chip Logic blocks executes

More information

Low-Power Multipliers with Data Wordlength Reduction

Low-Power Multipliers with Data Wordlength Reduction Low-Power Multipliers with Data Wordlength Reduction Kyungtae Han, Brian L. Evans, and Earl E. Swartzlander, Jr. Dept. of Electrical and Computer Engineering The University of Texas at Austin Austin, TX

More information

2 Assoc Prof, Dept of ECE, George Institute of Engineering & Technology, Markapur, AP, India,

2 Assoc Prof, Dept of ECE, George Institute of Engineering & Technology, Markapur, AP, India, ISSN 2319-8885 Vol.03,Issue.30 October-2014, Pages:5968-5972 www.ijsetr.com Low Power and Area-Efficient Carry Select Adder THANNEERU DHURGARAO 1, P.PRASANNA MURALI KRISHNA 2 1 PG Scholar, Dept of DECS,

More information

Comparative Analysis of Various Adders using VHDL

Comparative Analysis of Various Adders using VHDL International Journal of Engineering and Technical Research (IJETR) ISSN: 2321-0869, Volume-3, Issue-4, April 2015 Comparative Analysis of Various s using VHDL Komal M. Lineswala, Zalak M. Vyas Abstract

More information

Efficient Implementation of Parallel Prefix Adders Using Verilog HDL

Efficient Implementation of Parallel Prefix Adders Using Verilog HDL Efficient Implementation of Parallel Prefix Adders Using Verilog HDL D Harish Kumar, MTech Student, Department of ECE, Jawaharlal Nehru Institute Of Technology, Hyderabad. ABSTRACT In Very Large Scale

More information

Mahendra Engineering College, Namakkal, Tamilnadu, India.

Mahendra Engineering College, Namakkal, Tamilnadu, India. Implementation of Modified Booth Algorithm for Parallel MAC Stephen 1, Ravikumar. M 2 1 PG Scholar, ME (VLSI DESIGN), 2 Assistant Professor, Department ECE Mahendra Engineering College, Namakkal, Tamilnadu,

More information

Performance Analysis of an Efficient Reconfigurable Multiplier for Multirate Systems

Performance Analysis of an Efficient Reconfigurable Multiplier for Multirate Systems Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 5.258 IJCSMC,

More information

Implementation of 256-bit High Speed and Area Efficient Carry Select Adder

Implementation of 256-bit High Speed and Area Efficient Carry Select Adder Implementation of 5-bit High Speed and Area Efficient Carry Select Adder C. Sudarshan Babu, Dr. P. Ramana Reddy, Dept. of ECE, Jawaharlal Nehru Technological University, Anantapur, AP, India Abstract Implementation

More information

HIGH PERFORMANCE BAUGH WOOLEY MULTIPLIER USING CARRY SKIP ADDER STRUCTURE

HIGH PERFORMANCE BAUGH WOOLEY MULTIPLIER USING CARRY SKIP ADDER STRUCTURE HIGH PERFORMANCE BAUGH WOOLEY MULTIPLIER USING CARRY SKIP ADDER STRUCTURE R.ARUN SEKAR 1 B.GOPINATH 2 1Department Of Electronics And Communication Engineering, Assistant Professor, SNS College Of Technology,

More information

Computer Arithmetic (2)

Computer Arithmetic (2) Computer Arithmetic () Arithmetic Units How do we carry out,,, in FPGA? How do we perform sin, cos, e, etc? ELEC816/ELEC61 Spring 1 Hayden Kwok-Hay So H. So, Sp1 Lecture 7 - ELEC816/61 Addition Two ve

More information

Design of Baugh Wooley Multiplier with Adaptive Hold Logic. M.Kavia, V.Meenakshi

Design of Baugh Wooley Multiplier with Adaptive Hold Logic. M.Kavia, V.Meenakshi International Journal of Scientific & Engineering Research, Volume 6, Issue 4, April-2015 105 Design of Baugh Wooley Multiplier with Adaptive Hold Logic M.Kavia, V.Meenakshi Abstract Mostly, the overall

More information

Design of Area and Power Efficient FIR Filter Using Truncated Multiplier Technique

Design of Area and Power Efficient FIR Filter Using Truncated Multiplier Technique Design of Area and Power Efficient FIR Filter Using Truncated Multiplier Technique TALLURI ANUSHA *1, and D.DAYAKAR RAO #2 * Student (Dept of ECE-VLSI), Sree Vahini Institute of Science and Technology,

More information

An Optimized Implementation of CSLA and CLLA for 32-bit Unsigned Multiplier Using Verilog

An Optimized Implementation of CSLA and CLLA for 32-bit Unsigned Multiplier Using Verilog An Optimized Implementation of CSLA and CLLA for 32-bit Unsigned Multiplier Using Verilog 1 P.Sanjeeva Krishna Reddy, PG Scholar in VLSI Design, 2 A.M.Guna Sekhar Assoc.Professor 1 appireddigarichaitanya@gmail.com,

More information

A Survey on Power Reduction Techniques in FIR Filter

A Survey on Power Reduction Techniques in FIR Filter A Survey on Power Reduction Techniques in FIR Filter 1 Pooja Madhumatke, 2 Shubhangi Borkar, 3 Dinesh Katole 1, 2 Department of Computer Science & Engineering, RTMNU, Nagpur Institute of Technology Nagpur,

More information

High performance Radix-16 Booth Partial Product Generator for 64-bit Binary Multipliers

High performance Radix-16 Booth Partial Product Generator for 64-bit Binary Multipliers High performance Radix-16 Booth Partial Product Generator for 64-bit Binary Multipliers Dharmapuri Ranga Rajini 1 M.Ramana Reddy 2 rangarajini.d@gmail.com 1 ramanareddy055@gmail.com 2 1 PG Scholar, Dept

More information

Low-Power Approximate Unsigned Multipliers with Configurable Error Recovery

Low-Power Approximate Unsigned Multipliers with Configurable Error Recovery SUBMITTED FOR REVIEW 1 Low-Power Approximate Unsigned Multipliers with Configurable Error Recovery Honglan Jiang*, Student Member, IEEE, Cong Liu*, Fabrizio Lombardi, Fellow, IEEE and Jie Han, Senior Member,

More information

Published by: PIONEER RESEARCH & DEVELOPMENT GROUP (www.prdg.org) 1

Published by: PIONEER RESEARCH & DEVELOPMENT GROUP (www.prdg.org) 1 Design Of Low Power Approximate Mirror Adder Sasikala.M 1, Dr.G.K.D.Prasanna Venkatesan 2 ME VLSI student 1, Vice Principal, Professor and Head/ECE 2 PGP college of Engineering and Technology Nammakkal,

More information

JDT LOW POWER FIR FILTER ARCHITECTURE USING ACCUMULATOR BASED RADIX-2 MULTIPLIER

JDT LOW POWER FIR FILTER ARCHITECTURE USING ACCUMULATOR BASED RADIX-2 MULTIPLIER JDT-003-2013 LOW POWER FIR FILTER ARCHITECTURE USING ACCUMULATOR BASED RADIX-2 MULTIPLIER 1 Geetha.R, II M Tech, 2 Mrs.P.Thamarai, 3 Dr.T.V.Kirankumar 1 Dept of ECE, Bharath Institute of Science and Technology

More information

Design and Implementation of Complex Multiplier Using Compressors

Design and Implementation of Complex Multiplier Using Compressors Design and Implementation of Complex Multiplier Using Compressors Abstract: In this paper, a low-power high speed Complex Multiplier using compressor circuit is proposed for fast digital arithmetic integrated

More information

SPIRO SOLUTIONS PVT LTD

SPIRO SOLUTIONS PVT LTD VLSI S.NO PROJECT CODE TITLE YEAR ANALOG AMS(TANNER EDA) 01 ITVL01 20-Mb/s GFSK Modulator Based on 3.6-GHz Hybrid PLL With 3-b DCO Nonlinearity Calibration and Independent Delay Mismatch Control 02 ITVL02

More information

Modified Partial Product Generator for Redundant Binary Multiplier with High Modularity and Carry-Free Addition

Modified Partial Product Generator for Redundant Binary Multiplier with High Modularity and Carry-Free Addition Modified Partial Product Generator for Redundant Binary Multiplier with High Modularity and Carry-Free Addition Thoka. Babu Rao 1, G. Kishore Kumar 2 1, M. Tech in VLSI & ES, Student at Velagapudi Ramakrishna

More information

Design of 8-4 and 9-4 Compressors Forhigh Speed Multiplication

Design of 8-4 and 9-4 Compressors Forhigh Speed Multiplication American Journal of Applied Sciences 10 (8): 893-900, 2013 ISSN: 1546-9239 2013 R. Marimuthu et al., This open access article is distributed under a Creative Commons Attribution (CC-BY) 3.0 license doi:10.3844/ajassp.2013.893.900

More information

Design and Simulation of Convolution Using Booth Encoded Wallace Tree Multiplier

Design and Simulation of Convolution Using Booth Encoded Wallace Tree Multiplier IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-issn: 2278-2834,p- ISSN: 2278-8735. PP 42-46 www.iosrjournals.org Design and Simulation of Convolution Using Booth Encoded Wallace

More information

International Journal Of Scientific Research And Education Volume 3 Issue 6 Pages June-2015 ISSN (e): Website:

International Journal Of Scientific Research And Education Volume 3 Issue 6 Pages June-2015 ISSN (e): Website: International Journal Of Scientific Research And Education Volume 3 Issue 6 Pages-3529-3538 June-2015 ISSN (e): 2321-7545 Website: http://ijsae.in Efficient Architecture for Radix-2 Booth Multiplication

More information

Analysis of Parallel Prefix Adders

Analysis of Parallel Prefix Adders Analysis of Parallel Prefix Adders T.Sravya M.Tech (VLSI) C.M.R Institute of Technology, Hyderabad. D. Chandra Mohan Assistant Professor C.M.R Institute of Technology, Hyderabad. Dr.M.Gurunadha Babu, M.Tech,

More information

Design of Efficient Han-Carlson-Adder

Design of Efficient Han-Carlson-Adder Design of Efficient Han-Carlson-Adder S. Sri Katyayani Dept of ECE Narayana Engineering College, Nellore Dr.M.Chandramohan Reddy Dept of ECE Narayana Engineering College, Nellore Murali.K HoD, Dept of

More information

Efficient Multi-Operand Adders in VLSI Technology

Efficient Multi-Operand Adders in VLSI Technology Efficient Multi-Operand Adders in VLSI Technology K.Priyanka M.Tech-VLSI, D.Chandra Mohan Assistant Professor, Dr.S.Balaji, M.E, Ph.D Dean, Department of ECE, Abstract: This paper presents different approaches

More information

International Journal of Scientific & Engineering Research, Volume 7, Issue 3, March-2016 ISSN

International Journal of Scientific & Engineering Research, Volume 7, Issue 3, March-2016 ISSN ISSN 2229-5518 159 EFFICIENT AND ENHANCED CARRY SELECT ADDER FOR MULTIPURPOSE APPLICATIONS A.RAMESH Asst. Professor, E.C.E Department, PSCMRCET, Kothapet, Vijayawada, A.P, India. rameshavula99@gmail.com

More information

A MODIFIED ARCHITECTURE OF MULTIPLIER AND ACCUMULATOR USING SPURIOUS POWER SUPPRESSION TECHNIQUE

A MODIFIED ARCHITECTURE OF MULTIPLIER AND ACCUMULATOR USING SPURIOUS POWER SUPPRESSION TECHNIQUE A MODIFIED ARCHITECTURE OF MULTIPLIER AND ACCUMULATOR USING SPURIOUS POWER SUPPRESSION TECHNIQUE R.Mohanapriya #1, K. Rajesh*² # PG Scholar (VLSI Design), Knowledge Institute of Technology, Salem * Assistant

More information

AREA AND POWER EFFICIENT CARRY SELECT ADDER USING BRENT KUNG ARCHITECTURE

AREA AND POWER EFFICIENT CARRY SELECT ADDER USING BRENT KUNG ARCHITECTURE AREA AND POWER EFFICIENT CARRY SELECT ADDER USING BRENT KUNG ARCHITECTURE S.Durgadevi 1, Dr.S.Anbukarupusamy 2, Dr.N.Nandagopal 3 Department of Electronics and Communication Engineering Excel Engineering

More information

EC 1354-Principles of VLSI Design

EC 1354-Principles of VLSI Design EC 1354-Principles of VLSI Design UNIT I MOS TRANSISTOR THEORY AND PROCESS TECHNOLOGY PART-A 1. What are the four generations of integrated circuits? 2. Give the advantages of IC. 3. Give the variety of

More information

Modified Booth Encoding Multiplier for both Signed and Unsigned Radix Based Multi-Modulus Multiplier

Modified Booth Encoding Multiplier for both Signed and Unsigned Radix Based Multi-Modulus Multiplier Modified Booth Encoding Multiplier for both Signed and Unsigned Radix Based Multi-Modulus Multiplier M.Shiva Krushna M.Tech, VLSI Design, Holy Mary Institute of Technology And Science, Hyderabad, T.S,

More information

FPGA Implementation of Wallace Tree Multiplier using CSLA / CLA

FPGA Implementation of Wallace Tree Multiplier using CSLA / CLA FPGA Implementation of Wallace Tree Multiplier using CSLA / CLA Shruti Dixit 1, Praveen Kumar Pandey 2 1 Suresh Gyan Vihar University, Mahaljagtapura, Jaipur, Rajasthan, India 2 Suresh Gyan Vihar University,

More information

NOVEL HIGH SPEED IMPLEMENTATION OF 32 BIT MULTIPLIER USING CSLA and CLAA

NOVEL HIGH SPEED IMPLEMENTATION OF 32 BIT MULTIPLIER USING CSLA and CLAA NOVEL HIGH SPEED IMPLEMENTATION OF 32 BIT MULTIPLIER USING CSLA and CLAA #1 NANGUNOORI THRIVENI Pursuing M.Tech, #2 P.NARASIMHULU - Associate Professor, SREE CHAITANYA COLLEGE OF ENGINEERING, KARIMNAGAR,

More information

DESIGN OF LOW POWER MULTIPLIERS

DESIGN OF LOW POWER MULTIPLIERS DESIGN OF LOW POWER MULTIPLIERS GowthamPavanaskar, RakeshKamath.R, Rashmi, Naveena Guided by: DivyeshDivakar AssistantProfessor EEE department Canaraengineering college, Mangalore Abstract:With advances

More information

AN ERROR LIMITED AREA EFFICIENT TRUNCATED MULTIPLIER FOR IMAGE COMPRESSION

AN ERROR LIMITED AREA EFFICIENT TRUNCATED MULTIPLIER FOR IMAGE COMPRESSION AN ERROR LIMITED AREA EFFICIENT TRUNCATED MULTIPLIER FOR IMAGE COMPRESSION K.Mahesh #1, M.Pushpalatha *2 #1 M.Phil.,(Scholar), Padmavani Arts and Science College. *2 Assistant Professor, Padmavani Arts

More information

A Survey on A High Performance Approximate Adder And Two High Performance Approximate Multipliers

A Survey on A High Performance Approximate Adder And Two High Performance Approximate Multipliers IOSR Journal of Business and Management (IOSR-JBM) e-issn: 2278-487X, p-issn: 2319-7668 PP 43-50 www.iosrjournals.org A Survey on A High Performance Approximate Adder And Two High Performance Approximate

More information

A Compact Design of 8X8 Bit Vedic Multiplier Using Reversible Logic Based Compressor

A Compact Design of 8X8 Bit Vedic Multiplier Using Reversible Logic Based Compressor A Compact Design of 8X8 Bit Vedic Multiplier Using Reversible Logic Based Compressor 1 Viswanath Gowthami, 2 B.Govardhana, 3 Madanna, 1 PG Scholar, Dept of VLSI System Design, Geethanajali college of engineering

More information

Reduced Redundant Arithmetic Applied on Low Power Multiply-Accumulate Units

Reduced Redundant Arithmetic Applied on Low Power Multiply-Accumulate Units Reduced Redundant Arithmetic Applied on Low Power Multiply-Accumulate Units DAVID NEUHÄUSER Friedrich Schiller University Department of Computer Science D-7737 Jena GERMANY david.neuhaeuser@uni-jena.de

More information

Area Delay Efficient Novel Adder By QCA Technology

Area Delay Efficient Novel Adder By QCA Technology Area Delay Efficient Novel Adder By QCA Technology 1 Mohammad Mahad, 2 Manisha Waje 1 Research Student, Department of ETC, G.H.Raisoni College of Engineering, Pune, India 2 Assistant Professor, Department

More information

PUBLICATIONS OF PROBLEMS & APPLICATION IN ENGINEERING RESEARCH - PAPER CSEA2012 ISSN: ; e-issn:

PUBLICATIONS OF PROBLEMS & APPLICATION IN ENGINEERING RESEARCH - PAPER   CSEA2012 ISSN: ; e-issn: New BEC Design For Efficient Multiplier NAGESWARARAO CHINTAPANTI, KISHORE.A, SAROJA.BODA, MUNISHANKAR Dept. of Electronics & Communication Engineering, Siddartha Institute of Science And Technology Puttur

More information

Area Power and Delay Efficient Carry Select Adder (CSLA) Using Bit Excess Technique

Area Power and Delay Efficient Carry Select Adder (CSLA) Using Bit Excess Technique Area Power and Delay Efficient Carry Select Adder (CSLA) Using Bit Excess Technique G. Sai Krishna Master of Technology VLSI Design, Abstract: In electronics, an adder or summer is digital circuits that

More information

Implementation of 32-Bit Unsigned Multiplier Using CLAA and CSLA

Implementation of 32-Bit Unsigned Multiplier Using CLAA and CSLA Implementation of 32-Bit Unsigned Multiplier Using CLAA and CSLA 1. Vijaya kumar vadladi,m. Tech. Student (VLSID), Holy Mary Institute of Technology and Science, Keesara, R.R. Dt. 2.David Solomon Raju.Y,Associate

More information

[Devi*, 5(4): April, 2016] ISSN: (I2OR), Publication Impact Factor: 3.785

[Devi*, 5(4): April, 2016] ISSN: (I2OR), Publication Impact Factor: 3.785 IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY DESIGN OF HIGH SPEED FIR FILTER ON FPGA BY USING MULTIPLEXER ARRAY OPTIMIZATION IN DA-OBC ALGORITHM Palepu Mohan Radha Devi, Vijay

More information

An Design of Radix-4 Modified Booth Encoded Multiplier and Optimised Carry Select Adder Design for Efficient Area and Delay

An Design of Radix-4 Modified Booth Encoded Multiplier and Optimised Carry Select Adder Design for Efficient Area and Delay An Design of Radix-4 Modified Booth Encoded Multiplier and Optimised Carry Select Adder Design for Efficient Area and Delay 1. K. Nivetha, PG Scholar, Dept of ECE, Nandha Engineering College, Erode. 2.

More information

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Digital Computer Arithmetic ECE 666 Part 6a High-Speed Multiplication - I Israel Koren ECE666/Koren Part.6a.1 Speeding Up Multiplication

More information

Area Efficient and Low Power Reconfiurable Fir Filter

Area Efficient and Low Power Reconfiurable Fir Filter 50 Area Efficient and Low Power Reconfiurable Fir Filter A. UMASANKAR N.VASUDEVAN N.Kirubanandasarathy Research scholar St.peter s university, ECE, Chennai- 600054, INDIA Dean (Engineering and Technology),

More information

Design and implementation of Parallel Prefix Adders using FPGAs

Design and implementation of Parallel Prefix Adders using FPGAs IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-issn: 2278-2834,p- ISSN: 2278-8735. Volume 6, Issue 5 (Jul. - Aug. 2013), PP 41-48 Design and implementation of Parallel Prefix Adders

More information

Chapter 4: The Building Blocks: Binary Numbers, Boolean Logic, and Gates

Chapter 4: The Building Blocks: Binary Numbers, Boolean Logic, and Gates Chapter 4: The Building Blocks: Binary Numbers, Boolean Logic, and Gates Objectives In this chapter, you will learn about The binary numbering system Boolean logic and gates Building computer circuits

More information