An FPGA Logic Cell and Carry Chain Configurable as a 6:2 or 7:2 Compressor
HADI PARANDEH-AFSHAR, PHILIP BRISK, and PAOLO IENNE
École Polytechnique Fédérale de Lausanne (EPFL)

To improve FPGA performance for arithmetic circuits that are dominated by multi-input addition operations, an FPGA logic block is proposed that can be configured as a 6:2 or 7:2 compressor. Compressors have been used successfully in the past to realize parallel multipliers in VLSI technology; however, the peculiar structure of FPGA logic blocks, coupled with the high cost of the routing network relative to ASIC technology, renders compressors ineffective when mapped onto the general logic of an FPGA. On the other hand, current FPGA logic cells have already been enhanced with carry chains to improve arithmetic functionality, for example, to realize fast ternary carry-propagate addition. The contribution of this article is a new FPGA logic cell that is specialized to help realize efficient compressor trees on FPGAs. The new FPGA logic cell has two variants that can respectively be configured as a 6:2 or a 7:2 compressor using additional carry chains that, coupled with lookup tables, provide the necessary functionality. Experiments show that the use of these modified logic cells significantly reduces the delay of compressor trees synthesized on FPGAs compared to state-of-the-art synthesis techniques, with a moderate increase in area and power consumption.

Categories and Subject Descriptors: B.7.1 [Integrated Circuits]: Types and Design Styles - Gate Arrays; G.1.0 [Numerical Analysis]: General - Computer Arithmetic

General Terms: Algorithms, Performance

Additional Key Words and Phrases: FPGA, carry chain, compressor tree, 6:2 compressor, 7:2 compressor

ACM Reference Format: Parandeh-Afshar, H., Brisk, P., and Ienne, P. An FPGA logic cell and carry chain configurable as a 6:2 or 7:2 compressor. ACM Trans. Reconfig. Techn. Syst. 2, 3, Article 19 (September 2009), 42 pages. DOI = /

P. Brisk is currently affiliated with the Department of Computer Science and Engineering in the Bourns College of Engineering at the University of California, Riverside.

Authors' addresses: H. Parandeh-Afshar, hadi.parandehafshar@epfl.ch; P. Brisk, Philip.brisk@gmail.com; P. Ienne, paolo.ienne@epfl.ch.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY USA, fax +1 (212) , or permissions@acm.org.

© 2009 ACM /2009/09-ART19 $10.00 DOI: /
1. INTRODUCTION

Due to their inherent reconfigurability, FPGAs are one feasible hardware platform for low-volume markets, where vendors cannot justify the design, testing, and verification costs of an ASIC. Although an FPGA implementation of a circuit will outperform traditional software, a noticeable performance gap between FPGAs and ASICs remains [Kuon and Rose 2007]. One important area that is ripe for improvement is arithmetic-dominated circuits; in particular, due to the peculiar logic cell structure and carry chains in modern FPGAs, addition- and multiplication-dominated circuits cannot take advantage of the carry-save representation.

One of the fundamental results in computer arithmetic is that addition scales well when the number of inputs increases beyond 2; this was first observed by Wallace [1964] in the context of parallel multiplier design. The key is not to use trees of traditional carry-propagate adders, that is, circuits that produce the sum of two (signed) binary integers; instead, the integers are aggregated together using a circuit called a compressor tree. Numerous methods for compressor tree generation have been published since their introduction in the early 1960s [Wallace 1964; Dadda 1965; Swartzlander 1973; Stenzel et al. 1977; Weinberger 1981; Santoro and Horowitz 1988; Song and De Micheli 1991; Fadavi-Ardekani 1993; Oklobdzija and Villeger 1995; Stelling and Oklobdzija 1996; Stelling et al. 1998; Kwon et al. 2002; Um and Kim 2002; Mora Mora et al. 2006; Verma and Ienne 2007a], mostly in the context of parallel multiplication; more generally, these circuits can also sum k > 2 integers.

The architecture of modern FPGAs is generally not well suited to compressor trees. The logic clusters of the Altera Stratix II-IV and Xilinx Virtex-5 FPGAs can be configured to implement ternary (3-input) addition using fast carry chains [Cherepacha and Lewis 1996; Hauck et al. 2000; Frederick and Somani 2006].
The primary advantage of the carry chains is that the carry bits are propagated directly from one cell to its adjacent neighbor, thereby avoiding the overhead of the routing network. This design point favors the use of ternary adder trees rather than compressor trees.

Parandeh-Afshar et al. [2008b, 2008c] showed that compressor trees can be synthesized on FPGAs using a circuit called a Generalized Parallel Counter (GPC) [Stenzel et al. 1977]. This GPC mapping approach yields compressor trees whose delay is significantly lower than that of ternary adder trees, despite the latter's use of the carry chains; however, there is a noticeable increase in the number of logic cells required.

This article, an extension of prior work by Parandeh-Afshar et al. [2008a], introduces and evaluates a new logic cell, based on the Altera Adaptive Logic Module (ALM), that has an additional carry chain, which allows it to be configured as a 6:2 or 7:2 compressor; this compressor belongs to a well-known class of circuits that have been used for successful synthesis of ASIC multipliers in the past [Weinberger 1981; Song and De Micheli 1991; Oklobdzija and Villeger 1995]. By combining the strengths of the GPC mapping with the use of 6:2 or 7:2 compressors, when possible, faster compressor trees can be realized on the FPGA. Additionally, we compare the power consumption of compressor trees
mapped onto the proposed logic cells with compressor trees synthesized using ternary adder trees and GPC mapping.

The article is organized as follows. Section 2 begins by introducing a collection of arithmetic primitives (counters, compressors, compressor trees) that are required to understand the remaining sections of the paper. Section 3 summarizes related work in the field of FPGA architecture and mapping, focusing specifically on features designed for enhanced arithmetic performance. Section 4 presents the new logic cell, and Section 5 describes the approach that we used to map circuits onto FPGAs containing the new cell. Our experimental platform, methodology, and results are presented in Sections 6-8. Section 9 concludes the article.

2. ARITHMETIC AND FPGA PRIMITIVES

2.1 Full and Half Adders

At the bit level, a half-adder (HA) is a 2-input, 2-output circuit that computes the sum of two bits and outputs the result as an unsigned binary integer. A full-adder (FA) computes a similar sum for 3 input bits. The lower-order output bit is called a sum, and the higher-order output bit is called a carry. In the case of an FA, one of the inputs is called a carry-in bit and the high-order output is called a carry-out. Many arithmetic circuits, including adders and multipliers, are composed primarily of HAs and FAs.

2.2 Ripple-Carry and Carry-Save Adders

A Carry-Propagate Adder (CPA) is a circuit that adds two binary integers; if the integers are signed, two's complement form is assumed. Numerous architectures for carry-propagate adders have been proposed in the past. In modern CMOS technologies, significant differences in critical path delay among the different adder architectures generally do not manifest themselves for small bitwidths, that is, 8 bits or less.

Fig. 1. (a) A ripple-carry adder; (b) a carry-save adder.
The most straightforward CPA architecture is the Ripple-Carry Adder (RCA), which generally has the smallest area but highest delay compared to the alternatives. Figure 1(a) shows a 4-bit RCA constructed from FA cells; the carry-in of the least significant FA is 0, so an HA can be used instead of an FA. As shown in Figure 1(a), an RCA is a 1-dimensional array of FAs, where the carry-out of each FA is connected directly to the carry-in of the next; thus, the worst-case critical path delay is through all of the FAs in the design. If an RCA adds two k-bit numbers, the complexity of the critical path delay is O(k). Many faster, but larger, alternative adders have been designed, most with a critical path delay of O(log k).
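As an illustration of the half adder, full adder, and ripple-carry structure described above, the following is a minimal bit-level sketch in Python. The function names (`half_adder`, `full_adder`, `ripple_carry_add`, `to_bits`, `from_bits`) are hypothetical and chosen only for this example; the model is behavioral and says nothing about delay.

```python
def half_adder(a, b):
    """2-input adder cell: returns (sum, carry) for two input bits."""
    return a ^ b, a & b


def full_adder(a, b, cin):
    """3-input adder cell: returns (sum, carry_out) for three input bits."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def ripple_carry_add(x_bits, y_bits):
    """Add two equal-length little-endian bit lists; returns k + 1 result bits."""
    assert len(x_bits) == len(y_bits) and x_bits
    # The carry-in of the least significant position is 0, so an HA suffices.
    s, carry = half_adder(x_bits[0], y_bits[0])
    result = [s]
    # Each carry-out feeds the next carry-in: the O(k) critical path.
    for a, b in zip(x_bits[1:], y_bits[1:]):
        s, carry = full_adder(a, b, carry)
        result.append(s)
    result.append(carry)
    return result


def to_bits(value, width):
    """Little-endian bit list of a non-negative integer."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(b << i for i, b in enumerate(bits))
```

For instance, `from_bits(ripple_carry_add(to_bits(11, 4), to_bits(6, 4)))` evaluates the 4-bit case of Figure 1(a) on the operands 11 and 6.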
A Carry-Save Adder (CSA), shown in Figure 1(b), breaks the carry chain; in fact, it is a 1-dimensional array of disconnected FAs. CSAs are generally used in conjunction with CPAs in order to perform efficient n-input addition for n > 2.

2.3 Adder and Compressor Trees

Suppose that we want to compute the sum of n > 2 binary integers. One approach is to use an Adder Tree, that is, a tree of CPAs; the alternative is to build a tree of carry-save adders instead, only using a CPA at the end. Figure 2 shows an example where three four-bit binary integers are added. In Figure 2(a), two RCAs are used; in Figure 2(b), a CSA is followed by an RCA. Let $d_{FA}$ and $d_{HA}$ be the respective delays of full and half adders. The critical path delay of the circuit in Figure 2(a) is $4d_{FA} + 2d_{HA}$, while the critical path delay of the circuit in Figure 2(b) is $3d_{FA} + 2d_{HA}$, an overall savings of $d_{FA}$ compared to Figure 2(a). This savings occurs because the use of the CSA instead of the RCA permits the elimination of one bit from the RCA in Figure 2(b).

Fig. 2. Two implementations of a 4-bit ternary adder using (a) an adder tree, i.e., two RCAs; and (b) a compressor tree, i.e., a CSA followed by an RCA. The compressor tree implementation eliminates the delay of one half adder (HA) from the critical path.

The idea of using carry-save addition for fast accumulation dates back to the work of Wallace [1964] and Dadda [1965], who designed fast parallel multipliers; however, the fundamental ideas generalize quite elegantly to multi-input addition as well. Formally, let $A_1, A_2, \ldots, A_n$ be a set of binary integers to sum. A Compressor Tree is a circuit that produces two values, sum (S) and carry (C), such that:

$$S + C = \sum_{i=1}^{n} A_i. \tag{1}$$

A CPA then performs the final addition, S + C.
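The property in Equation (1) can be checked with a small behavioral sketch. The code below, written in Python with hypothetical function names, models a CSA level as three bitwise operations on whole integers and repeatedly applies it until two values remain. It assumes non-negative operands and groups them three at a time in input order; it is not one of the tree-construction algorithms cited in this article.

```python
def carry_save_add(a, b, c):
    """One CSA level: three operands in, (sum_word, carry_word) out.

    Every bit position is an independent full adder, so no carry propagates
    between positions. The carry word is shifted left by one because a carry
    produced at rank r has rank r + 1.
    """
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (a & c) | (b & c)) << 1
    return sum_word, carry_word


def compressor_tree(operands):
    """Reduce non-negative integers to (S, C) with S + C == sum(operands)."""
    values = list(operands)
    while len(values) > 2:
        reduced = []
        # Compress each full group of three operands into two.
        for i in range(0, len(values) - 2, 3):
            reduced.extend(carry_save_add(values[i], values[i + 1], values[i + 2]))
        # Pass the one or two leftover operands to the next level unchanged.
        leftover = len(values) % 3
        if leftover:
            reduced.extend(values[-leftover:])
        values = reduced
    while len(values) < 2:
        values.append(0)
    return values[0], values[1]
```

A final two-input addition `S + C` then plays the role of the CPA; for example, `sum(compressor_tree([9, 5, 14, 3]))` reproduces the sum of the four operands.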
Fig. 3. Illustration of the critical path delay through a compressor tree of a multiplier, including that of the final CPA. The critical path typically includes the j most significant bits of the final CPA; the portion of the final CPA that computes the $m - j$ least significant bits can be optimized for area rather than for speed, as long as it does not become critical.

Wallace and Dadda trees are two specific compressor tree architectures; many others have also been proposed [Swartzlander 1973; Stenzel et al. 1977; Weinberger 1981; Santoro and Horowitz 1988; Song and De Micheli 1991; Fadavi-Ardekani 1993; Stelling and Oklobdzija 1996; Stelling et al. 1998; Kwon et al. 2002; Um and Kim 2002; Mora Mora et al. 2006; Verma and Ienne 2007a].

The superiority of compressor trees over adder trees is one of the most fundamental results of digital arithmetic. Intuitively, it may seem that this is because an adder tree pays the penalty of a carry chain at each level; this is, however, a fallacy, as illustrated by Figure 2 in the preceding discussion. In actuality, the benefit of compressor trees arises from their ability to reduce the bitwidth of the final CPA in the case of multi-input addition.

Parallel multipliers in ASIC technology, however, are more complicated. In multi-input addition, the number of bits to sum at each position is the same. This is not true in the case of parallel multiplication: after a partial product generation or Booth encoding stage, the number of bits to sum tends to be greater among the bit positions in the middle. As illustrated conceptually by Figure 3, the lower-order bits of the final CPA are generally not on the critical path, as the bits that arrive at these positions go through fewer layers of logic within the compressor tree. In other words, the arrival time of the bits at the final CPA is nonuniform, unlike the case of multi-input addition.
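The difference in column heights between the two cases can be made concrete with a short Python sketch. It assumes the simplest partial product scheme, an unsigned m x m AND array without Booth encoding, which the text does not single out; the function names are hypothetical.

```python
def partial_product_column_heights(m):
    """Bits per column for an unsigned m x m AND-array multiplier.

    Assumption: no Booth encoding; partial product bit a_i AND b_j
    has rank i + j, so it lands in column i + j.
    """
    heights = [0] * (2 * m - 1)
    for i in range(m):
        for j in range(m):
            heights[i + j] += 1
    return heights


def multi_input_column_heights(n, width):
    """Bits per column when summing n operands of the same width."""
    return [n] * width
```

Under this assumption, `partial_product_column_heights(4)` gives a triangular profile that peaks in the middle column, whereas `multi_input_column_heights(4, 4)` is flat; the taller middle columns pass through more compressor levels, which is the source of the nonuniform arrival times.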
Based on this observation, Oklobdzija and Villeger [1995] argued that the final CPA of a multiplier should be implemented as a hybrid adder, which uses a small and slow CPA, such as an RCA, for the low-order bits, and a faster adder, such as a carry-select adder, for the higher-order bits. Carry-select adders are particularly useful when the arrival time of bits is nonuniform. Carry-select adders can start to add the bits as soon as they arrive. RCAs, in contrast, cannot, as the output bit at position $i$ depends on the carry-out bit computed at position $i - 1$. That being said, carry-select adders can be constructed from smaller-bitwidth RCAs as building blocks.

The work summarized in this section targets ASIC design methodologies; FPGAs, in contrast, possess fast carry chains, whose usage often dictates the types of adders that perform well on specific device families.
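The remark that carry-select adders can be built from smaller RCAs can be sketched as follows. This Python model is a generic textbook carry-select arrangement, not the hybrid adder of Oklobdzija and Villeger; the names `ripple_add` and `carry_select_add` and the default block size of 4 are assumptions made for the example, and `ripple_add` uses integer addition as a stand-in for an RCA block.

```python
def ripple_add(x, y, cin, width):
    """Stand-in for a width-bit RCA block: returns (sum bits, carry_out)."""
    total = x + y + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(x, y, width, block=4):
    """Add two width-bit unsigned integers using block-wise carry selection."""
    result, carry, shift = 0, 0, 0
    while shift < width:
        w = min(block, width - shift)
        mask = (1 << w) - 1
        xb, yb = (x >> shift) & mask, (y >> shift) & mask
        # Both candidates are computed without waiting for the incoming carry.
        sum0, cout0 = ripple_add(xb, yb, 0, w)
        sum1, cout1 = ripple_add(xb, yb, 1, w)
        # The incoming carry only steers a 2:1 selection between them.
        block_sum, carry = (sum1, cout1) if carry else (sum0, cout0)
        result |= block_sum << shift
        shift += w
    return result | (carry << width)
```

In hardware, the two candidate sums of every block are available in parallel, so the carry path crosses one selection per block rather than one full adder per bit.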
2.4 Parallel Counters

An m:n parallel counter (or single-column counter) is a circuit that takes m input bits, counts the number of input bits that are set to one, and outputs the value as an n-bit unsigned binary integer. The output range is [0, m], so the number of output bits is:

$$n = \lceil \log_2 (m + 1) \rceil. \tag{2}$$

In the context of compressor trees, HAs and FAs are 2:2 and 3:2 counters respectively. Verma and Ienne [2007a], for example, described an integer linear programming formulation for compressor tree design that uses a library of m:n counters, for $2 \le m \le 8$.

Let $B = b_{k-1} b_{k-2} \ldots b_0$ be a k-bit unsigned binary integer, where $b_{k-1}$ is the most significant bit, and $b_0$ is the least significant bit. Each bit $b_r$ contributes a total value of $b_r 2^r$ to the total value of B, i.e., $b_r$ contributes $2^r$ if it is set, and 0 otherwise. In this context, r is called the rank of $b_r$. When an m:n counter is used to synthesize a compressor tree, all of its inputs have the same rank.

A Generalized Parallel Counter (GPC) is an extension of an m:n counter that can sum bits of multiple ranks [Stenzel et al. 1977]. For example, a (2, 3; 3) GPC can sum up to 2 bits of rank 1 and 3 bits of rank 0; the maximum output value is $2 \cdot 2^1 + 3 \cdot 2^0 = 7$, so 3 output bits are required. The general form of a GPC is $(k_{t-1}, k_{t-2}, \ldots, k_0; s)$, where $k_r$ is the maximum number of bits of rank r that can be summed, and s is the number of output bits. Similar to an m:n counter, a GPC must satisfy the following property:

$$s = \left\lceil \log_2 \left( 1 + \sum_{r=0}^{t-1} k_r 2^r \right) \right\rceil. \tag{3}$$

In fact, a sufficiently large m:n counter can implement a GPC (although many other implementations also exist). Each GPC input bit of rank r is connected to $2^r$ inputs of the m:n counter; any unused input bits of the m:n counter are then driven to 0. GPCs map efficiently onto FPGAs [Parandeh-Afshar et al. 2008b, 2008c].
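The counter and GPC definitions above, including the construction of a GPC from a larger m:n counter, can be sketched behaviorally in Python. All function names are hypothetical. The output widths of Equations (2) and (3) are computed with `int.bit_length()`, which equals the ceiling of log2(v + 1) for a non-negative integer v and avoids floating-point rounding. Note that the lists here are indexed by rank, so the (2, 3; 3) GPC of the text corresponds to `k = [3, 2]`.

```python
def counter_output_bits(m):
    """Equation (2): output width of an m:n counter, ceil(log2(m + 1))."""
    return m.bit_length()


def parallel_counter(bits):
    """m:n counter: count the ones among m input bits (little-endian output)."""
    count = sum(bits)
    n = counter_output_bits(len(bits))
    return [(count >> i) & 1 for i in range(n)]


def gpc_output_bits(k):
    """Equation (3): k[r] is the maximum number of input bits of rank r."""
    max_value = sum(k_r << r for r, k_r in enumerate(k))
    return max_value.bit_length()


def gpc(columns):
    """GPC built from an m:n counter; columns[r] holds the input bits of rank r.

    Each input bit of rank r is wired to 2**r inputs of the counter, so it
    contributes 2**r to the count when set.
    """
    counter_inputs = []
    for r, column in enumerate(columns):
        for bit in column:
            counter_inputs.extend([bit] * (1 << r))
    return parallel_counter(counter_inputs)
```

For example, `gpc([[1, 0, 1], [0, 1]])` sums two set bits of rank 0 and one set bit of rank 1, which yields the value 4 on three output bits.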
Specifically, if the FPGA has k-input LUTs, then k-input GPCs can be mapped onto the LUTs (one LUT is used per GPC output bit) using one logic level.

2.5 Compressors

Compressors (not to be confused with compressor trees) are arithmetic components, similar in principle to parallel counters, but with two distinct differences: (1) they have explicit carry-in and carry-out bits; and (2) there may be some redundancy among the ranks of the sum and carry-output bits.

The 4:2 compressor (also called a 4:2 CSA), illustrated in Figure 4, was introduced by Weinberger [1981]; at first sight, this name may appear to be somewhat of a misnomer: although it has 4 input bits and produces 2 sum output bits ($out_0$ and $out_1$), it also has a carry-in ($c_{in}$) and a carry-out ($c_{out}$) bit (thus, the total numbers of input and output bits are 5 and 3); however, it is not the same circuit as a 5:3 compressor. All input bits, including $c_{in}$, have rank 0; the two output bits have ranks 0 and 1 respectively, while $c_{out}$ has rank 1 as well. Thus,
the output of the 4:2 compressor is a redundant number; for example, $out_1 = 0$ and $c_{out} = 1$ is equivalent to $out_1 = 1$ and $c_{out} = 0$ in all cases. When k 4:2 compressors are connected in a carry chain, a total of 4k input bits are compressed down to 2k output bits plus one additional carry-out bit; the carry-in bit of the first compressor is set to 0. The primary difference between compressors and counters is the presence of carry bits in the former; it is also important to recognize that a compressor tree can be constructed from compressors, counters, or both.

Fig. 4. (a) 4:2 compressor I/O diagram; (b) 4:2 compressor architecture; (c) 4-ary adder built from an array of 4:2 compressors followed by an RCA; (d) illustration of the interconnect between consecutive 4:2 compressors: although the array has the appearance of an RCA in Figure 4(c), the carry chain only goes through two compressors.

Figure 4(a) shows the inputs and outputs of the 4:2 compressor labeled with their ranks; Figure 4(b) shows one 4:2 compressor architecture, which is constructed using two 3:2 counters. Figure 4(c) shows a 4-bit 4-input adder, consisting of four 4:2 compressors in a 1-dimensional array followed by a four-bit RCA. At first glance, the array of 4:2 compressors appears to have the same structure as an RCA, as the $c_{out}$ bit of each 4:2 compressor is connected to the $c_{in}$ bit of the subsequent one; however, this is not actually the case, as shown in
Figure 4(d); the fact that there is no direct path from a carry-in to a carry-out prevents the formation of a ripple-carry structure.

Fig. 5. (a)/(b) 6:2/7:2 compressor I/O diagram; (c)/(d) 6:2/7:2 compressor architecture; (e) illustration of the interconnection pattern between consecutive 6:2 compressors (it is the same for 7:2 compressors).

The new FPGA logic cell described in this paper has two variants that can respectively be configured as a 6:2 or a 7:2 compressor, which generalize the 4:2 compressor cell whose use is shown in Figure 4. Figure 5(a) and (b) show the basic I/O structure of the 6:2 and 7:2 compressors. Figure 5(c) and (d) show the circuit-level architecture; the only difference is that a 2:2 counter in the 6:2 compressor is upgraded to a 3:2 counter in the 7:2 compressor, and the 7th input is connected to one of the inputs of the aforementioned 3:2 counter.
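The 4:2 compressor built from two 3:2 counters, and the absence of a path from carry-in to carry-out, can be illustrated with a short Python sketch. The exact wiring of Figure 4(b) is not recoverable from the text, so the arrangement below is the common textbook one and should be read as an assumption; the function names are hypothetical.

```python
def full_adder(a, b, c):
    """3:2 counter: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def compressor_4_2(x0, x1, x2, x3, cin):
    """4:2 compressor from two 3:2 counters (assumed wiring).

    Returns (out0, out1, cout) with ranks 0, 1, and 1 respectively.
    """
    # cout is produced by the first counter and never sees cin, so a chain
    # of compressors does not form a ripple-carry path.
    partial, cout = full_adder(x0, x1, x2)
    out0, out1 = full_adder(partial, x3, cin)
    return out0, out1, cout


def compress_four_operands(a, b, c, d, width):
    """Array of `width` 4:2 compressors over four unsigned operands.

    Returns (sum_word, carry_word, final_cout) such that
    a + b + c + d == sum_word + carry_word + (final_cout << width).
    """
    sum_word, carry_word, cin = 0, 0, 0  # carry-in of the first compressor is 0
    for i in range(width):
        bits = [(v >> i) & 1 for v in (a, b, c, d)]
        out0, out1, cout = compressor_4_2(*bits, cin)
        sum_word |= out0 << i          # rank i
        carry_word |= out1 << (i + 1)  # rank i + 1
        cin = cout                     # feeds the next compressor only
    return sum_word, carry_word, cin
```

A two-input CPA applied to `sum_word` and `carry_word`, plus the final carry-out at rank `width`, completes the 4-input addition of Figure 4(c).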
Figure 5(e) shows the interconnect structure. Consider the i-th compressor in sequence. The rank 1 carry output bit ($c_{out,0}$) connects to carry-input $c_{in,0}$ of the (i + 1)-st compressor; also, the rank 2 carry output bit ($c_{out,1}$) connects to carry-input $c_{in,1}$ of the (i + 2)-nd compressor. From Figure 5(c) and (d), we can see that there is no direct path from either of the carry-in bits of the 6:2 or 7:2 compressor to one of the carry-out bits; similar in principle to Figure 4(d), this prevents the formation of a ripple-carry chain between compressors.

2.6 Compression Ratio

Let I and O be the number of inputs and outputs produced by a counter, GPC, or compressor; for compressors, I and O do not include the carry-in and carry-out bits. The compression ratio (CR) is defined as CR = I/O. For example, a 6-input, 3-output GPC has CR = 6/3 = 2, while a 6:2 compressor has CR = 6/2 = 3. The CR tends to be higher for compressors than counters. Figure 6(a) shows compression using 6:3 counters, which produce three output bits per column, while 6:2 compressors, shown in Figure 6(b), produce two output bits per column; the other output bits are propagated down the carry chain.

Fig. 6. (a) Covering a set of columns with 6:3 counters yields 3 bits per column in the output; (b) using 6:2 compressors reduces the number of bits per column to 2. Contiguous columns covered with 6:3 counters can be converted to 6:2 compressors.

3. RELATED WORK

3.1 Compressor Tree Synthesis in ASIC Technology

Compressor trees for partial product accumulation were introduced by Wallace [1964] and Dadda [1965], who built them from CSAs; HAs were used at points where only 2 bits in the same column need to be compressed.
Fadavi-Ardekani [1993] recognized that the bits produced by a compressor tree may arrive at different times at the final adder, and designed a specific adder for this purpose; however, this work assumed that all partial product bits arrive at the compressor tree at the same time. Stelling et al. [1996, 1998] relaxed this
assumption, developing techniques to build the compressor tree and to design the final adder accordingly. Due to the importance of wire delays in deep submicron technology, Um and Kim [2002] proposed a two-phase layout-aware compressor tree synthesis technique that strives for a much more regular interconnect topology than the compressor trees produced by the 3-greedy algorithm of Stelling et al. [1998].

Verma and Ienne [2007a] developed an integer linear program (ILP) that could optimally synthesize compressor trees from a library of m:n counters. To bound the runtime of the synthesis procedure, they limited m to the range [2, 8]. Previously, m:n counters, like compressor trees, were built from CSAs, or libraries of smaller m:n counters. Through efficient logic synthesis techniques for arithmetic circuits [Verma and Ienne 2007b], they found that better m:n counters could be constructed from basic gates, rather than smaller counters. The availability of a library of highly optimized counters was important to the success of their ILP formulation; another contributing factor was that the ILP could optimize for the delay profile of any final adder.

GPCs have also been used in the past to build efficient compressor trees for parallel multipliers [Stenzel et al. 1977]. Mora Mora et al. [2006] described a multiplier generation approach for ASICs that implemented GPCs using ROMs, with the restriction that all input columns to the GPC have the same number of bits.

The 4:2 compressor [Weinberger 1981] was subsequently used by Santoro and Horowitz [1988] in a parallel multiplier. Over the years, various researchers have proposed the use of larger compressors and counters as well, including Kwon et al. [2002] (5:2, 5:3) and Song and De Micheli [1991] (9:2, 27:5).
A column is a set of bits having the same rank, r, at some level in a compressor tree; all of the inputs to an FA or an HA in a compressor tree belong to the same column. The FA or HA produces two output bits, one of rank r, one of rank r + 1. The delay through the FA or HA to the rank r output is called the vertical propagation delay, as the delay is confined to one column; the delay of the rank r + 1 output is called the horizontal propagation delay, as it passes from one column to the next. The use of compressors in favor of counters shifts some of the vertical propagation delay into horizontal propagation delay. Thus, the critical path through a compressor tree travels in both the horizontal and vertical direction before arriving at the final CPA. The compressor cells can be designed in order to minimize the difference between horizontal and vertical delays.

Interestingly enough, a CPA actually has a higher compression ratio than an m:n counter, a GPC, or a compressor. To take advantage of this fact, Oklobdzija and Villeger [1995] advocate the inclusion of CPAs within compressor trees: the vertical propagation delay will dominate; however, at places where the horizontal propagation delay is noncritical, the use of internal CPAs within the compressor tree maximizes the compression ratio.

This technique has some notable ramifications for FPGAs: due to the presence of carry chains within logic clusters (see Sections 3.3 and 3.4), horizontal propagation is naturally faster than vertical propagation, which must use the FPGA routing
network. This differentiates compressor tree synthesis on FPGAs from the same problem in VLSI. The challenge is that we cannot take advantage of the fast horizontal propagation on the critical paths of the compressor tree without resorting to CPAs. To address this concern, we design and evaluate new logic blocks that can be configured as 6:2 and 7:2 compressors. These logic blocks have a higher compression ratio than m:n counters and GPCs, and employ new carry chains that can exploit fast horizontal propagation.

3.2 FPGA Architecture

This section describes a number of proposals to improve the arithmetic and logical capabilities of FPGA logic cells. The most enduring idea has been the integration of carry chains into FPGA logic cells along with LUTs. Carry chains include fast connections between adjacent logic cells that are used for carry propagation; this permits the elimination of most of the routing delays that would otherwise be present.

The Altera Stratix II-IV Adaptive Logic Module (ALM) employs a carry chain based on ripple-carry adders (RCAs). The new logic cells proposed in this work feature a new type of carry chain intended to allow a logic cell, such as the ALM, to be configured as a 6:2 or 7:2 compressor; the ALM will be described in greater detail in Section 3.3.

The carry chains used in the configurable logic blocks (CLBs) of the Xilinx Virtex-4/5 include programmable multiplexers and XOR gates to send propagate and generate signals to adjacent CLBs to enable parallel-prefix style addition [Parhami 1999].

Hauck et al. [2000] proposed more complicated carry chains that can implement Brent-Kung, carry-select, and carry-lookahead addition. Different logical constructs were needed for different cells in the chain, making them nonuniform. This creates integration challenges because it is difficult to lay out a regular fabric consisting of irregular cells.
This would require a large manual effort to design each individual cell at the transistor level, and would complicate the layout process for the entire chip. Frederick and Somani [2006] proposed a uniform logic block with carry chains that could efficiently implement a carry-skip adder; a similar bidirectional carry-skip chain was earlier proposed by Cherepacha and Lewis [1996, Figure 6]. Kaviani et al. [1998] and Leijten-Nowak and Van Meerbergen [2003] developed ALU-like blocks that support arithmetic functions such as addition, subtraction and (partial) multiplication. Distributed Arithmetic (DA) [Mirzaei et al. 2006] is a paradigm for implementing effective hardware for DSP systems that uses LUTs instead of multipliers. Grover et al. [2002] developed a special DA-oriented LUT structure (DALUT) specifically for multiply-accumulate (MAC) operations. In addition to two 4-input LUTs, their DALUT cell included arrays of xor gates, bit-level adders and shift accumulators, shift registers, and a CPA to add partial summations and carries. Brisk et al. [2007] reported that DSP/MAC blocks are not good candidates for implementing multioperand addition. The logic cell described here is intended to address this shortcoming.
Most FPGAs are hybrid-reconfigurable, as they embed ASIC components such as multipliers, more complex DSP blocks, and standard I/O interfaces into a reconfigurable fabric [Zuchowski et al. 2002]. Kastner et al. [2002] developed techniques for a compiler to examine a set of applications to identify good candidates for these embedded cores. Their analysis, however, was limited to 2-operation combinations of addition and multiplication, and they did not use compressor trees for multioperand addition.

A K-input macro gate [Cong and Huang 2005] is similar to a LUT, but it cannot implement every K-input logic function, and therefore has reduced delay and area. Hu et al. [2007] suggested that FPGA cells could benefit from the inclusion of both LUTs and macro gates. Similar to Kastner et al., they developed an automated method to profile a set of applications to find good macro-gate candidates. They did not, however, consider arithmetic-dominated functions or fast carry chains between macro gates.

The Field Programmable Counter Array (FPCA) [Brisk et al. 2007; Cevrero et al. 2008] is a programmable IP used to accelerate multi-input addition in FPGAs. The FPCA is similar to an FPGA, but replaces LUTs with m:n counters. In a hybrid FPGA/FPCA, a compressor tree is mapped onto the FPCA, while all other operations are mapped onto the FPGA. As suggested by Kuon and Rose [2007], the cost of routing data to and from the FPCA may limit its performance benefit. The new FPGA cell proposed here is much less ambitious, and exploits carry chains rather than logical structures for effective local routing; furthermore, the I/O interface to the logic cell does not change.

3.3 The Altera Stratix II-IV Adaptive Logic Module (ALM)

The new logic cell proposed in this article is a modified version of the Adaptive Logic Module (ALM) employed in the Altera Stratix II-IV series of FPGAs. Each ALM contains an Adaptive LUT (ALUT).
An ALUT comprises two six-input LUTs (6-LUTs) with four shared inputs and shared configuration bits; in other words, they must implement the same logic function. Additionally, the ALM contains a carry chain that performs efficient ripple-carry addition, and bypassable flip-flops that facilitate either combinational or sequential circuits. The two 6-LUTs are also fracturable, meaning that each can be decomposed into two or more smaller LUTs. The ALM also includes a 7th input bit, but can only implement a selected set of 7-input functions.

The ALM has four operating modes, two of which use the carry chains. In Arithmetic Mode, each 6-LUT is decomposed into two independent 4-LUTs, which perform a small amount of pre-adder logic, followed by the carry chains. Arithmetic mode implements effective adders, (sequential) counters, accumulators, parity functions, and comparators.

In Shared Arithmetic Mode, the ALM is configured as a 2-bit ternary adder. The fracturable LUTs are configured as a carry-save adder (CSA), that is, a 3:2 compressor, and the carry chain functions as the final adder. Shared arithmetic mode was designed to efficiently implement soft multipliers (as opposed to using DSP blocks) and correlators.
Fig. 7. The Altera Stratix II/III Adaptive Logic Module (ALM) shown in Shared Arithmetic Mode.

Figure 7 illustrates the ALM configured in shared arithmetic mode. It is important to note that the 6-LUTs in the ALM are decomposed into smaller LUTs of 3 and 4 inputs; only the smaller LUTs are shown in Figure 7.

The modification to the ALM proposed in this article is similar to shared arithmetic mode, but implements a 6:2 or 7:2 compressor. Similar to shared arithmetic mode, the fracturable LUTs are configured as a CSA; but the interconnection of FAs in the carry chain differs from the ripple-carry chain. We chose to provide a second carry chain in addition to the ripple-carry chain; in principle, both carry chains could be merged, but this would introduce multiplexers into the ripple-carry chain. We opted for the second carry chain in order to achieve better performance.

3.4 Synthesizing Compressor Trees on FPGAs

The compressor tree synthesis techniques summarized in Section 3.1 are intended for ASIC design flows. Due to the specific logic and routing architectures of modern high-performance FPGAs, these techniques are not likely to yield favorable results if used in a synthesis flow targeting an FPGA. Since the primary role of carry chains has been to facilitate efficient carry-propagate addition, conventional wisdom held that adder trees would yield better results than compressor trees synthesized on an FPGA. This is not necessarily true.

Poldre and Tammemae [1999] synthesized 4:2 compressors onto the four-input LUTs of the Xilinx Virtex FPGAs, exploiting the carry chains to propagate the carry-in/carry-out bits. Parandeh-Afshar et al. [2008b, 2008c] developed a general compressor tree synthesis method that mapped GPCs with 6 inputs and 3 or 4 outputs onto FPGA logic cells built from 6-LUTs.
Limiting the number of GPC inputs to 6 ensures that at most one layer of LUTs is required to implement each GPC. On an Altera Stratix II, a compressor tree built from GPCs was 27% faster than an
adder tree. The GPC mapping, however, increased the ALM count by 47%, on average.

Fig. 8. (a) (0, 6; 3) and (b) (2, 3; 3) GPCs mapped onto two ALMs using shared arithmetic mode.

In principle, a 6-input, k-output GPC could be synthesized on k 6-LUTs, where each 6-LUT computes a single output bit. As the two 6-LUTs in a Stratix II-IV ALM must implement the same function, this would require k ALMs, where only one of the two 6-LUTs available in each ALM is used. Parandeh-Afshar et al. [2009], however, proposed a more efficient mapping that uses LUTs in conjunction with carry chains, reducing the number of ALMs required to k/2. In many cases, it is possible to map these components onto ALMs using either arithmetic or shared arithmetic mode. Figures 8 and 9, for example, show three 6-input, 3-output GPCs mapped onto two ALMs using shared arithmetic mode. In fact, these are the only three
6-input, 3-output GPCs that will be used by the GPC mapping heuristic, which is described in Section 5.3. Specifically, these are the only 6-input, 3-output covering GPCs; the definition of a covering GPC will be formalized in Section 5.1. The GPC mapping heuristic only employs covering GPCs; all other GPCs are either redundant or unreasonable, for reasons that will be discussed in Section 5.

Fig. 9. (a) A (1, 5; 3) GPC implemented using full and half adders, and (b) mapped onto two ALMs using shared arithmetic mode. The internal signals S0, C0, S1, C1, and C2 in (a) are computed by LUTs in (b). Signals C2 and D are never 1 at the same time, so the carry output of the adder that produces output bit z2 is always 0.

4. NEW FPGA LOGIC CELL AND CARRY CHAIN

Figure 10(a) shows our proposed new FPGA logic cell, which is presented as an extension of the ALM used in Altera's Stratix II-IV line of high-performance FPGAs. The components required for shared arithmetic mode are also shown in this figure. The left-hand side of Figure 10(a) shows four 3-LUTs, which are part of Altera's fracturable 6-LUT architecture. The carry chain on the right-hand side is the traditional carry chain that is used to implement ternary
addition, using the four 3-LUTs configured as a carry-save adder.

Fig. 10. (a) Enhanced version of the Shared Arithmetic Mode of the Altera ALM; a new carry chain, shown in gray, allows the ALM to be configured as a 6:2 or 7:2 compressor. Two additional multiplexers are required to select between the two sum outputs of the 6:2 compressor and ternary adder (already present in the ALM); (b) pattern of carry-propagation for the 6:2 and 7:2 compressor.

The novel features of the new logic cell are the carry chain in the center (gray background), which can implement a 6:2 or 7:2 compressor, and the two multiplexers shown in gray on the right-hand side of Figure 10(a), which select between the outputs of the two carry chains. Similar to ternary addition, the new carry chain requires the four 3-LUTs to be configured as a carry-save adder. To implement a 7:2 compressor, three FAs (and a 7th LUT input) are required; to implement a 6:2 compressor, one of the FAs (outlined with a dashed line) becomes a half (two-input) adder, and the 7th input bit is not used.
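The relationship between the full adder and the half adder mentioned above can be checked directly: a full adder with one input tied to 0 behaves as a half adder. The following minimal Python sketch uses illustrative function names that are not taken from the article.

```python
from itertools import product


def full_adder(a, b, c):
    """Return (sum, carry) for three input bits."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def half_adder(a, b):
    """Return (sum, carry) for two input bits."""
    return a ^ b, a & b


# Tying the third full-adder input to 0 yields the half adder.
fa_matches_ha = all(full_adder(a, b, 0) == half_adder(a, b)
                    for a, b in product((0, 1), repeat=2))
```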
Three carry-in/carry-out bits are also required; they are labeled X, Y, and Z in Figure 10(a). The carry-out labeled X/Y/Z connects to the corresponding carry-in labeled X/Y/Z of the next compressor in the chain. A detailed picture of the carry chains across several logic cells is shown in Figure 10(b). In principle, the FAs used in the two carry chains could be shared; this design choice was illustrated by Parandeh-Afshar et al. [2008a, Figure 5(a)]. Although sharing could slightly reduce area, it requires that multiplexers be inserted into the carry chain, significantly increasing the critical path delay. As our goal is to increase performance, this design point is nonideal, especially since the area of the multiplexers offsets the area savings from sharing FAs.

There are two primary advantages of providing an FPGA logic cell that can be configured as a compressor compared to synthesizing GPCs on LUTs. The first advantage, which was illustrated in Figure 6, is that a k:2 compressor will have a higher compression ratio than a k-input GPC. In some, but certainly not all, cases, this can reduce the number of levels of logic in the compressor tree. The second advantage involves area utilization. Each ALM contains two 6-LUTs with dependent inputs. A GPC with six inputs and three outputs, including a 6:3 counter, requires two ALMs, while only one of our proposed logic cells, which is marginally larger than an ALM, is required to realize a 6:2 compressor. Reducing the number of logic cells, moreover, may allow for a tighter placement of logic cells on the device, which, in turn, reduces wirelength and routing delay; our experiments confirm this hypothesis. Using similar reasoning, the use of 7:2 rather than 6:2 compressors further increases the compression ratio, and may also reduce the number of logic cells required since each cell can consume an additional bit.
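To make the compression argument concrete, the following Python sketch models a chain of compressors behaviorally. It is an assumption-laden illustration, not the gate-level wiring of Figure 10(a) with its X, Y, and Z signals: each cell is assumed to pass a rank 1 carry to the next cell and a rank 2 carry to the cell after that, and both carry-outs are computed from the cell's own inputs only, so no carry ripples along the chain. The function names are hypothetical.

```python
def compressor(bits, cin0=0, cin1=0):
    """Behavioral k:2 compressor for up to 7 input bits of equal rank.

    Returns (s, c, cout1, cout2) with weights 1, 2, 2 and 4 relative to the
    rank of the inputs. cout1 and cout2 depend only on `bits`, never on the
    carry-ins.
    """
    assert len(bits) <= 7
    count = sum(bits)                    # number of ones, 0..7
    cout1 = (count >> 1) & 1
    cout2 = (count >> 2) & 1
    low = (count & 1) + cin0 + cin1      # value in 0..3
    return low & 1, low >> 1, cout1, cout2


def compress_columns(columns):
    """Reduce columns[i] (bits of weight 2**i) to two rows, given as integers."""
    n = len(columns)
    cin0 = [0] * (n + 3)                 # rank 1 carries arriving at each cell
    cin1 = [0] * (n + 4)                 # rank 2 carries arriving at each cell
    sum_row = carry_row = 0
    for i in range(n + 2):               # two extra cells absorb trailing carries
        bits = columns[i] if i < n else []
        s, c, cout1, cout2 = compressor(bits, cin0[i], cin1[i])
        cin0[i + 1] = cout1
        cin1[i + 2] = cout2
        sum_row |= s << i
        carry_row |= c << (i + 1)
    return sum_row, carry_row
```

Under these assumptions every column contributes exactly two bits to the next level (one sum bit and one carry bit), whereas a 7-input counter would contribute three.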
Consider the ith compressor in the chain. Carry-in bits c_in,0 and c_in,1 are driven by the rank 1 carry-out of the (i - 1)st compressor and the rank 2 carry-out of the (i - 2)nd compressor, respectively; likewise, the rank 1 carry-out of the ith compressor drives carry-in c_in,0 of the (i + 1)st compressor, and the rank 2 carry-out drives carry-in c_in,1 of the (i + 2)nd.

When an ALM is configured as a two-bit ternary adder in shared arithmetic mode, six input bits are used, so no modifications to the I/O interface are required to implement a 6:2 compressor. The 7:2 compressor, in contrast, requires an extra input bit. This is not a problem, as the ALM contains eight architecturally visible inputs; either of the two remaining inputs can be used as the seventh input when the ALM is configured as a 7:2 compressor.

5. COMPRESSOR TREE SYNTHESIS ON THE NEW LOGIC CELL

This section describes a mapping heuristic that can synthesize compressor trees targeting the logic cell shown in Figure 10(a). This heuristic is an extension of an earlier one proposed by Parandeh-Afshar et al. [2008b], which targeted the Altera Stratix II FPGA. Compressor trees synthesized using an ASIC design flow produce two outputs that are summed using a CPA. Since ternary CPAs are available in Stratix II for the same delay and area as binary CPAs, the heuristic outputs compressor trees that produce three outputs instead of two. The remainder of the
compressor tree is synthesized using GPCs. The number of outputs per GPC was limited to four, ensuring that each GPC can be implemented using at most four 6-LUTs (or fewer, if shared arithmetic mode can be exploited). This section extends the mapping heuristic to include the possibility of configuring the logic cells as 6:2 or 7:2 compressors as well.

5.1 GPC Classification

By convention, we require that a GPC must have at least two input bits. For example, (0, 1; 1) and (1, 0; 2) are not GPCs. Some GPCs are considered unreasonable by the heuristic because they can always be replaced with another more sensible choice. GPCs such as (3, 1; 3) have one rank 0 input bit, which is always passed directly to the least significant output bit, that is, the value of the input bit determines whether the output is odd or even; such a GPC is considered to be unreasonable. Another class of unreasonable GPCs are those for which the number of input bits is less than or equal to the number of output bits, for example, (2, 1; 3); these GPCs are unreasonable because they do not perform any compression. A third class of unreasonable GPCs are those that have no rank 0 input bits, for example, (2, 0; 3). In this case, the rank 1 input bits could be converted to rank 0 input bits of a smaller counter that produces fewer output bits, for example, (0, 2; 2).

A primitive GPC is one that satisfies the input/output constraints (at most M inputs and N outputs) and is reasonable. In theory, the number of primitive GPCs is exponential in M and N; limiting M and N to small constant values ensures tractability. With N output bits, the rank-weighted sum of the input bits cannot exceed 2^N - 1; this ensures that the number of primitive GPCs is finite. A covering GPC is a primitive GPC whose functionality cannot be implemented by another primitive GPC. For example, a (2, 3; 3) GPC can implement a (1, 3; 3) GPC by setting one rank 1 input bit to zero.
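The classification rules above can be sketched as a small enumeration. The Python below is a simplified illustration rather than the authors' library generator: it only considers GPCs whose inputs have rank 0 or rank 1, written as pairs `(k1, k0)` in the same order as the article's notation, and it treats "A covers B" as "A has at least as many inputs of every rank as B". GPCs with inputs of rank 2 or higher may need additional rules that this section does not spell out. All function names are hypothetical.

```python
from itertools import product


def is_reasonable(k1, k0, n_out):
    """Apply the three 'unreasonable' rules to a GPC (k1, k0; n_out)."""
    n_in = k1 + k0
    if n_in < 2:            # by convention, not a GPC at all
        return False
    if k0 < 2:              # no rank 0 bits, or a single pass-through bit
        return False
    if n_in <= n_out:       # performs no compression
        return False
    return True


def primitive_gpcs(max_in, n_out):
    """Enumerate reasonable (k1, k0) GPCs with at most max_in inputs."""
    found = []
    for k1, k0 in product(range(max_in + 1), repeat=2):
        if k1 + k0 > max_in:
            continue
        if 2 * k1 + k0 > 2 ** n_out - 1:   # rank-weighted sum must fit in n_out bits
            continue
        if is_reasonable(k1, k0, n_out):
            found.append((k1, k0))
    return found


def covering_gpcs(primitives):
    """Keep the primitives that no other primitive can implement."""
    def covers(a, b):
        return a != b and a[0] >= b[0] and a[1] >= b[1]
    return [g for g in primitives if not any(covers(h, g) for h in primitives)]


covering = covering_gpcs(primitive_gpcs(6, 3))
```

With at most six inputs and three outputs, `covering` contains (0, 6), (1, 5), and (2, 3) under these simplifying assumptions.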
For example, there are just three covering GPCs having six inputs and three outputs: (0, 6; 3), (1, 5; 3), and (2, 3; 3) (see Figures 8 and 9). All other GPCs satisfying these I/O constraints are either unreasonable, for example, (3, 1; 3), or can be covered by one of the three covering GPCs already listed.

5.2 GPC Library Construction

The mapping heuristic uses a library of GPCs having at most M inputs and N outputs. This library is computed once for each target FPGA and stored in a text file. The library is read from the text file each time a set of compressor trees is synthesized. First, the primitive GPCs are enumerated and added to the library. Second, the set of covering GPCs is identified and marked as such. Third, the primitive GPCs are sorted in nondecreasing order of compression ratio. Each set of primitive GPCs having the same compression ratio is sorted in nondecreasing order of the number of inputs. The total ordering of primitive
GPCs favors a high compression ratio as the first criterion and the number of bits consumed as a second. Parandeh-Afshar et al. [2008b] used M = 6 and N = 4 to target the Altera Stratix II FPGA. Limiting the number of inputs to M = 6 ensures that only one layer of ALMs is required to implement the counter, regardless of whether the GPC is synthesized on LUTs or uses shared arithmetic mode, as in Figures 8 and 9. Limiting the number of outputs to N = 4 ensures that at most four ALMs are required for each GPC, under the worst-case assumption that each output bit is computed using a 6-LUT; fewer ALMs are required when shared arithmetic mode can be used [Parandeh-Afshar et al. 2009].

The mapping heuristic, described in the following section, converts chains of consecutive (0, 6; 3) GPCs (6:3 counters) into 6:2 compressors, whenever possible. Unfortunately, this approach cannot be used for 7:2 compressors, as M = 6 prevents 7-input GPCs from inclusion in the library. To support 7:2 compressors, a (0, 7; 3) GPC is added to the library, but no other 7-input GPCs are included. Chains of (0, 7; 3) GPCs are converted to 7:2 compressors; when a 7-input GPC is not contained in a chain, it is converted to GPCs with at most 6 inputs, as described in the following section.

5.3 Mapping Heuristic

The input to the mapping heuristic is: (1) an ordered array of integers, k_i, where the ith integer is the number of bits of rank i to sum, e.g., k_0 bits of rank 0, k_1 bits of rank 1, etc.; (2) a library of GPCs, as described in the preceding section; and (3) a flag called mode which takes one of three values, ALM, 6:2, or 7:2. If mode = ALM, then we are targeting an FPGA containing traditional ALMs that cannot be configured as 6:2 or 7:2 compressors; if mode = 6:2 or 7:2, then we are targeting an FPGA whose logic cells can be configured as a 6:2 or 7:2 compressor, for example, Figure 10(a).
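The library ordering described in Section 5.2 can be sketched in Python as follows. The `GPC` type, its field names, and the rank-0-first layout of `counts` are assumptions made for illustration; the article stores the library in a text file whose format is not given here. The compression ratio is taken to be the number of inputs divided by the number of outputs, and exact fractions avoid rounding when ratios are compared.

```python
from fractions import Fraction
from typing import NamedTuple, Tuple


class GPC(NamedTuple):
    counts: Tuple[int, ...]   # counts[r] = number of input bits of rank r (rank 0 first)
    outputs: int              # number of output bits

    @property
    def inputs(self):
        return sum(self.counts)

    @property
    def ratio(self):
        return Fraction(self.inputs, self.outputs)


def sort_library(gpcs):
    """Sort by compression ratio, then by number of inputs (both nondecreasing)."""
    return sorted(gpcs, key=lambda g: (g.ratio, g.inputs))


library = sort_library([
    GPC((6,), 3),      # (0, 6; 3)
    GPC((7,), 3),      # (0, 7; 3), the only 7-input entry
    GPC((3, 2), 3),    # (2, 3; 3)
    GPC((5, 1), 3),    # (1, 5; 3)
])
```

With this ordering the highest-priority GPC is the last element, so a first-fit search that favors high compression ratios would walk the list from the end. Whether the original tool stores or traverses the list this way is an assumption.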
The mapping heuristic generates one level of the compressor tree at a time. A subset of the input bits is covered by GPCs and possibly 6:2 or 7:2 compressors. The output bits produced by each GPC are propagated to the next level of the compressor tree, along with the bits from the current level that are not covered. Since the rank of each GPC output bit is known, a new set of columns (array of integers) is generated for each level of the tree. Pseudocode for the mapping heuristic is shown in Figure 11. A new level in the tree is generated until there are at most three rows of bits remaining, that is, each column of the next level has at most three input bits. A ternary CPA completes the tree. The remainder of this section focuses on the process of producing one level of the compressor tree, that is, how to cover a set of columns with GPCs. The following process is applied until no remaining (primitive) GPCs can cover any bits in the current level of the tree. The column having the most noncovered input bits in the current level is always selected; ties are broken arbitrarily. Selecting the column with the largest number of bits tends to favor the use of GPCs with higher compression ratios and a large number of input bits. To find the best GPC for the selected
column, the set of primitive GPCs is searched. The first GPC that fits the selected (base) column and its following or preceding columns is selected.

Fig. 11. Pseudocode for GPC mapping heuristic [Parandeh-Afshar et al. 2008a] with extensions to exploit 6:2 and 7:2 compressors, where appropriate.

If mode = 7:2 and the column contains at least seven input bits, then a (0, 7; 3) GPC is always used, and a (0, 6; 3) GPC is always used if the column contains six input bits. If mode = 6:2 or ALM and the column contains at least six input bits, then a (0, 6; 3) GPC is always used. Otherwise, the selected column contains fewer bits than the maximum input bandwidth of the largest GPC in the library; in this case, GPCs that cover bits from columns that are immediately adjacent to the selected column can be used as well. A forward search looks for a GPC under the assumption that the bits in the selected column will have rank 0. If the selected column is c, then the forward search will attempt to include bits from columns c + 1, c + 2, etc. A backward search assumes that the bits in the selected column will be of the
highest rank in the GPC that covers them. If the selected column is c, then the backward search will attempt to include bits from columns c - 1, c - 2, etc. In both searches, the first GPC that fits the selected column and the additional columns is selected. Of these two GPCs, the one with the higher priority, according to the sorted order, is selected. The forward and backward searches are particularly useful when the distribution of column heights is asymmetric. This occurs quite frequently for constant multipliers, including FIR filters.

Fig. 12. Illustration of the forward (a) and backward (b) search using GPC mapping.

Figure 12 illustrates the forward and backward search. In Figure 12(a), a forward search finds a (1, 4; 3) GPC while the backward search in Figure 12(b) finds a (4, 3; 4) GPC. Since the compression ratios are 5/3 = 1.67 and 7/4 = 1.75, respectively, the GPC found by the backward search is selected.

After selecting a column and a GPC, the bits that have been selected are removed from the current set of columns. The output bits produced by the GPC are added to the set of columns for the next level in the tree. This process repeats (a column and a GPC are selected) until either all bits at the current level have been covered, or no primitive GPC in the list can cover more than a single bit.

Once the current level is completely covered, the heuristic attempts to replace some GPCs with 6:2 or 7:2 compressors. If mode = 6:2, each contiguous sequence of (0, 6; 3) GPCs is replaced with a contiguous sequence of 6:2 compressors, similar in principle to Figure 6. Note that this transformation reduces the number of bits in the following level; aggregated over several levels, the use of compressors rather than counters can reduce the total number of logic levels in the compressor tree.
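The forward and backward searches can be sketched in Python as shown below. Several details are assumptions made for illustration: a GPC is written as `(counts, outputs)` with `counts[r]` the number of rank r inputs (rank 0 first, the reverse of the article's notation), a GPC "fits" when every column it touches holds at least as many uncovered bits as the GPC consumes, and the library is passed in best-first order. The column heights in the example are hypothetical and are not the data of Figure 12.

```python
from fractions import Fraction


def ratio(gpc):
    """Compression ratio: number of inputs divided by number of outputs."""
    counts, outputs = gpc
    return Fraction(sum(counts), outputs)


def fits(gpc, heights, base):
    """True if the GPC, with its rank 0 inputs at column `base`, finds enough bits."""
    counts, _ = gpc
    if base < 0 or base + len(counts) > len(heights):
        return False
    return all(heights[base + r] >= k for r, k in enumerate(counts))


def search(library, heights, c):
    """Return (gpc, base) for the better of the forward and backward first fits."""
    # Forward: column c supplies the rank 0 inputs.
    forward = next(((g, c) for g in library if fits(g, heights, c)), None)
    # Backward: column c supplies the highest-rank inputs.
    backward = next(((g, c - len(g[0]) + 1) for g in library
                     if fits(g, heights, c - len(g[0]) + 1)), None)
    candidates = [x for x in (forward, backward) if x is not None]
    return max(candidates, key=lambda x: (ratio(x[0]), sum(x[0][0])), default=None)


# Hypothetical level: 3 bits in column 0, 4 bits in column 1, 1 bit in column 2.
library = [((6,), 3), ((3, 4), 4), ((4, 1), 3)]   # (0,6;3), (4,3;4), (1,4;3)
choice = search(library, [3, 4, 1], c=1)
```

In this example the forward search settles on the (1, 4; 3) GPC and the backward search on the (4, 3; 4) GPC; the latter wins because 7/4 exceeds 5/3, mirroring the comparison made for Figure 12.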
If mode = 7:2, then each contiguous sequence of (0, 7; 3) GPCs is replaced with a sequence of 7:2 compressors, similar to what was done for 6:2 compressors. Each remaining (0, 7; 3) GPC is replaced by a (0, 6; 3) GPC and one unmapped bit that is propagated to the next level of the tree. The reason for doing this is that (0, 7; 3) GPCs do not map efficiently onto ALMs, so we replace them with a more favorable component.

Next, the current level of the compressor tree is mapped onto logic cells. GPCs are mapped onto ALMs, while 6:2 and 7:2 compressors require the logic cell to be configured to use the carry chain shown in Figure 10(a). Additionally, the outputs of the GPCs and compressors from the preceding level of the compressor tree are connected to the inputs of the GPCs and compressors in the current level. The last step is to generate the columns for the next level of the compressor tree.
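The last two steps, demoting isolated (0, 7; 3) GPCs and generating the columns of the next level, can be sketched in Python as follows. This is an illustrative model only: a placement is a hypothetical `(counts, outputs, base)` triple with `counts[r]` the number of rank r inputs (rank 0 first), a (0, 7; 3) GPC is assumed to be "in a chain" when another one sits in an adjacent column, and the conversion of chains into compressors is not modeled.

```python
SEVEN = ((7,), 3)   # (0, 7; 3)
SIX = ((6,), 3)     # (0, 6; 3)


def demote_isolated_sevens(placements):
    """Replace each (0, 7; 3) GPC with no (0, 7; 3) neighbor by a (0, 6; 3) GPC."""
    seven_bases = {base for counts, outputs, base in placements
                   if (counts, outputs) == SEVEN}
    result = []
    for counts, outputs, base in placements:
        isolated = base - 1 not in seven_bases and base + 1 not in seven_bases
        if (counts, outputs) == SEVEN and isolated:
            result.append(SIX + (base,))   # the seventh bit stays uncovered
        else:
            result.append((counts, outputs, base))
    return result


def next_level(heights, placements):
    """Column heights of the next level: uncovered bits plus GPC output bits."""
    nxt = list(heights)
    for counts, outputs, base in placements:
        for r, k in enumerate(counts):
            nxt[base + r] -= k                          # bits consumed by this GPC
            assert nxt[base + r] >= 0, "GPC covers more bits than available"
        nxt.extend([0] * (base + outputs - len(nxt)))   # room for the output bits
        for j in range(outputs):
            nxt[base + j] += 1                          # one output bit per rank
    while nxt and nxt[-1] == 0:
        nxt.pop()
    return nxt


def needs_another_level(heights):
    """The tree is finished once at most three rows of bits remain."""
    return max(heights, default=0) > 3
```

For instance, a single column of seven bits covered by one isolated (0, 7; 3) GPC becomes a (0, 6; 3) GPC plus one leftover bit, so the next level has heights `[2, 1, 1]`.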
More informationPerformance Analysis of an Efficient Reconfigurable Multiplier for Multirate Systems
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 5.258 IJCSMC,
More informationImplementation of 256-bit High Speed and Area Efficient Carry Select Adder