Highly Versatile DSP Blocks for Improved FPGA Arithmetic Performance
IEEE Annual International Symposium on Field-Programmable Custom Computing Machines

Hadi Parandeh-Afshar and Paolo Ienne
Ecole Polytechnique Fédérale de Lausanne (EPFL)

Abstract

Integrating DSP blocks into FPGAs is an effective approach to closing the existing gap between FPGAs and ASICs. A much wider range of applications could benefit from DSP blocks if they were more versatile than those currently found in commercial devices. In this paper we propose a novel DSP block that resembles commercially available ones and yet additionally supports a wide variety of multiplier bit-widths as well as multi-input addition with negligible overhead. The novel DSP block uses the limited available input/output bandwidth much more efficiently. Experimental results show that, on average, the area overhead of each novel feature added to a base design is a mere 3%, with practically no delay penalty. Moreover, for multi-input addition, which is not supported in current DSP blocks, the proposed DSP block is more than 50% faster on average than FPGA soft logic.

1. Introduction

One fundamental step towards bridging the gap between FPGAs and ASICs for arithmetic-dominated circuits is to integrate DSP blocks that perform commonly used arithmetic functions into the FPGA. The current trend in FPGA design is to add as many features as possible to the DSP blocks so that a wider range of applications benefits. The challenge is that the new features should not impose a considerable overhead on the original DSP block.

The most important feature of the DSP blocks in current FPGAs is multiplication. Most DSP blocks have a fixed bit-width multiplier as the base, and a few other multiplication bit-widths are formed on top of the base multiplier. For instance, the DSP blocks in Altera Stratix-II only support 9-bit and 36-bit multipliers on top of the base 18-bit multiplier.
Moreover, despite the availability of IO bandwidth, when a DSP block is configured as a 36-bit multiplier, no resources remain for other functionality.

In this paper, we first present a base architecture for the DSP block, which consists of Partial Product Generator (PPG) and Partial Product Reduction (PPR) units. To support additional features such as various multiplication bit-widths and multi-input addition, we make the PPG unit reconfigurable and keep the PPR unit almost the same. This way the PPR remains small and fast, which is ideal for multi-input addition. However, the PPG unit can become complicated, since it must provide a different set of inputs to the PPR depending on the DSP block configuration. We propose a number of techniques that significantly reduce the cost of making the PPG reconfigurable.

The paper is structured as follows. Related work and preliminary arithmetic concepts are discussed in Sections 2 and 3 respectively. Section 4 presents the architecture of the proposed base DSP block. Sections 5 and 6 describe how new features are added to the base DSP block. Experimental results are presented in Section 7, and the final section concludes.

2. Related work

Two architectures [1,2] have been proposed in the past to improve multi-input addition on FPGAs. Both propose an accelerator for carry-save arithmetic that is intended to be placed in an FPGA as a DSP block. The problem with these two architectures is that they can only implement multi-input addition. To perform multiplication, one can use the FPGA soft logic for the PPG and the proposed architectures for the PPR. The first drawback is that only a few multipliers can be implemented in each DSP block, due to the input bandwidth and resource constraints. Moreover, such multipliers have poor performance and area utilization, since the PPs are generated by the soft logic and the general routing network is used to connect the PPG and PPR units.
In a subsequent work [3] by the same group, the PPG was integrated into the accelerator. This removed the second drawback, but still only two 9-bit multipliers can be implemented in the DSP block,
while one can fit nine 9-bit multipliers into the DSP block of Stratix-II.

There are several academic proposals [4-8] to improve the performance of scientific applications on FPGAs. Such applications require floating-point arithmetic, and implementing floating-point operations on an FPGA requires a large amount of resources. Therefore, the primary goal of these proposals is to integrate floating-point functionality into DSP blocks, whereas our goal is different and complementary. Our contribution is a base DSP block to which various multiplier bit-widths as well as multi-input addition can be added with small overhead.

3. Arithmetic preliminaries

In this section, we explain a number of arithmetic concepts used in this paper, such as Radix-4 Booth multiplication and compressor trees.

Radix-4 Booth [9] multiplication is a standard technique for designing smaller and faster multipliers by recoding the multiplier operand. With Radix-4 Booth recoding, the number of Partial Products (PPs) is halved. The basic idea is to take every second column and multiply by ±1, ±2, or 0, instead of shifting and adding for every column of the multiplier term and multiplying by 1 or 0. To Booth recode the multiplier term, its bits are considered in blocks of three, such that each block overlaps the previous block by one bit. The overlap is necessary so that we know what happened in the previous block, as the MSB of a block acts like a sign bit.

To compute the sum of two integers, a carry-propagate adder (CPA) [9], such as a ripple-carry or carry-select adder, is used. To add more numbers, a compressor tree [9] is used. The building block of a compressor tree can be either a counter or a compressor. Compressors are, in principle, similar to counters, but in contrast they have explicit carry inputs and outputs. Each compressor tree may consist of several layers of either counters or compressors.
However, no ripple-carry propagation occurs within a layer of a compressor tree; only a final CPA is required to compute the final sum.

4. Proposed base architecture of DSP

The fundamental building block of our base DSP block, similar to the DSP blocks of Altera FPGAs, consists of two paired 18-bit multipliers followed by an optional adder stage. The adder unit is mainly used for complex arithmetic multiplication [12]. We propose to use a Radix-4 Booth architecture for the multipliers.

Figure 1. Radix-4 Booth PPG unit. The Booth recoder is common to all bits of a PP, but each PP bit needs a separate MFS unit. (Signals shown: recoder inputs Y2j+1, Y2j, Y2j-1; select signals Two, Non-Zero, Neg; correction term; MFS inputs Xi-1, Xi, 0; output PPk.)

There are two reasons for choosing a Radix-4 Booth multiplier. First, by modifying the PPG structure of the Radix-4 Booth multiplier and applying some transformations to the sign extension parts of the PPs, we can significantly reduce the cost of adding new multiplier bit-widths to the base architecture. Second, we use the PPR unit of the multipliers in the base DSP block as a compressor tree for implementing multi-input addition. In a Radix-4 Booth multiplier, the number of PPs is half that of a parallel array multiplier. Therefore, we can exploit faster and smaller compressors to build the compressor tree. This is a key factor in designing compressor trees for multi-input addition. We provide more details about these advantages of Radix-4 Booth once the complete architecture has been presented in subsequent sections.

In the following, we give a brief overview of the base DSP architecture, as it is needed to understand the modifications that we will make to increase flexibility. Our PPG and PPR units differ from the standard Radix-4 architecture in a couple of ways. For the PPG, to multiply the multiplicand by 1, 2, or 0, all that is needed is a few multiplexers, whose delay is independent of the size of the inputs.
The only complexity relates to negating a two's complement number, where a 1 is added to the inverted number. This complexity can be avoided in the PPG if we move the addition into the PPR unit. For this purpose, a correction bit corresponding to each PP is added in the PPR. Figure 1 illustrates the PPG unit of the Radix-4 multiplier. For each PP, one Booth recoder is required, while each bit of the PP needs its own multiplicand factor selection (MFS) unit, shown in this figure. All MFS units receive the same set of select signals.
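The PPG scheme described above (Booth recoding of the multiplier, selection of 0, X or 2X, bitwise inversion for negative digits, and a correction bit that is added later in the PPR) can be illustrated with a small behavioural model. This is only a sketch under stated assumptions: the function names `booth_digits` and `booth_multiply` are hypothetical, integers stand in for bit vectors, and the plain accumulation loop stands in for the PPR hardware.

```python
def booth_digits(y, n):
    """Radix-4 Booth digits (each in -2..2) of an n-bit two's-complement value y.

    n is assumed to be even. Digit j is derived from the overlapping
    three-bit block (y[2j+1], y[2j], y[2j-1]), with y[-1] taken as 0.
    """
    y &= (1 << n) - 1
    digits = []
    prev = 0  # implicit bit y[-1]
    for j in range(n // 2):
        b0 = (y >> (2 * j)) & 1
        b1 = (y >> (2 * j + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(x, y, n=18):
    """Multiply two signed n-bit integers using inverted PPs plus correction bits."""
    width = 2 * n
    mask = (1 << width) - 1
    total = 0
    for j, d in enumerate(booth_digits(y, n)):
        magnitude = (abs(d) * x) & mask      # selection of 0, X or 2X
        negative = d < 0
        pp = (~magnitude & mask) if negative else magnitude  # inversion only, no +1
        cbit = 1 if negative else 0          # the +1 is deferred to the reduction
        total += (pp << (2 * j)) + (cbit << (2 * j))
    total &= mask
    if total >> (width - 1):                 # reinterpret as a signed result
        total -= 1 << width
    return total
```

For example, `booth_digits(6, 4)` yields the digits `[-2, 2]`, since -2 + 2*4 = 6. The point of the sketch is that the PPG never performs an addition: negation is split into an inversion (done per PP bit) and a single correction bit aligned with the LSB of that PP.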
In Booth multiplication, each PP should be sign extended and shifted to the left by two bits relative to the previous one. Therefore, the PPR unit of an 18-bit multiplier is 36 bits wide. Moreover, the height of the PPR is nine, since we have nine PPs. We propose to use a layer of 9:2 compressors followed by a final CPA for the PPR unit. In this layer, there is one 9:2 compressor per column, so an 18-bit multiplier requires thirty-six 9:2 compressors. Figure 2 shows the proposed circuit-level design of the 9:2 compressor. All of its inputs, including the carry bits, have the same bit position, rank i. The two outputs also have rank i, but the carry outputs have rank i+1. The delay of the 9:2 layer is independent of the layer width, since no ripple-carry path exists in the layer. The longest path along which a carry can propagate contains 3 cells. Since the compressor layer will be reused for other DSP block configurations, the 9:2 layers of all 18-bit multipliers in the base DSP block are chained, but at the multiplier boundaries each carry input is set to 0 by a simple AND gate.

Figure 2. Proposed design for 9:2 compressor.

As explained, besides the PPs, we have a number of correction bits that need to be added in the PPR unit. Each correction bit (Cbit) is aligned with the first bit of the corresponding PP. Due to the shifting of the PPs, there is always a free place in the 9:2 layer for every Cbit, except for the ninth one. To avoid the overhead of using 10:2 compressors instead of 9:2 ones, we merge the ninth Cbit with the first PP, as shown in Figure 3. In this figure, S is the sign bit of the PP and C is the correction bit. Since the Cbit is aligned with the 17th bit of the PP, the MSBs are modified from that bit position onwards, as shown.

Figure 3. Merging the 9th Cbit with the 1st PP. The MSBs of the PP from bit 16 are modified.
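The statement that every correction bit except the ninth finds a free place in the 9:2 layer can be checked by counting bits per column. The sketch below is a simplified model, not the authors' layout: it assumes that PP k occupies every column from 2k up to the top of the 36-bit PPR (because of sign extension) and that Cbit k sits in column 2k; the function names are hypothetical.

```python
def column_heights(n_bits=18):
    """Number of PP bits in each PPR column for an n_bits Radix-4 Booth multiplier."""
    num_pps = n_bits // 2
    width = 2 * n_bits
    heights = [0] * width
    for k in range(num_pps):
        # Assumption: PP k starts at column 2k and is sign extended to the top column.
        for col in range(2 * k, width):
            heights[col] += 1
    return heights


def cbit_fits(n_bits=18, max_height=9):
    """For each PP k, report whether Cbit k fits in column 2k of a 9:2 layer."""
    heights = column_heights(n_bits)
    return [heights[2 * k] + 1 <= max_height for k in range(n_bits // 2)]
```

Under these assumptions column 2k holds k+1 PP bits, so the Cbit fits as long as k+2 does not exceed nine. `cbit_fits()` therefore reports a free slot for the first eight correction bits and none for the ninth, which is the case that Figure 3 handles by merging.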
In addition to the 9:2 layer, we need a 4:2 layer to sum the results of the two paired 18-bit multipliers for complex arithmetic multiplication. A 4:2 compressor is similar to the 9:2 compressor but has fewer inputs and carry bits. This layer is added between the 9:2 layer and the final CPA, and it should be 36 bits wide for each multiplier pair. Figure 4 shows the diagram of the building block of the proposed DSP block. The dashed box represents the PPR unit.

Figure 4. Building block of proposed DSP block.

5. Supporting various multiplier bit-widths

In this section we describe how to reduce the cost of adding a new multiplier bit-width to the base DSP block by modifying the PPG structure of the Radix-4 Booth multiplier and applying some transformations to the sign extension parts of the PPs. We use the same PPR unit for all multiplier bit-widths, and the PPG unit is modified to provide the required inputs to the PPR, depending on the DSP block configuration.

Providing such flexibility in the PPG can make it considerably complex. For instance, as shown in Figure 5, when a new bit-width is added to the base DSP, for a certain bit position we may need to select between a Booth-encoded bit and a sign bit. This requires a multiplexer to choose the right configuration. In this figure, the PPG corresponding to the first PP of an 18-bit multiplier is illustrated at the top. Two 9-bit multipliers fit within the same number of bits, and the first PPG of each of the two 9-bit multipliers is shown below that of the 18-bit multiplier. Since both configurations use the same PPR unit, we need several multiplexers to choose between these two configurations. Note that in most of the cases, we should select between a sign bit and another bit, which
can be a Booth-encoded bit or a different sign bit. For the bit positions where the encoded parts of two multiplier configurations overlap, we need to use one encoder for both, but with some multiplexers at the inputs. In this case, such overlap occurs in the first nine bits and, since the encoder inputs are the same, no multiplexer is required.

Figure 5. Overlap between the first PPG of two different multiplier configurations. Since the same PPR is used for both configurations, several multiplexers are required to select between a sign bit and an encoded bit or another sign bit.

Therefore, one efficient approach to reducing the PPG complexity is to avoid the large number of multiplexers that would otherwise be added for each bit of a partial product whenever a new multiplier configuration is added. The first step is to eliminate the sign extension parts of the PPs. Here we use a technique similar to the one used in the Baugh-Wooley multiplier [10]. This technique is illustrated in Figure 6. In this figure, +1 and then -1 are added to the sign extension part of a PP. As shown, adding +1 reduces the whole sign part to a single inverted sign bit, while the -1 remains as a constant number.

Figure 6. Reducing the repetitive sign bits by adding ±1.

Figure 7. Reducing the constant numbers to one number.

Figure 8. Merging the constant number into first partial product.

Now, if this rule is applied to the sign extension parts of N PPs, we obtain N constant numbers and N single inverted sign bits. We can then reduce the N constant numbers to a single number by summing them, as shown in Figure 7. Since the first bit of this constant number is aligned with the inverted sign bit of the first PP, we can append the constant number to the first PP, as shown in Figure 8.
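The sign-extension transformation can be verified numerically. The sketch below relies on the identity that a run of sign bits s from position k upward equals, modulo the PPR width, an inverted sign bit at position k plus the constant -2^k; the constants of all PPs are then summed into one. The width and sign-bit positions used here are hypothetical values chosen only for the demonstration, as are the function names.

```python
from itertools import product

WIDTH = 12                   # hypothetical PPR width for the demonstration
SIGN_POSITIONS = [6, 8, 10]  # hypothetical sign-bit columns of three PPs


def sign_extension(s, k, width):
    """Value of a field whose bits k..width-1 all equal the sign bit s."""
    return s * ((1 << width) - (1 << k))


def merged_constant(positions, width):
    """Sum of the per-PP constants -2**k, reduced modulo 2**width."""
    return sum(-(1 << k) for k in positions) % (1 << width)


def compact_form(signs, positions, width):
    """Inverted sign bits at their own positions plus the single merged constant."""
    inverted = sum((1 - s) << k for s, k in zip(signs, positions))
    return (inverted + merged_constant(positions, width)) % (1 << width)


def check_all(positions=SIGN_POSITIONS, width=WIDTH):
    """Compare full sign extension with the compact form for every sign combination."""
    for signs in product((0, 1), repeat=len(positions)):
        full = sum(sign_extension(s, k, width) for s, k in zip(signs, positions))
        if full % (1 << width) != compact_form(signs, positions, width):
            return False
    return True
```

`check_all()` returns `True` for these parameters, confirming that N sign-extension fields can be replaced by N inverted sign bits and one constant that does not depend on the operands, which is why it can be merged into a PP at design time.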
The resulting value is then appended to the first PP from its sign bit position. With this technique, we replace the sign parts of all PPs with a set of 0s and a single inverted bit, except for the first PP, where there are three sign bits and a number of 0s and 1s. This means that we need to choose between constant bits (0 or 1), single sign bits (inverted or non-inverted) and the normal Booth-encoded bits.

To avoid multiplexers, we modified the multiplicand factor selection (MFS) unit of the Radix-4 Booth PPG in Figure 1 and added two extra control signals, as shown in Figure 9. Compared to the original design, two control signals, Const and Inv, and two simple two-input gates have been added to the selection logic of the last two multiplexers in the MFS. When the Inv signal is set, the output of the MFS is inverted, and when the Const signal is set, 0 is selected as the output of the second multiplexer. Table 1 shows the operation modes of the modified PPG based on these two control signals. With the modified MFS, to obtain a constant output the Const signal should be set, and the Inv input is set according to the required value. This operation mode resolves the conflicts of a constant bit (0 or
1) with the opposite constant bit, a Booth-encoded bit and an inverted sign bit. To produce the inverted sign bit, we only need to set the Inv signal. For normal encoding, the two control bits are tied to 0.

Table 1. Operation modes of modified Radix-4 Booth PPG.

Const  Inv  Func
0      0    PP_k
0      1    inverted PP_k
1      0    0
1      1    1

In contrast to the PPG, the PPR unit does not require any major modification. The compressor tree in the PPR was designed so that it can be reused for several different bit-widths. For bit-widths smaller than 18, since the number of PPs is less than nine, the PPs can be reduced within a certain number of slices of the 9:2 layer. For larger bit-widths up to 36, however, we must split the PPs into two groups and reduce each group separately in two disjoint chunks of 9:2 compressors. Then, we need to sum the results of the two chunks. For this part, we use the 4:2 layer of the base DSP. The number of 9:2 slices required for a set of PPs is obtained from the following equation, in which MulBW represents the multiplier bit-width:

$$\mathit{SliceBW} = \mathit{MulBW} + 2 \times \mathit{NumOfPPs}$$

For example, a 36-bit multiplier has two sets of nine PPs. Therefore, each set requires 36 + 2 × 9 = 54 slices of the 9:2 layer, and we allocate the first 108 slices of the 9:2 layer to the 36-bit multiplier. Similarly, a 24-bit multiplier needs 72 slices. The required numbers of 4:2 slices for the 36- and 24-bit multipliers are 54 and 30 respectively. In fact, in both cases the LSB of the second PP set aligns with the 19th bit of the first set, and the first 18 bits go directly to the final CPA.

6. Supporting multi-input addition

It is well known in ASIC design that the right way to implement multi-input addition is to use compressor trees. In general, using smaller compressors as the building blocks of the compressor tree leads to faster and more flexible designs.
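As a reminder of how a compressor tree performs multi-input addition, the sketch below reduces a list of operands in carry-save form and applies a single carry-propagate addition at the end. Note the substitution: it uses layers of 3:2 counters (full adders) rather than the 9:2 compressors of the proposed DSP block, purely to keep the example short; the function name and the word-level modelling with Python integers are assumptions for illustration.

```python
def carry_save_add(operands, width):
    """Sum the operands modulo 2**width using layers of 3:2 counters and one final CPA."""
    mask = (1 << width) - 1
    rows = [v & mask for v in operands]
    while len(rows) > 2:
        next_rows = []
        # Each group of three rows is compressed into a sum row and a carry row.
        for i in range(0, len(rows) - 2, 3):
            a, b, c = rows[i:i + 3]
            next_rows.append(a ^ b ^ c)                                   # sum bits, rank i
            next_rows.append((((a & b) | (a & c) | (b & c)) << 1) & mask)  # carries, rank i+1
        # Rows that did not fill a group of three pass through to the next layer.
        next_rows.extend(rows[len(rows) - len(rows) % 3:])
        rows = next_rows
    # Only this final step involves carry propagation across columns.
    return sum(rows) & mask
```

For instance, `carry_save_add(list(range(1, 10)), 16)` adds nine operands and returns 45. Within each layer every column is processed independently, so the layer delay does not depend on the operand width.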
From the multi-input addition perspective, the advantage of using the PPR of a Radix-4 Booth multiplier rather than that of a parallel array multiplier is that we can use smaller, faster and more flexible compressors for the synthesis. The compressor tree in the PPR can be used for multi-input addition by bypassing the PPG. This feature is missing in the DSP blocks of current FPGAs: in such DSP blocks, the PPG cannot be bypassed and the PPR has not been designed for multi-input addition.

Figure 9. Modified Radix-4 Booth PPG encoder for resolving the conflicts of PPG parts of various multiplier bit-widths. (Signals shown: recoder inputs Y2j+1, Y2j, Y2j-1; select signals Two, Non-Zero, Neg; control signals Const, Inv; correction term; MFS inputs Xi-1, Xi, 0; output PPk.)

Assuming that there is no connectivity constraint between the DSP block inputs and the inputs of the compressor unit inside the PPR, we can efficiently map any regular or irregular multi-input addition onto the PPR. However, such connectivity requires a fully populated switch box, which is extremely costly. Therefore, the real challenge is to benefit from the inherent flexibility of the PPR compressor tree with minimum overhead.

Our solution to this problem is to define a set of fixed rectangular blocks (rc-blocks) within the 9:2 layer of the PPR for mapping different patterns. For different multi-input addition patterns, a different combination of rc-blocks is used. The advantage of this approach is that no crossbar is required to make a direct connection between the DSP inputs and the PPR inputs. Since each rc-block is placed in a fixed region of the 9:2 layer, there is a predefined pattern for connecting the DSP block inputs to the PPR for multi-input addition. Table 2 shows the dimensions of each rc-block for a half-DSP with 72 inputs. Here we assume that each half-DSP gets half of the total inputs for multi-input addition. The maximum height of an rc-block is nine, since it should fit into the 9:2 layer.
The width of an rc-block in this table defines the number of 9:2 slices that are used. Figure 10 shows how different DSP inputs are referenced in different rc-blocks. In this figure, the first four rc-blocks in Table 2 are shown. The numbers
inside each rc-block specify the DSP input indices. Note that the PPG unit should be bypassed for the indicated bit positions. Therefore, we insert some multiplexers into the PPG unit for this purpose. To minimize the number of multiplexers, the rc-blocks are overlapped maximally, as shown in this figure. For maximum overlap, we align all the rc-blocks to the left for the right half-DSP and to the right for the left half-DSP. The reason for this alignment is that we can connect two rc-blocks, one from each half-DSP, and form bigger blocks. For instance, rc-block 0 in the first half can be chained to rc-block 5 in the second half to form a bigger non-rectangular block for covering the input bits; similarly, two identical rc-blocks can be chained to form a wider rectangular block of the same height. Having fixed block sizes with fixed placement removes the need to route the DSP inputs to the compressor unit of the PPR through a crossbar.

[Table 2. Different rectangular blocks that are used for mapping the bits in an adder tree. Columns: Block, Height, Width. Row values are missing.]

Figure 10. DSP block input indices that are connected to each rc-block. Rc-blocks are aligned for maximum input sharing.

Multi-input addition mapping algorithm

In this section, the mapping algorithm for multi-input addition is described. The mapping algorithm is an important step, since we have to choose the best combination of the rc-blocks described above to efficiently cover the input bits, which mostly have irregular shapes. This decision can affect both area utilization and performance. We define a compression ratio (CR) parameter for each block and use it to prioritize the blocks for the mapping. This parameter is defined as follows:

$$\mathit{CR} = \frac{\sum_{i=0}^{W_b} h_i}{2 W_b}$$

In this equation, $W_b$ is the width of the bits that are generated by a block and $h_i$ is the number of bits in the i-th column of the block. The higher the CR, the greater the overall compression ratio of the rc-block.
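The CR-based prioritisation can be sketched in a few lines. The sketch reads the definition above as the total number of input bits covered by a block divided by twice the width of the bits it generates (two output rows of width W_b); that reading, the function names, and the example block shapes are assumptions, since the actual rc-block dimensions of Table 2 are not reproduced here.

```python
def compression_ratio(column_heights, output_width):
    """CR of a block: covered input bits divided by the 2 * W_b bits it generates."""
    return sum(column_heights) / (2 * output_width)


def rank_blocks(candidates):
    """Order candidate blocks by decreasing CR.

    candidates maps a block name to a (column_heights, output_width) pair.
    """
    return sorted(
        candidates,
        key=lambda name: compression_ratio(*candidates[name]),
        reverse=True,
    )


# Hypothetical candidates, not the rc-blocks of Table 2.
EXAMPLE_CANDIDATES = {
    "tall_narrow": ([9] * 8, 12),   # 72 bits in, CR = 72 / 24 = 3.0
    "short_wide": ([4] * 18, 20),   # 72 bits in, CR = 72 / 40 = 1.8
    "ragged": ([7, 7, 7, 5, 5], 8), # 31 bits in, CR = 31 / 16
}
```

With these made-up shapes, `rank_blocks(EXAMPLE_CANDIDATES)` places the tall, narrow block first: for the same number of covered input bits it produces fewer output bits, which is the behaviour the CR metric is meant to reward.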
Blocks with higher CRs tend to compress more. Although the maximum height of a column for a specific rc-block is limited to the numbers given in Table 2, more bits can be mapped to some columns in certain situations if other columns do not reach their expected height. As an example, assume that rc-block 2 covers a set of bits where all columns have 7 bits except the two rightmost columns, which have only 5 bits. This means that two of the input positions in each of these two columns remain unused. Now, if an input index in a column outside the rc-block boundary matches an input index in the short columns, we have the chance to cover one more bit in the taller column. In this example, we can add two bits to the first column, since the indices of the two exterior bits are 0 and 8 and these two inputs are found in the short columns. With this trick, we can cover more bits with a block when it is not fully utilized. We call this process block refinement.

The proposed mapping algorithm has three major steps. In the first step, the best block for covering a set of input bits is selected. This block can be either a single rc-block or two joined rc-blocks, as explained. The best block is the one that has the highest CR after block refinement. Then, the covered bits are removed from the set of uncovered bits, and this step is repeated until the termination condition is reached. The termination condition is either that all bits are covered or that no block with a covering ratio of more than 50% can be found. The covering ratio of a block is the ratio between the number of bits that are covered and the maximum number of bits that can be covered.

The second step of the mapping is to generate the output bits corresponding to the covered bits. If this is the last level of the whole compressor tree and no other DSP block is required for the mapping, the final adder of the DSP has to be used to generate the result,
otherwise the final adder is skipped and the output stays in carry-save form.

In the third step of the mapping, we place the selected blocks in the DSPs and connect the DSP blocks according to the mapping. Each DSP can hold two rc-blocks, one in each half, and joined rc-blocks are placed in the same DSP, where the two halves are chained.

7. Experiments

To evaluate the proposed DSP block architecture, we designed a sample base DSP block with 144 inputs and outputs. This is the IO bandwidth of a DSP block in Altera Stratix-II. Since we use a 90 nm CMOS technology for the DSP block design and we have to estimate the inter-DSP net delays for our experiments, Stratix-II was selected as the baseline FPGA. Given the IO bandwidth, two of the building blocks shown in Figure 4 can be implemented, which gives four 18-bit multipliers. Therefore, we need a 144-bit wide 9:2 layer. Each half-DSP thus has two 18-bit multipliers followed by an optional 36-bit 4:2 layer for complex arithmetic multiplication.

For our experiments, we add the 9/12/24/36 multiplier bit-widths to the base DSP block and evaluate the overhead imposed by each of them. Note that the 24-bit multiplier is used for single-precision floating-point multiplication. We also measure the overhead of adding the multi-input addition feature to the sample DSP block.

Based on the available bandwidth and resources, the designed DSP block can implement up to eight 9-bit, six 12-bit, two 24-bit or one 36-bit multipliers. For the 36-bit multiplier, some parts of the second half-DSP are used in conjunction with the complete first half-DSP. However, we can still implement two 9-bit multipliers or one 12-bit multiplier in the remaining part of the second half-DSP. Moreover, we can use the multi-input addition feature of the second half at the same time as the 36-bit multiplier.

[Table 3. Overhead of adding new features to the base DSP. The delay numbers show the 18-bit multiplier delay in each case. Columns: DSP Features, Delay (ns), Area (µm^2). Rows: Base DSP; Base DSP + 9-bit Mul; Base DSP + 9/12 Mul; Base DSP + 9/12/24 Mul; Base DSP + 9/12/24/36 Mul; Base DSP + 9/12/24/36 Mul and MADD. Numeric values are missing.]

We modeled the sample DSP in Verilog and used Synopsys Design Compiler with a 90 nm Artisan standard cell library for the synthesis. The mapping algorithm for multi-input addition was developed in C++. To estimate the net delays of the inter-DSP connections, we replaced the DSPs of Stratix-II in the netlist and extracted the real net delays with the Quartus tool.

Results

Table 3 compares the synthesis results of the proposed base DSP with different features. The delay values show the delay of the 18-bit multiplier in each case, and the area is the total area of the DSP with all the listed features. The interesting point is that the delay overhead of adding new features to the base DSP is almost zero, while the area overhead of all bit-widths before and after supporting the multi-input addition feature is 11% and 13% respectively. This means that, on average, supporting a new multiplier bit-width imposes less than 3% area overhead (11% spread over four added bit-widths), and the area overhead of adding multi-input addition is 2%.

Table 4 shows the combinational delay of each multiplier in the final DSP with all features included. These numbers can be further improved by inserting pipeline registers between the layers of the DSP. In Stratix-II, the combinational delays of the 9/18/36-bit multipliers are 2.99 ns, 3.17 ns and 4.57 ns respectively. The delay of the 9-bit multiplier of the DSP in [3] is 1.71 ns.

To evaluate the multi-input addition feature, we compared the synthesis results of the multi-input addition parts of some real arithmetic, multimedia and signal processing applications on Stratix-II FPGA soft logic [11], FPCT [2] and our DSP block. Note that this feature is not supported by the DSP blocks of current FPGAs.
Table 5 shows the delay results. Compared to FPCT, our DSP has a lower delay for the FIR benchmarks. On average, our DSP is around 4% slower than FPCT, which only performs multi-input addition. Compared to the soft logic, our DSP has a lower delay for all the benchmarks and is 54% faster on average.

[Table 4. Delay (ns) of each multiplier in the final DSP block. Columns: Multiplier BW, Our DSP, Stratix-II. Rows: 9-bit, 12-bit, 18-bit, 24-bit, 36-bit. Numeric values are missing.]

[Table 5. Delays (ns) of multi-input addition benchmarks on different hardware. Columns: Benchmark, Soft Logic [11], FPCT [2], Our DSP. Rows: DCT, Motion Es, G72x, ADPCM, Fir, Fir, Hpoly, Average. Numeric values are missing.]

Area comparison of these three methods is not as straightforward. Table 6 shows the area of each benchmark in terms of the basic blocks that are used. For the soft logic, the area is given as a number of LABs; for FPCT and our DSP block, the numbers represent the number of DSPs. The area of FPCT is [...] µm^2, while ours is less than [...] µm^2 with all the mentioned features.

[Table 6. Areas of multi-input addition benchmarks on different hardware. Columns: Benchmark, Soft Logic (LAB), FPCT (DSP), Ours (DSP). Rows: DCT, Motion Es, G72x, ADPCM, Fir, Fir, Hpoly. Numeric values are missing.]

8. Conclusion

In this paper, we proposed a base DSP block architecture for FPGAs to which various multiplier bit-widths and multi-input addition can be added without considerable overhead. This was achieved by employing a number of techniques that simplify the reconfigurable PPG unit. We studied the effects of adding these features on top of the base architecture, and we designed a sample DSP block with 9/12/18/24/36 multiplier bit-widths and multi-input addition. Moreover, the novel DSP block uses the limited available input/output bandwidth much more efficiently.

9. References

[1] H. Parandeh-Afshar, P. Brisk, and P. Ienne, "Integrating generalized parallel counters into FPGAs for improved arithmetic performance," to appear in IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[2] A. Cevrero, et al., "Field programmable compressor trees: acceleration of multi-input addition on FPGAs," ACM Trans. Reconfigurable Technology and Systems, vol. 2, no. 2, article no. 13, June.
[3] H. Parandeh-Afshar, et al., "A flexible DSP block to enhance FPGA arithmetic performance," in Proc. of the IEEE International Conference on Field Programmable Technology, FPT 09.
[4] M. J. Beauchamp, S.
Hauck, K. D. Underwood, and K. S. Hemmert, "Architectural modifications to enhance the floating-point performance of FPGAs," IEEE Trans. VLSI, vol. 16, no. 2, Feb.
[5] Y. J. Chong and S. Parameswaran, "Flexible multi-mode embedded floating-point unit for field programmable gate arrays," in Proc. Int. Symp. FPGAs 09, 2009.
[6] C. H. Ho, et al., "Virtual embedded blocks: a methodology for evaluating embedded elements in FPGAs," in Proc. IEEE Symp. Field Programmable Custom Computing Machines 06, 2006.
[7] P. Jamieson and J. Rose, "Architecting hard crossbars on FPGAs and increasing their area-efficiency with shadow clusters," in IEEE Int. Conf. Field Programmable Technology 07, 2007.
[8] P. Jamieson and J. Rose, "Enhancing the area-efficiency of FPGAs with hard circuits using shadow clusters," in IEEE Int. Conf. Field Programmable Technology 06, 2006.
[9] M. D. Ercegovac and T. Lang, Digital Arithmetic, Morgan Kaufmann Publishers, San Francisco.
[10] C. R. Baugh and B. A. Wooley, "A two's complement parallel array multiplication algorithm," IEEE Trans. Computers, vol. C-22, 1973.
[11] H. Parandeh-Afshar, P. Brisk, and P. Ienne, "Exploiting fast carry-chains of FPGAs for designing compressor trees," in Proc. 19th International Conference on Field Programmable Logic and Applications, FPL 09, 2009.
[12] Altera Corporation, "DSP Blocks in Stratix-III Devices," in Stratix-III Device Handbook, Volume 1, May.
More informationDesign A Redundant Binary Multiplier Using Dual Logic Level Technique
Design A Redundant Binary Multiplier Using Dual Logic Level Technique Sreenivasa Rao Assistant Professor, Department of ECE, Santhiram Engineering College, Nandyala, A.P. Jayanthi M.Tech Scholar in VLSI,
More informationModified Booth Encoding Multiplier for both Signed and Unsigned Radix Based Multi-Modulus Multiplier
Modified Booth Encoding Multiplier for both Signed and Unsigned Radix Based Multi-Modulus Multiplier M.Shiva Krushna M.Tech, VLSI Design, Holy Mary Institute of Technology And Science, Hyderabad, T.S,
More informationDigital Integrated CircuitDesign
Digital Integrated CircuitDesign Lecture 13 Building Blocks (Multipliers) Register Adder Shift Register Adib Abrishamifar EE Department IUST Acknowledgement This lecture note has been summarized and categorized
More informationDesign and Characterization of 16 Bit Multiplier Accumulator Based on Radix-2 Modified Booth Algorithm
Design and Characterization of 16 Bit Multiplier Accumulator Based on Radix-2 Modified Booth Algorithm Vijay Dhar Maurya 1, Imran Ullah Khan 2 1 M.Tech Scholar, 2 Associate Professor (J), Department of
More informationHigh Speed Vedic Multiplier Designs Using Novel Carry Select Adder
High Speed Vedic Multiplier Designs Using Novel Carry Select Adder 1 chintakrindi Saikumar & 2 sk.sahir 1 (M.Tech) VLSI, Dept. of ECE Priyadarshini Institute of Technology & Management 2 Associate Professor,
More information2 Assoc Prof, Dept of ECE, George Institute of Engineering & Technology, Markapur, AP, India,
ISSN 2319-8885 Vol.03,Issue.30 October-2014, Pages:5968-5972 www.ijsetr.com Low Power and Area-Efficient Carry Select Adder THANNEERU DHURGARAO 1, P.PRASANNA MURALI KRISHNA 2 1 PG Scholar, Dept of DECS,
More informationModified Booth Multiplier Based Low-Cost FIR Filter Design Shelja Jose, Shereena Mytheen
Modified Booth Multiplier Based Low-Cost FIR Filter Design Shelja Jose, Shereena Mytheen Abstract A new low area-cost FIR filter design is proposed using a modified Booth multiplier based on direct form
More informationISSN Vol.07,Issue.08, July-2015, Pages:
ISSN 2348 2370 Vol.07,Issue.08, July-2015, Pages:1397-1402 www.ijatir.org Implementation of 64-Bit Modified Wallace MAC Based On Multi-Operand Adders MIDDE SHEKAR 1, M. SWETHA 2 1 PG Scholar, Siddartha
More informationHigh Performance Low-Power Signed Multiplier
High Performance Low-Power Signed Multiplier Amir R. Attarha Mehrdad Nourani VLSI Circuits & Systems Laboratory Department of Electrical and Computer Engineering University of Tehran, IRAN Email: attarha@khorshid.ece.ut.ac.ir
More informationA New Architecture for Signed Radix-2 m Pure Array Multipliers
A New Architecture for Signed Radi-2 m Pure Array Multipliers Eduardo Costa Sergio Bampi José Monteiro UCPel, Pelotas, Brazil UFRGS, P. Alegre, Brazil IST/INESC, Lisboa, Portugal ecosta@atlas.ucpel.tche.br
More informationFaster and Low Power Twin Precision Multiplier
Faster and Low Twin Precision V. Sreedeep, B. Ramkumar and Harish M Kittur Abstract- In this work faster unsigned multiplication has been achieved by using a combination High Performance Multiplication
More informationAn FPGA Logic Cell and Carry Chain Configurable as a 6:2 or 7:2 Compressor
An FPGA Logic Cell and Carry Chain Configurable as a 6:2 or 7:2 Compressor HADI PARANDEH-AFSHAR, PHILIP BRISK, and PAOLO IENNE Ecole Polytechnique Federale de Lausanne (EPFL) To improve FPGA performance
More informationJDT LOW POWER FIR FILTER ARCHITECTURE USING ACCUMULATOR BASED RADIX-2 MULTIPLIER
JDT-003-2013 LOW POWER FIR FILTER ARCHITECTURE USING ACCUMULATOR BASED RADIX-2 MULTIPLIER 1 Geetha.R, II M Tech, 2 Mrs.P.Thamarai, 3 Dr.T.V.Kirankumar 1 Dept of ECE, Bharath Institute of Science and Technology
More informationAN EFFICIENT APPROACH TO MINIMIZE POWER AND AREA IN CARRY SELECT ADDER USING BINARY TO EXCESS ONE CONVERTER
AN EFFICIENT APPROACH TO MINIMIZE POWER AND AREA IN CARRY SELECT ADDER USING BINARY TO EXCESS ONE CONVERTER K. RAMAMOORTHY 1 T. CHELLADURAI 2 V. MANIKANDAN 3 1 Department of Electronics and Communication
More informationReview of Booth Algorithm for Design of Multiplier
Review of Booth Algorithm for Design of Multiplier N.VEDA KUMAR, THEEGALA DHIVYA Assistant Professor, M.TECH STUDENT Dept of ECE,Megha Institute of Engineering & Technology For womens,edulabad,ghatkesar
More informationArea Power and Delay Efficient Carry Select Adder (CSLA) Using Bit Excess Technique
Area Power and Delay Efficient Carry Select Adder (CSLA) Using Bit Excess Technique G. Sai Krishna Master of Technology VLSI Design, Abstract: In electronics, an adder or summer is digital circuits that
More informationImplementation of Parallel Multiplier-Accumulator using Radix- 2 Modified Booth Algorithm and SPST
ǁ Volume 02 - Issue 01 ǁ January 2017 ǁ PP. 06-14 Implementation of Parallel Multiplier-Accumulator using Radix- 2 Modified Booth Algorithm and SPST Ms. Deepali P. Sukhdeve Assistant Professor Department
More informationDesign of a Power Optimal Reversible FIR Filter ASIC Speech Signal Processing
Design of a Power Optimal Reversible FIR Filter ASIC Speech Signal Processing Yelle Harika M.Tech, Joginpally B.R.Engineering College. P.N.V.M.Sastry M.S(ECE)(A.U), M.Tech(ECE), (Ph.D)ECE(JNTUH), PG DIP
More informationDesign of an optimized multiplier based on approximation logic
ISSN:2348-2079 Volume-6 Issue-1 International Journal of Intellectual Advancements and Research in Engineering Computations Design of an optimized multiplier based on approximation logic Dhivya Bharathi
More informationMahendra Engineering College, Namakkal, Tamilnadu, India.
Implementation of Modified Booth Algorithm for Parallel MAC Stephen 1, Ravikumar. M 2 1 PG Scholar, ME (VLSI DESIGN), 2 Assistant Professor, Department ECE Mahendra Engineering College, Namakkal, Tamilnadu,
More informationA Novel Approach of an Efficient Booth Encoder for Signal Processing Applications
International Conference on Systems, Science, Control, Communication, Engineering and Technology 406 International Conference on Systems, Science, Control, Communication, Engineering and Technology 2016
More informationAn Optimized Wallace Tree Multiplier using Parallel Prefix Han-Carlson Adder for DSP Processors
An Optimized Wallace Tree Multiplier using Parallel Prefix Han-Carlson Adder for DSP Processors T.N.Priyatharshne Prof. L. Raja, M.E, (Ph.D) A. Vinodhini ME VLSI DESIGN Professor, ECE DEPT ME VLSI DESIGN
More informationA Review on Different Multiplier Techniques
A Review on Different Multiplier Techniques B.Sudharani Research Scholar, Department of ECE S.V.U.College of Engineering Sri Venkateswara University Tirupati, Andhra Pradesh, India Dr.G.Sreenivasulu Professor
More informationA MODIFIED ARCHITECTURE OF MULTIPLIER AND ACCUMULATOR USING SPURIOUS POWER SUPPRESSION TECHNIQUE
A MODIFIED ARCHITECTURE OF MULTIPLIER AND ACCUMULATOR USING SPURIOUS POWER SUPPRESSION TECHNIQUE R.Mohanapriya #1, K. Rajesh*² # PG Scholar (VLSI Design), Knowledge Institute of Technology, Salem * Assistant
More informationA New High Speed Low Power Performance of 8- Bit Parallel Multiplier-Accumulator Using Modified Radix-2 Booth Encoded Algorithm
A New High Speed Low Power Performance of 8- Bit Parallel Multiplier-Accumulator Using Modified Radix-2 Booth Encoded Algorithm V.Sandeep Kumar Assistant Professor, Indur Institute Of Engineering & Technology,Siddipet
More informationNOWADAYS, many Digital Signal Processing (DSP) applications,
1 HUB-Floating-Point for improving FPGA implementations of DSP Applications Javier Hormigo, and Julio Villalba, Member, IEEE Abstract The increasing complexity of new digital signalprocessing applications
More informationPerformance Analysis of an Efficient Reconfigurable Multiplier for Multirate Systems
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 5.258 IJCSMC,
More informationREALIAZATION OF LOW POWER VLSI ARCHITECTURE FOR RECONFIGURABLE FIR FILTER USING DYNAMIC SWITCHING ACITIVITY OF MULTIPLIERS
REALIAZATION OF LOW POWER VLSI ARCHITECTURE FOR RECONFIGURABLE FIR FILTER USING DYNAMIC SWITCHING ACITIVITY OF MULTIPLIERS M. Sai Sri 1, K. Padma Vasavi 2 1 M. Tech -VLSID Student, Department of Electronics
More informationA Novel Approach For Designing A Low Power Parallel Prefix Adders
A Novel Approach For Designing A Low Power Parallel Prefix Adders R.Chaitanyakumar M Tech student, Pragati Engineering College, Surampalem (A.P, IND). P.Sunitha Assistant Professor, Dept.of ECE Pragati
More informationHigh Speed Binary Counters Based on Wallace Tree Multiplier in VHDL
High Speed Binary Counters Based on Wallace Tree Multiplier in VHDL E.Sangeetha 1 ASP and D.Tharaliga 2 Department of Electronics and Communication Engineering, Tagore College of Engineering and Technology,
More informationDesign and Analysis of Row Bypass Multiplier using various logic Full Adders
Design and Analysis of Row Bypass Multiplier using various logic Full Adders Dr.R.Naveen 1, S.A.Sivakumar 2, K.U.Abhinaya 3, N.Akilandeeswari 4, S.Anushya 5, M.A.Asuvanti 6 1 Associate Professor, 2 Assistant
More informationCOMPARISION OF LOW POWER AND DELAY USING BAUGH WOOLEY AND WALLACE TREE MULTIPLIERS
COMPARISION OF LOW POWER AND DELAY USING BAUGH WOOLEY AND WALLACE TREE MULTIPLIERS ( 1 Dr.V.Malleswara rao, 2 K.V.Ganesh, 3 P.Pavan Kumar) 1 Professor &HOD of ECE,GITAM University,Visakhapatnam. 2 Ph.D
More informationAn Design of Radix-4 Modified Booth Encoded Multiplier and Optimised Carry Select Adder Design for Efficient Area and Delay
An Design of Radix-4 Modified Booth Encoded Multiplier and Optimised Carry Select Adder Design for Efficient Area and Delay 1. K. Nivetha, PG Scholar, Dept of ECE, Nandha Engineering College, Erode. 2.
More informationCHAPTER 1 INTRODUCTION
CHAPTER 1 INTRODUCTION 1.1 Project Background High speed multiplication is another critical function in a range of very large scale integration (VLSI) applications. Multiplications are expensive and slow
More informationPerformance Analysis of a 64-bit signed Multiplier with a Carry Select Adder Using VHDL
Performance Analysis of a 64-bit signed Multiplier with a Carry Select Adder Using VHDL E.Deepthi, V.M.Rani, O.Manasa Abstract: This paper presents a performance analysis of carrylook-ahead-adder and carry
More informationHigh Speed Speculative Multiplier Using 3 Step Speculative Carry Save Reduction Tree
High Speed Speculative Multiplier Using 3 Step Speculative Carry Save Reduction Tree Alfiya V M, Meera Thampy Student, Dept. of ECE, Sree Narayana Gurukulam College of Engineering, Kadayiruppu, Ernakulam,
More informationAREA EFFICIENT DISTRIBUTED ARITHMETIC DISCRETE COSINE TRANSFORM USING MODIFIED WALLACE TREE MULTIPLIER
American Journal of Applied Sciences 11 (2): 180-188, 2014 ISSN: 1546-9239 2014 Science Publication doi:10.3844/ajassp.2014.180.188 Published Online 11 (2) 2014 (http://www.thescipub.com/ajas.toc) AREA
More informationIMPLEMENTATION OF UNSIGNED MULTIPLIER USING MODIFIED CSLA
IMPLEMENTATION OF UNSIGNED MULTIPLIER USING MODIFIED CSLA Sooraj.N.P. PG Scholar, Electronics & Communication Dept. Hindusthan Institute of Technology, Coimbatore,Anna University ABSTRACT Multiplications
More informationLow Power Approach for Fir Filter Using Modified Booth Multiprecision Multiplier
Low Power Approach for Fir Filter Using Modified Booth Multiprecision Multiplier Gowridevi.B 1, Swamynathan.S.M 2, Gangadevi.B 3 1,2 Department of ECE, Kathir College of Engineering 3 Department of ECE,
More informationHIGH PERFORMANCE BAUGH WOOLEY MULTIPLIER USING CARRY SKIP ADDER STRUCTURE
HIGH PERFORMANCE BAUGH WOOLEY MULTIPLIER USING CARRY SKIP ADDER STRUCTURE R.ARUN SEKAR 1 B.GOPINATH 2 1Department Of Electronics And Communication Engineering, Assistant Professor, SNS College Of Technology,
More informationIJCSIET-- International Journal of Computer Science information and Engg., Technologies ISSN
High throughput Modified Wallace MAC based on Multi operand Adders : 1 Menda Jaganmohanarao, 2 Arikathota Udaykumar 1 Student, 2 Assistant Professor 1,2 Sri Vekateswara College of Engineering and Technology,
More informationAn Inversion-Based Synthesis Approach for Area and Power efficient Arithmetic Sum-of-Products
21st International Conference on VLSI Design An Inversion-Based Synthesis Approach for Area and Power efficient Arithmetic Sum-of-Products Sabyasachi Das Synplicity Inc Sunnyvale, CA, USA Email: sabya@synplicity.com
More informationAn Efficient Reconfigurable Fir Filter based on Twin Precision Multiplier and Low Power Adder
An Efficient Reconfigurable Fir Filter based on Twin Precision Multiplier and Low Power Adder Sony Sethukumar, Prajeesh R, Sri Vellappally Natesan College of Engineering SVNCE, Kerala, India. Manukrishna
More informationA NOVEL WALLACE TREE MULTIPLIER FOR USING FAST ADDERS
G RAMESH et al, Volume 2, Issue 7, PP:, SEPTEMBER 2014. A NOVEL WALLACE TREE MULTIPLIER FOR USING FAST ADDERS G.Ramesh 1*, K.Naga Lakshmi 2* 1. II. M.Tech (VLSI), Dept of ECE, AM Reddy Memorial College
More informationDesign of Low Power Column bypass Multiplier using FPGA
Design of Low Power Column bypass Multiplier using FPGA J.sudha rani 1,R.N.S.Kalpana 2 Dept. of ECE 1, Assistant Professor,CVSR College of Engineering,Andhra pradesh, India, Assistant Professor 2,Dept.
More informationIMPLEMENTATION OF AREA EFFICIENT MULTIPLIER AND ADDER ARCHITECTURE IN DIGITAL FIR FILTER
ISSN: 0976-3104 Srividya. ARTICLE OPEN ACCESS IMPLEMENTATION OF AREA EFFICIENT MULTIPLIER AND ADDER ARCHITECTURE IN DIGITAL FIR FILTER Srividya Sahyadri College of Engineering & Management, ECE Dept, Mangalore,
More informationComparison of Conventional Multiplier with Bypass Zero Multiplier
Comparison of Conventional Multiplier with Bypass Zero Multiplier 1 alyani Chetan umar, 2 Shrikant Deshmukh, 3 Prashant Gupta. M.tech VLSI Student SENSE Department, VIT University, Vellore, India. 632014.
More informationDESIGN OF LOW POWER MULTIPLIERS
DESIGN OF LOW POWER MULTIPLIERS GowthamPavanaskar, RakeshKamath.R, Rashmi, Naveena Guided by: DivyeshDivakar AssistantProfessor EEE department Canaraengineering college, Mangalore Abstract:With advances
More informationDesign and Analyse Low Power Wallace Multiplier Using GDI Technique
IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-issn: 2278-2834,p- ISSN: 2278-8735.Volume 12, Issue 2, Ver. III (Mar.-Apr. 2017), PP 49-54 www.iosrjournals.org Design and Analyse
More informationA Novel FPGA Logic Block for Improved Arithmetic Performance
A Novel FPGA Logic Block for Improved Arithmetic Performance Hadi Parandeh-Afshar Philip Brisk Paolo Ienne Ecole Polytechnique Fédérale de Lausanne (EPFL) School of Computer and Communication Sciences
More informationModified Partial Product Generator for Redundant Binary Multiplier with High Modularity and Carry-Free Addition
Modified Partial Product Generator for Redundant Binary Multiplier with High Modularity and Carry-Free Addition Thoka. Babu Rao 1, G. Kishore Kumar 2 1, M. Tech in VLSI & ES, Student at Velagapudi Ramakrishna
More informationREVIEW ARTICLE: EFFICIENT MULTIPLIER ARCHITECTURE IN VLSI DESIGN
REVIEW ARTICLE: EFFICIENT MULTIPLIER ARCHITECTURE IN VLSI DESIGN M. JEEVITHA 1, R.MUTHAIAH 2, P.SWAMINATHAN 3 1 P.G. Scholar, School of Computing, SASTRA University, Tamilnadu, INDIA 2 Assoc. Prof., School
More informationLow-Power Multipliers with Data Wordlength Reduction
Low-Power Multipliers with Data Wordlength Reduction Kyungtae Han, Brian L. Evans, and Earl E. Swartzlander, Jr. Dept. of Electrical and Computer Engineering The University of Texas at Austin Austin, TX
More informationKeywords: Column bypassing multiplier, Modified booth algorithm, Spartan-3AN.
Volume 4, Issue 5, May 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Empirical Review
More informationHigh Performance 128 Bits Multiplexer Based MBE Multiplier for Signed-Unsigned Number Operating at 1GHz
High Performance 128 Bits Multiplexer Based MBE Multiplier for Signed-Unsigned Number Operating at 1GHz Ravindra P Rajput Department of Electronics and Communication Engineering JSS Research Foundation,
More informationA Novel Approach to 32-Bit Approximate Adder
A Novel Approach to 32-Bit Approximate Adder Shalini Singh 1, Ghanshyam Jangid 2 1 Department of Electronics and Communication, Gyan Vihar University, Jaipur, Rajasthan, India 2 Assistant Professor, Department
More informationArchitectural Improvements for Field Programmable Counter Arrays: Enabling Efficient Synthesis of Fast Compressor Trees on FPGAs
Architectural Improvements for Field Programmable Counter Arrays: Enabling Efficient Synthesis of Fast Compressor Trees on FPGAs Alessandro Cevrero,2, Panagiotis Athanasopoulos,2, Hadi Parandeh-Afshar
More informationDesign of a Power Optimal Reversible FIR Filter for Speech Signal Processing
2015 International Conference on Computer Communication and Informatics (ICCCI -2015), Jan. 08 10, 2015, Coimbatore, INDIA Design of a Power Optimal Reversible FIR Filter for Speech Signal Processing S.Padmapriya
More informationUsing Soft Multipliers with Stratix & Stratix GX
Using Soft Multipliers with Stratix & Stratix GX Devices November 2002, ver. 2.0 Application Note 246 Introduction Traditionally, designers have been forced to make a tradeoff between the flexibility of
More informationAn Efficient SQRT Architecture of Carry Select Adder Design by HA and Common Boolean Logic PinnikaVenkateswarlu 1, Ragutla Kalpana 2
An Efficient SQRT Architecture of Carry Select Adder Design by HA and Common Boolean Logic PinnikaVenkateswarlu 1, Ragutla Kalpana 2 1 M.Tech student, ECE, Sri Indu College of Engineering and Technology,
More informationModified Design of High Speed Baugh Wooley Multiplier
Modified Design of High Speed Baugh Wooley Multiplier 1 Yugvinder Dixit, 2 Amandeep Singh 1 Student, 2 Assistant Professor VLSI Design, Department of Electrical & Electronics Engineering, Lovely Professional
More informationFOR HIGH SPEED LOW POWER APPLICATIONS USING RADIX-4 MODIFIED BOOTH ENCODER
International Journal of Advancements in Research & Technology, Volume 4, Issue 6, June -2015 31 A SPST BASED 16x16 MULTIPLIER FOR HIGH SPEED LOW POWER APPLICATIONS USING RADIX-4 MODIFIED BOOTH ENCODER
More informationAN EFFICIENT MAC DESIGN IN DIGITAL FILTERS
AN EFFICIENT MAC DESIGN IN DIGITAL FILTERS THIRUMALASETTY SRIKANTH 1*, GUNGI MANGARAO 2* 1. Dept of ECE, Malineni Lakshmaiah Engineering College, Andhra Pradesh, India. Email Id : srikanthmailid07@gmail.com
More information[Krishna, 2(9): September, 2013] ISSN: Impact Factor: INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY
IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY Design of Wallace Tree Multiplier using Compressors K.Gopi Krishna *1, B.Santhosh 2, V.Sridhar 3 gopikoleti@gmail.com Abstract
More information6. DSP Blocks in Stratix II and Stratix II GX Devices
6. SP Blocks in Stratix II and Stratix II GX evices SII52006-2.2 Introduction Stratix II and Stratix II GX devices have dedicated digital signal processing (SP) blocks optimized for SP applications requiring
More informationISSN Vol.03,Issue.02, February-2014, Pages:
www.semargroup.org, www.ijsetr.com ISSN 2319-8885 Vol.03,Issue.02, February-2014, Pages:0239-0244 Design and Implementation of High Speed Radix 8 Multiplier using 8:2 Compressors A.M.SRINIVASA CHARYULU
More informationDesign of Area and Power Efficient FIR Filter Using Truncated Multiplier Technique
Design of Area and Power Efficient FIR Filter Using Truncated Multiplier Technique TALLURI ANUSHA *1, and D.DAYAKAR RAO #2 * Student (Dept of ECE-VLSI), Sree Vahini Institute of Science and Technology,
More informationImplementation of 32-Bit Unsigned Multiplier Using CLAA and CSLA
Implementation of 32-Bit Unsigned Multiplier Using CLAA and CSLA 1. Vijaya kumar vadladi,m. Tech. Student (VLSID), Holy Mary Institute of Technology and Science, Keesara, R.R. Dt. 2.David Solomon Raju.Y,Associate
More informationLow power and Area Efficient MDC based FFT for Twin Data Streams
RESEARCH ARTICLE OPEN ACCESS Low power and Area Efficient MDC based FFT for Twin Data Streams M. Hemalatha 1, R. Ashok Chaitanya Varma 2 1 ( M.Tech -VLSID Student, Department of Electronics and Communications
More informationVHDL Code Generator for Optimized Carry-Save Reduction Strategy in Low Power Computer Arithmetic
VHDL Code Generator for Optimized Carry-Save Reduction Strategy in Low Power Computer Arithmetic DAVID NEUHÄUSER Friedrich Schiller University Department of Computer Science D-07737 Jena GERMANY dn@c3e.de
More informationA Design Approach for Compressor Based Approximate Multipliers
A Approach for Compressor Based Approximate Multipliers Naman Maheshwari Electrical & Electronics Engineering, Birla Institute of Technology & Science, Pilani, Rajasthan - 333031, India Email: naman.mah1993@gmail.com
More informationReduced Complexity Wallace Tree Mulplier and Enhanced Carry Look-Ahead Adder for Digital FIR Filter
Reduced Complexity Wallace Tree Mulplier and Enhanced Carry Look-Ahead Adder for Digital FIR Filter Dr.N.C.sendhilkumar, Assistant Professor Department of Electronics and Communication Engineering Sri
More informationDesign of a High Speed FIR Filter on FPGA by Using DA-OBC Algorithm
Design of a High Speed FIR Filter on FPGA by Using DA-OBC Algorithm Vijay Kumar Ch 1, Leelakrishna Muthyala 1, Chitra E 2 1 Research Scholar, VLSI, SRM University, Tamilnadu, India 2 Assistant Professor,
More informationDesign and Implementation Radix-8 High Performance Multiplier Using High Speed Compressors
Design and Implementation Radix-8 High Performance Multiplier Using High Speed Compressors M.Satheesh, D.Sri Hari Student, Dept of Electronics and Communication Engineering, Siddartha Educational Academy
More informationVerilog Implementation of 64-bit Redundant Binary Product generator using MBE
Verilog Implementation of 64-bit Redundant Binary Product generator using MBE Santosh Kumar G.B 1, Mallikarjuna A 2 M.Tech (D.E), Dept. of ECE, BITM, Ballari, India 1 Assistant professor, Dept. of ECE,
More informationINTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK DESIGN OF LOW POWER MULTIPLIERS USING APPROXIMATE ADDER MR. PAWAN SONWANE 1, DR.
More informationMultiplier Design and Performance Estimation with Distributed Arithmetic Algorithm
Multiplier Design and Performance Estimation with Distributed Arithmetic Algorithm M. Suhasini, K. Prabhu Kumar & P. Srinivas Department of Electronics & Comm. Engineering, Nimra College of Engineering
More informationHIGH SPEED FIXED-WIDTH MODIFIED BOOTH MULTIPLIERS
HIGH SPEED FIXED-WIDTH MODIFIED BOOTH MULTIPLIERS Jeena James, Prof.Binu K Mathew 2, PG student, Associate Professor, Saintgits College of Engineering, Saintgits College of Engineering, MG University,
More informationImplementation of Efficient 16-Bit MAC Using Modified Booth Algorithm and Different Adders
International Journal of Scientific and Research Publications, Volume 4, Issue 3, March 2014 1 Implementation of Efficient 16-Bit MAC Using Modified Booth Algorithm and Different s M.Karthikkumar, D.Manoranjitham,
More informationA High Speed Wallace Tree Multiplier Using Modified Booth Algorithm for Fast Arithmetic Circuits
IOSR Journal of Electronics and Communication Engineering (IOSRJECE) ISSN: 2278-2834, ISBN No: 2278-8735 Volume 3, Issue 1 (Sep-Oct 2012), PP 07-11 A High Speed Wallace Tree Multiplier Using Modified Booth
More informationPublished by: PIONEER RESEARCH & DEVELOPMENT GROUP (www.prdg.org) 1
Design Of Low Power Approximate Mirror Adder Sasikala.M 1, Dr.G.K.D.Prasanna Venkatesan 2 ME VLSI student 1, Vice Principal, Professor and Head/ECE 2 PGP college of Engineering and Technology Nammakkal,
More informationDesign of Baugh Wooley Multiplier with Adaptive Hold Logic. M.Kavia, V.Meenakshi
International Journal of Scientific & Engineering Research, Volume 6, Issue 4, April-2015 105 Design of Baugh Wooley Multiplier with Adaptive Hold Logic M.Kavia, V.Meenakshi Abstract Mostly, the overall
More informationMS Project :Trading Accuracy for Power with an Under-designed Multiplier Architecture Parag Kulkarni Adviser : Prof. Puneet Gupta Electrical Eng.
MS Project :Trading Accuracy for Power with an Under-designed Multiplier Architecture Parag Kulkarni Adviser : Prof. Puneet Gupta Electrical Eng., UCLA - http://nanocad.ee.ucla.edu/ 1 Outline Introduction
More information10. DSP Blocks in Arria GX Devices
10. SP Blocks in Arria GX evices AGX52010-1.2 Introduction Arria TM GX devices have dedicated digital signal processing (SP) blocks optimized for SP applications requiring high data throughput. These SP
More informationS.Nagaraj 1, R.Mallikarjuna Reddy 2
FPGA Implementation of Modified Booth Multiplier S.Nagaraj, R.Mallikarjuna Reddy 2 Associate professor, Department of ECE, SVCET, Chittoor, nagarajsubramanyam@gmail.com 2 Associate professor, Department
More informationEfficient Dedicated Multiplication Blocks for 2 s Complement Radix-2m Array Multipliers
1502 JOURNAL OF COMPUTERS, VOL. 5, NO. 10, OCTOBER 2010 Efficient Dedicated Multiplication Blocks for 2 s Complement Radix-2m Array Multipliers Leandro Z. Pieper, Eduardo A. C. da Costa, Sérgio J. M. de
More informationGlobally Asynchronous Locally Synchronous (GALS) Microprogrammed Parallel FIR Filter
IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 6, Issue 5, Ver. II (Sep. - Oct. 2016), PP 15-21 e-issn: 2319 4200, p-issn No. : 2319 4197 www.iosrjournals.org Globally Asynchronous Locally
More informationDesign and Simulation of Convolution Using Booth Encoded Wallace Tree Multiplier
IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-issn: 2278-2834,p- ISSN: 2278-8735. PP 42-46 www.iosrjournals.org Design and Simulation of Convolution Using Booth Encoded Wallace
More informationAn Area Efficient Decomposed Approximate Multiplier for DCT Applications
An Area Efficient Decomposed Approximate Multiplier for DCT Applications K.Mohammed Rafi 1, M.P.Venkatesh 2 P.G. Student, Department of ECE, Shree Institute of Technical Education, Tirupati, India 1 Assistant
More informationInternational Journal of Modern Trends in Engineering and Research
Scientific Journal Impact Factor (SJIF): 1.711 e-issn: 2349-9745 p-issn: 2393-8161 International Journal of Modern Trends in Engineering and Research www.ijmter.com FPGA Implementation of High Speed Architecture
More informationJin-Hyuk Kim, Je-Huk Ryu, Jun-Dong Cho. Sungkyunkwan Univ.
A High Speed and Low Power VLSI Multiplier Using a Redundant Binary Booth Encoding Jin-Hyuk Kim, Je-Huk Ryu, Jun-Dong Cho School of Electrical and Computer Engineering Sungkyunkwan Univ. jhkim,compro@nature.skku.ac.kr,
More information