Highly Versatile DSP Blocks for Improved FPGA Arithmetic Performance

IEEE Annual International Symposium on Field-Programmable Custom Computing Machines

Hadi Parandeh-Afshar and Paolo Ienne
Ecole Polytechnique Fédérale de Lausanne (EPFL)
{hadi.parandehafshar

Abstract

Integrating DSP blocks into FPGAs is an effective approach to closing the existing gap between FPGAs and ASICs. A much wider range of applications could benefit from DSP blocks if they were more versatile than those currently found in commercial devices. In this paper we propose a novel DSP block which resembles commercially available ones and yet additionally supports a wide variety of multiplier bit-widths as well as multi-input addition with negligible overhead. The novel DSP block also uses the limited available input/output bandwidth much more efficiently. Experimental results show that, on average, the area overhead of each novel feature added to a base design is a mere 3%, with practically no delay penalty. Moreover, for multi-input addition, which is not supported in current DSP blocks, the proposed DSP block is more than 50% faster on average than FPGA soft logic.

1. Introduction

One fundamental step to bridge the gap between FPGAs and ASICs for arithmetic-dominated circuits is to integrate DSP blocks that perform commonly used arithmetic functions into an FPGA. The current trend in FPGA design is to add as many features as possible to the DSP blocks so that a wider range of applications benefits. The challenge is that the new features should not impose a considerable overhead on the original DSP block.

The most important feature of the DSP blocks in current FPGAs is multiplication. Most DSP blocks have a fixed bit-width multiplier as the base, and a few other multiplication bit-widths are formed on top of the base multiplier. For instance, the DSP blocks in Altera Stratix-II only support 9-bit and 36-bit multipliers on top of the base 18-bit multiplier. Moreover, despite the availability of IO bandwidth, when a DSP block is configured as a 36-bit multiplier, no resources remain for other functionalities.

In this paper, we first present a base architecture for the DSP block, which consists of Partial Product Generator (PPG) and Partial Product Reduction (PPR) units. To support additional features such as various multiplication bit-widths as well as multi-input addition, we make the PPG unit reconfigurable and keep the PPR unit almost the same. This way the PPR remains small and fast, which is ideal for multi-input addition. However, the PPG unit can become complicated, since it must provide a different set of inputs to the PPR depending on the DSP block configuration. We propose a number of techniques that significantly reduce the cost of making the PPG reconfigurable.

The paper is structured as follows. Related work and preliminary arithmetic concepts are discussed in Sections 2 and 3, respectively. Section 4 presents the architecture of the proposed base DSP block. Sections 5 and 6 describe how new features are added to the base DSP block. Experimental results are presented in Section 7, and the final section concludes.

2. Related work

Two architectures [1,2] have been proposed in the past for FPGAs to improve multi-input addition. In both, an accelerator for carry-save arithmetic is proposed, which is intended to be placed in an FPGA as a DSP block. The problem with these two architectures is that they can only implement multi-input addition. To perform multiplication, one can use the FPGA soft logic for the PPG and the proposed architectures for the PPR. The first drawback is that only a few multipliers can be implemented in each DSP block due to the input bandwidth and resource constraints. The second drawback is that such multipliers have poor performance and area utilization, since the partial products are generated by the soft logic and the general routing network is used to connect the PPG and PPR units.
In a subsequent work [3] by the same group, the PPG was integrated into the accelerator. This removed the second drawback, but still only two 9-bit multipliers can be implemented in the DSP block, while one can fit nine 9-bit multipliers into the DSP block of Stratix-II.

There are several academic proposals [4-8] to improve scientific applications on FPGAs. Such applications require floating-point arithmetic, and implementing floating-point operations on an FPGA requires a large amount of resources. Therefore, the primary goal in these proposals is to integrate floating-point functionality into DSP blocks, whereas our goal is different and complementary. Our contribution is to propose a base DSP block to which we can add various multiplier bit-widths as well as multi-input addition with small overhead.

3. Arithmetic preliminaries

In this section, we explain a number of arithmetic concepts used in this paper, such as Radix-4 Booth multiplication and compressor trees.

Radix-4 Booth [9] multiplication is a standard technique to design smaller and faster multipliers by recoding the multiplier operand. With Radix-4 Booth recoding, the number of Partial Products (PPs) is reduced by half. The basic idea is to take every second column and multiply by ±1, ±2, or 0, instead of shifting and adding for every column of the multiplier term and multiplying by 1 or 0. To Booth recode the multiplier term, its bits are considered in blocks of three, such that each block overlaps the previous block by one bit. The overlap is necessary so that we know what happened in the last block, as the MSB of the block acts like a sign bit.

To compute the sum of two integers, a carry-propagate adder (CPA) [9] such as a ripple-carry or carry-select adder is used. To add more numbers, a compressor tree [9] is used. The building block of a compressor tree can be either a counter or a compressor. Compressors are, in principle, similar to counters, but in contrast, they have explicit carry inputs and outputs. Each compressor tree may consist of several layers of either counters or compressors.
However, no ripple carry propagation occurs within a layer of a compressor tree. Only a final CPA is required to compute the final sum.

4. Proposed base architecture of DSP

The fundamental building block of our base DSP block, similar to the DSP blocks of Altera FPGAs, consists of two paired 18-bit multipliers followed by an optional adder stage. The adder unit is mainly used for complex arithmetic multiplication [12]. We propose to use the Radix-4 Booth architecture for the multipliers.

Figure 1. Radix-4 Booth PPG unit (Booth recoder inputs Y_{2j+1}, Y_{2j}, Y_{2j-1}; select signals Two, Non-Zero, Neg; MFS inputs X_{i-1}, X_i, 0; outputs PP_k and the correction term). The Booth recoder is common to all bits of a PP, but each PP bit needs a separate MFS unit.

There are two reasons for choosing the Radix-4 Booth multiplier. First, by modifying the PPG structure of the Radix-4 Booth multiplier and applying some transformations to the sign extension parts of the PPs, we can significantly reduce the cost of adding new multiplier bit-widths to the base architecture. Second, we use the PPR unit of the multipliers in the base DSP block as a compressor tree for implementing multi-input addition. In a Radix-4 Booth multiplier, the number of PPs is half that of a parallel array multiplier. Therefore, we can exploit faster and smaller compressors to build the compressor tree. This is a key factor in designing compressor trees for multi-input addition. We provide more details about these advantages of Radix-4 Booth once the complete architecture has been presented in the subsequent sections.

In the following, we give a brief overview of the base DSP architecture, as it is needed to understand the modifications that we will make to increase the flexibility. We design the PPG and PPR units somewhat differently from the standard Radix-4 architecture. For the PPG, to multiply the multiplicand by 1, 2, or 0, all that is needed is a few multiplexers, whose delay is independent of the size of the inputs. The only complexity relates to negating a 2's complement number, where a 1 is added to the inverted number. This complexity can be avoided in the PPG if we move the addition into the PPR unit. For this purpose, a correction bit corresponding to each PP is added to the PPR. Figure 1 illustrates the PPG unit of the Radix-4 multiplier. For each PP, one Booth recoder is required, while each bit of the PP needs the multiplicand factor selection (MFS) unit shown in this figure. All MFS units get the same set of select signals.
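The recoding and the deferred "+1" described above can be illustrated in software. The following is a minimal Python sketch, not the authors' hardware design: the function names (`booth_radix4_digits`, `partial_product`, `booth_multiply`) are hypothetical, and integer arithmetic stands in for the multiplexers of the MFS units. Negative digits only invert the selected multiple; the missing +1 is returned as a separate correction bit and added during reduction.

```python
def booth_radix4_digits(y, n):
    """Recode an n-bit two's-complement multiplier y (n even) into n/2
    radix-4 Booth digits, each in {-2, -1, 0, 1, 2}, least significant first."""
    y &= (1 << n) - 1
    digits = []
    prev = 0  # implicit bit Y_{-1} = 0
    for j in range(n // 2):
        b0 = (y >> (2 * j)) & 1       # Y_{2j}
        b1 = (y >> (2 * j + 1)) & 1   # Y_{2j+1}
        digits.append(-2 * b1 + b0 + prev)
        prev = b1                     # overlap bit for the next block
    return digits


def partial_product(x, digit, width):
    """Select 0, x or 2x; for a negative digit only invert the bits and
    return the deferred +1 as a correction bit (assumed to be added later)."""
    mask = (1 << width) - 1
    magnitude = abs(digit) * x        # a multiplexer choice in hardware
    if digit < 0:
        return (~magnitude) & mask, 1
    return magnitude & mask, 0


def booth_multiply(x, y, n=18):
    """Signed n-bit by n-bit product built from n/2 Booth partial products."""
    width = 2 * n
    mask = (1 << width) - 1
    total = 0
    for j, digit in enumerate(booth_radix4_digits(y, n)):
        pp, cbit = partial_product(x, digit, width)
        total += (pp + cbit) << (2 * j)   # each PP is shifted left by two bits
    total &= mask
    # Interpret the 2n-bit result as a signed number.
    return total - (1 << width) if total >> (width - 1) else total
```

For an 18-bit multiplier the recoder yields nine digits, matching the nine PPs mentioned in the text; for example, `booth_multiply(-1234, 5678)` reproduces the ordinary signed product.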

Figure 2. Proposed design for 9:2 compressor.

In Booth multiplication, each PP must be sign extended and shifted to the left by two bits relative to the previous one. Therefore, the PPR unit of an 18-bit multiplier is 36 bits wide. Moreover, the height of the PPR is nine, since we have nine PPs. We propose to use a layer of 9:2 compressors followed by a final CPA for the PPR unit. In this layer, there is one 9:2 compressor per column, so an 18-bit multiplier requires thirty-six 9:2 compressors. Figure 2 shows the proposed circuit-level design of the 9:2 compressor. All of its inputs, including the carry bits, have the same bit position, rank i. The two outputs also have rank i, but the carry outputs have rank i+1. The delay of the 9:2 layer is independent of the layer width, since no ripple carry path exists in the layer. The longest path along which a carry can propagate contains 3 cells. Since the compressor layer will be reused for other DSP block configurations, the 9:2 layers of all 18-bit multipliers in the base DSP block are chained, but at the multiplier boundaries each carry input is set to 0 by a simple AND gate.

As explained, besides the PPs, we have a number of correction bits that need to be added in the PPR unit. Each correction bit (Cbit) is aligned with the first bit of the corresponding PP. Due to the shifting of the PPs, there is always a free place in the 9:2 layer for every Cbit, except for the ninth one. To avoid the overhead of using 10:2 compressors instead of 9:2 ones, we merge the ninth Cbit with the first PP as shown in Figure 3. In this figure, S is the sign bit of the PP and C is the correction bit. Since the Cbit is aligned with the 17th bit of the PP, the MSBs are modified from that bit position onward, as shown.

Figure 3. Merging the 9th Cbit with the 1st PP. The MSBs of the PP from bit 16 onward are modified.
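The key property claimed above, that carries never ripple within a compressor layer and only the final CPA propagates them, can be checked with a small carry-save model. This sketch is an assumption-laden stand-in: it reduces the operands with generic bitwise 3:2 compressors (full adders) rather than the 9:2 cell of Figure 2, and the names `compress_3_2` and `reduce_to_two` are hypothetical.

```python
def compress_3_2(a, b, c):
    """One layer of bitwise full adders: every column is handled
    independently, so no carry ripples along the word."""
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (a & c) | (b & c)) << 1  # carries move to rank i+1
    return sum_word, carry_word


def reduce_to_two(operands):
    """Repeatedly apply 3:2 layers until at most two rows remain."""
    rows = list(operands)
    while len(rows) > 2:
        next_rows = []
        for i in range(0, len(rows) - 2, 3):
            next_rows.extend(compress_3_2(rows[i], rows[i + 1], rows[i + 2]))
        # Rows that did not fill a group of three pass through unchanged.
        next_rows.extend(rows[len(rows) - len(rows) % 3:])
        rows = next_rows
    return rows


# Nine operands, as in the nine-PP case; a single final addition plays the CPA.
operands = [3, 5, 7, 11, 13, 17, 19, 23, 29]
two_rows = reduce_to_two(operands)
result = sum(two_rows)  # final carry-propagate addition
```

The two remaining rows are the carry-save form of the result; their ordinary sum equals the sum of all nine inputs.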
Figure 4. Building block of proposed DSP block.

In addition to the 9:2 layer, we need a 4:2 layer to sum the results of the two paired 18-bit multipliers for complex arithmetic multiplication. A 4:2 compressor is similar to the 9:2 compressor but has fewer inputs and carry bits. This layer is added between the 9:2 layer and the final CPA, and should be 36 bits wide for each multiplier pair. Figure 4 shows the diagram of the building block in the proposed DSP block. The dashed box represents the PPR unit.

5. Supporting various multiplier bit-widths

In this section we describe how we can reduce the cost of adding a new multiplier bit-width to the base DSP block by modifying the PPG structure of the Radix-4 Booth multiplier and applying some transformations to the sign extension parts of the PPs. We use the same PPR unit for all multiplier bit-widths, and the PPG unit is modified to provide the required inputs to the PPR, based on the DSP block configuration.

Providing such flexibility in the PPG can make it considerably complex. For instance, as shown in Figure 5, when a new bit-width is added to the base DSP, for a certain bit position we may need to select between a Booth encoded bit and a sign bit. This requires a multiplexer to choose the right configuration. In this figure, the PPG corresponding to the first PP of an 18-bit multiplier is illustrated at the top. Two 9-bit multipliers fit within the same number of bits, and the first PPGs of both 9-bit multipliers appear below that of the 18-bit multiplier. Since both configurations use the same PPR unit, we need several multiplexers to choose between these two configurations. Note that in most cases, we must select between a sign bit and another bit, which can be a Booth-encoded bit or a different sign bit. For the bit positions where the encoded parts of two multiplier configurations overlap, we need to use one encoder for both, but with some multiplexers at the inputs. In this case, such overlap occurs in the first nine bits, and since the encoder inputs are the same, no multiplexer is required.

Figure 5. Overlap between the first PPG of two different multiplier configurations. Since the same PPR is used for both configurations, several multiplexers are required to select between a sign bit and an encoded bit or another sign bit.

Therefore, one efficient approach to reducing the PPG complexity is to avoid the large number of multiplexers that are added for each bit of a partial product when a new multiplier configuration is added. The first step is to eliminate the sign extension parts of the PPs. Here we use a technique similar to the one used in the Baugh-Wooley multiplier [10], illustrated in Figure 6. In this figure, +1 and then −1 are added to the sign extension part of a PP. As shown, when +1 is added, the whole sign part is reduced to a single inverted sign bit, and the −1 remains as a constant number.

Figure 6. Reducing the repetitive sign bits by adding ±1.

Now, if this rule is applied to the sign extension parts of N PPs, we will have N constant numbers and N single inverted sign bits. We can reduce the N constant numbers to only one number by summing them up, as shown in Figure 7. Since the first bit of this constant number is aligned with the inverted sign bit of the first PP, we can append the constant number to the first PP as shown in Figure 8.

Figure 7. Reducing the constant numbers to one number.

Figure 8. Merging the constant number into the first partial product.
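A small numerical check may help make the transformation concrete. The sketch below is only an illustration under stated assumptions: every PP is modeled as an m-bit two's-complement value with a left shift, the helper names (`compact_pp`, `merged_constant`, `sum_with_compact_pps`) are hypothetical, and the merged constant is added as one separate term rather than being folded into the first PP as in Figure 8. It relies on the identity v = low + NOT(s)·2^(m−1) − 2^(m−1), where s is the sign bit of v.

```python
def compact_pp(v, m):
    """m-bit word of v with its sign bit inverted; no sign extension bits."""
    word = v & ((1 << m) - 1)
    return word ^ (1 << (m - 1))


def merged_constant(m, shifts, width):
    """Sum of the per-PP constants -2^(m-1+shift), reduced to one word."""
    return sum(-(1 << (m - 1 + shift)) for shift in shifts) % (1 << width)


def sum_with_compact_pps(pps, m, width):
    """Add PPs given as (value, shift) pairs without any sign extension."""
    mask = (1 << width) - 1
    total = merged_constant(m, [shift for _, shift in pps], width)
    for v, shift in pps:
        total += compact_pp(v, m) << shift
    return total & mask


def sum_with_sign_extension(pps, width):
    """Reference: fully sign-extended PPs added modulo 2^width."""
    mask = (1 << width) - 1
    return sum(v << shift for v, shift in pps) & mask


# Four hypothetical 5-bit PPs, each shifted by two more bits than the last.
pps = [(-7, 0), (5, 2), (-16, 4), (15, 6)]
compact = sum_with_compact_pps(pps, m=5, width=12)
reference = sum_with_sign_extension(pps, width=12)
```

Both sums agree, so the long runs of sign bits can be replaced by one inverted bit per PP plus a single precomputed constant.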
The resulting value is then appended to the first PP from its sign bit position. With this technique, we replace the sign parts of all PPs with a set of 0s and a single inverted bit, except for the first PP, where there are three sign bits and a number of 0s and 1s. This means that we need to choose between constant bits (0 or 1), single sign bits (inverted or non-inverted) and the normal Booth encoded bits. To avoid multiplexers, we modified the multiplicand factor selection (MFS) unit of the Radix-4 Booth PPG in Figure 1 and added two extra control signals, as shown in Figure 9. Compared to the original design, two control signals, Const and Inv, and two simple two-input gates have been added to the selection logic of the last two multiplexers in the MFS. When the Inv signal is set, the output of the MFS is inverted, and when the Const bit is set, 0 is selected as the output of the second multiplexer. Table 1 shows the operation modes of the modified PPG based on these two control signals.

Table 1. Operation modes of modified Radix-4 Booth PPG.

Const  Inv  Func
0      0    PP_k
0      1    NOT PP_k
1      0    0
1      1    1

With the modified MFS, to obtain a constant output, the Const signal is set and the Inv input is defined by the required value. This operation mode resolves the conflicts of a constant bit (0 or 1) with the opposite constant bit, a Booth encoded bit and an inverted sign bit. To produce the inverted sign bit, we only need to set the Inv signal. For normal encoding, the two control bits are tied to 0.

In contrast to the PPG, the PPR unit does not require any major modification. The compressor tree in the PPR was designed so that it can be reused for several different bit-widths. For bit-widths smaller than 18, since the number of PPs is less than nine, they can be reduced within a certain number of slices in the 9:2 layer. For larger bit-widths up to 36, however, we must split the PPs into two groups and reduce each group separately in two disjoint chunks of 9:2 compressors. Then, we need to sum the results of the two chunks. For this part, we use the 4:2 layer of the base DSP. The number of 9:2 slices required for a set of PPs is given by the following equation, in which MulBW represents the multiplier bit-width:

$$\mathrm{Slice}_{BW} = \mathrm{MulBW} + 2 \times \mathrm{NumOfPPs}$$

For example, in a 36-bit multiplier, we have two sets of nine PPs. Therefore, each set requires 36 + 2 × 9 = 54 9:2 slices. In this case, we allocate the first 108 slices of the 9:2 layer to the 36-bit multiplier. Similarly, for a 24-bit multiplier, we need 72 slices. The required numbers of 4:2 slices for 36- and 24-bit multipliers are 54 and 30, respectively. In fact, for both cases, the LSB of the second PP set aligns with the 19th bit of the first set, and the first 18 bits go directly to the final CPA.

6. Supporting multi-input addition

It is well known in ASIC design that the right way of implementing multi-input addition is to use compressor trees. In general, using smaller compressors as the building blocks of the compressor tree leads to faster and more flexible designs.
From the multi-input addition perspective, the advantage of using the PPR of a Radix-4 Booth multiplier rather than that of a parallel array multiplier is that we can use smaller, faster and more flexible compressors for the synthesis. The compressor tree in the PPR can be used for multi-input addition by bypassing the PPG. This feature is missing in the DSP blocks of current FPGAs: in such DSP blocks, the PPG cannot be bypassed and the PPR has not been designed for multi-input addition.

Figure 9. Modified Radix-4 Booth PPG encoder for resolving the conflicts of PPG parts of various multiplier bit-widths (the PPG of Figure 1 with the additional control signals Const and Inv in the MFS).

Assuming that there is no connectivity constraint between the DSP block inputs and the inputs of the compressor unit inside the PPR, we can efficiently map any regular or irregular multi-input addition onto the PPR. However, such connectivity requires a fully populated switch box, which is extremely costly. Therefore, the real challenge is to find a way to benefit from the inherent flexibility of the PPR compressor tree with minimum overhead.

Our solution to this problem is to define a set of fixed rectangular blocks (rc-blocks) within the 9:2 layer of the PPR for mapping different patterns. For different multi-input addition patterns, a different combination of the rc-blocks is used. The advantage of this approach is that no crossbar is required to make a direct connection between the DSP inputs and the PPR inputs. Since each rc-block is placed in a fixed region of the 9:2 layer, there is a predefined pattern to connect the DSP block inputs to the PPR for multi-input addition. Table 2 shows the dimensions of each rc-block for a half-DSP with 72 inputs. Here we assume that each half-DSP gets half of the total inputs for multi-input addition. The maximum height of an rc-block is nine, since it must fit into the 9:2 layer. The width of an rc-block in this table defines the number of 9:2 slices that are used.

Table 2. Different rectangular blocks that are used for mapping the bits in an adder tree (columns: Block, Height, Width).

Figure 10 shows how different DSP inputs are referenced in different rc-blocks. In this figure, the first four rc-blocks in Table 2 are shown. The numbers inside each rc-block specify the DSP input indices. Note that the PPG unit must be bypassed for the indicated bit positions. Therefore, we insert some multiplexers into the PPG unit for this purpose. To minimize the number of multiplexers, the rc-blocks are overlapped maximally, as shown in this figure. For maximum overlap, we align all the rc-blocks to the left for the right half-DSP and to the right for the left half-DSP. The reason for such alignment is that we can connect two rc-blocks of each half-DSP and form bigger blocks. For instance, rc-block 0 in the first half can be chained to rc-block 5 in the second half to form a bigger non-rectangular block for covering the input bits; similarly, two identical rc-blocks can be chained to form a wider rectangular block with the same height. Having fixed block sizes with fixed placement removes the need to route the DSP inputs to the compressor unit of the PPR through a crossbar.

Figure 10. DSP block input indices that are connected to each rc-block. Rc-blocks are aligned for maximum input sharing.

Multi-input addition mapping algorithm

In this section, the mapping algorithm for multi-input addition is described. The mapping algorithm is an important step, since we have to choose the best combination of the mentioned rc-blocks to efficiently cover the input bits, which mostly have irregular shapes. This decision can affect both area utilization and performance. We define the compression ratio (CR) parameter for each block and prioritize the blocks for mapping based on it. This parameter is defined as follows:

$$CR = \frac{\sum_{i=0}^{W_b} h_i}{2\,W_b}$$

In this equation, W_b is the width of the bits that are generated by a block and h_i is the number of bits in the i-th column of the block. The higher the CR, the greater the overall compression ratio of the rc-block.
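As a rough illustration of how the CR could be used to rank candidate blocks, consider the following Python sketch. It is not the authors' C++ implementation: the function names and the candidate dictionary are hypothetical, and the column heights and output widths are made-up values, since the actual rc-block dimensions of Table 2 are not reproduced here.

```python
def compression_ratio(column_heights, output_width):
    """CR = (sum of h_i) / (2 * W_b): covered input bits divided by the
    number of output bits (two rows of W_b bits each)."""
    return sum(column_heights) / (2 * output_width)


def best_block(candidates):
    """Pick the candidate with the highest CR.
    `candidates` maps a block name to (column_heights, output_width)."""
    return max(candidates, key=lambda name: compression_ratio(*candidates[name]))


# Hypothetical candidates; heights and widths are illustrative assumptions.
candidates = {
    "full_block": ([9] * 8, 12),     # every column filled to height 9
    "partial_block": ([7] * 8, 12),  # same footprint, shorter columns
    "narrow_block": ([9] * 4, 8),
}
ratios = {name: compression_ratio(*spec) for name, spec in candidates.items()}
winner = best_block(candidates)
```

With these assumed numbers the fully utilized block ranks first, which matches the intuition that a block covering more input bits for the same number of output bits compresses more.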
Blocks with higher CRs tend to compress more.

Although the maximum height of a column for a specific rc-block is limited to the numbers given in Table 2, more bits can be mapped to some columns in certain situations, if the heights of other columns do not reach the expected height. As an example, assume that rc-block 2 covers a set of bits where all columns have 7 bits except the two rightmost columns, which only have 5 bits. This means that two of the bit positions in each of these two columns remain unmapped. Now, if we find an input index in other columns outside the rc-block boundary that matches an input index in the short columns, then we have the chance to cover one more bit in the higher column. In this example, we can add two bits to the first column, since the indices of the two exterior bits are 0 and 8 and these two inputs are found in the short columns. With this trick, we can cover more bits with a block when it is not fully utilized. We call this process block refinement.

The proposed mapping algorithm has three major steps. In the first step, the best block for covering a set of input bits is determined. This block can be either a single rc-block or two joined rc-blocks, as explained. The best block is the one that has the highest CR after block refinement. Then, the covered bits are removed from the set of uncovered bits, and this step is repeated until the termination condition is reached. The termination condition is either that all bits are covered or that no block with a covering ratio above 50% can be found. The covering ratio of a block is the ratio between the number of bits that are covered and the maximum number of bits that can be covered.

The second step of the mapping is to generate the output bits corresponding to the covered bits. If this is the last level of the whole compressor tree and no other DSP block is required for the mapping, the final adder of the DSP has to be used to generate the result; otherwise, the final adder is skipped and the output stays in carry-save form.

In the third step of the mapping, we place the selected blocks in the DSPs and connect the DSP blocks based on the mapping. Each DSP can hold two rc-blocks, one in each half, and joined rc-blocks are placed in the same DSP, where the two halves are chained.

7. Experiments

To evaluate the proposed DSP block architecture, we designed a sample base DSP block with 144 inputs and outputs. This is the IO bandwidth of a DSP block in Altera Stratix-II. Since we use 90nm CMOS technology for the DSP block design and have to estimate the inter-DSP net delays for our experiments, Stratix-II was selected as the baseline FPGA. Considering the IO bandwidth, two of the building blocks shown in Figure 4 can be implemented, so we have four 18-bit multipliers. Therefore, we need a 144-bit wide 9:2 layer. Each half-DSP thus has two 18-bit multipliers followed by an optional 36-bit 4:2 layer for complex arithmetic multiplication.

For our experiments, we add 9/12/24/36 multiplier bit-widths to the base DSP block and evaluate the overhead imposed by each of them. Note that the 24-bit multiplier is used for single-precision floating-point multiplication. We also measure the overhead of adding the multi-input addition feature to the sample DSP block.

Based on the available bandwidth and resources, the designed DSP block can implement up to eight 9-bit, six 12-bit, two 24-bit and one 36-bit multipliers. For the 36-bit multiplier, some parts of the second half-DSP are used in conjunction with the complete first half-DSP. However, we can implement two 9-bit multipliers or one 12-bit multiplier in the remaining part of the second half-DSP. Moreover, we can use the multi-input addition feature of the second half at the same time as the 36-bit multiplier.

Table 3. Overhead of adding new features to the base DSP. The delay numbers show the 18-bit multiplier delay in each case (columns: DSP Features, Delay (ns), Area (µm²); rows: Base DSP; Base DSP + 9-bit Mul; Base DSP + 9/12 Mul; Base DSP + 9/12/24 Mul; Base DSP + 9/12/24/36 Mul; Base DSP + 9/12/24/36 Mul and MADD).

We modeled the sample DSP in Verilog and used Synopsys Design Compiler with a 90nm Artisan standard cell library for synthesis. The mapping algorithm for multi-input addition was developed in C++. To estimate the net delay of the inter-DSP connections, we replaced the DSPs of Stratix-II in the netlist and extracted the real net delays with the Quartus tool.

Results

Table 3 compares the synthesis results of the proposed base DSP with different features. The delay values show the delay of the 18-bit multiplier in each case, and the area is the total area of the DSP with all mentioned features. The interesting point is that the delay overhead of adding new features to the base DSP is almost nothing, while the area overhead of all bit-widths before and after supporting the multi-input addition feature is 11% and 13%, respectively. This means that, on average, less than 3% area overhead is imposed by supporting a new multiplier bit-width, and the area overhead of adding multi-input addition is 2%.

Table 4 shows the combinational delay of each multiplier in the final DSP with all features included. These numbers can be further improved by inserting pipeline registers between the layers of the DSP. In Stratix-II, the combinational delays of the 9/18/36-bit multipliers are 2.99 ns, 3.17 ns and 4.57 ns, respectively. The delay of the 9-bit multiplier of the DSP in [3] is 1.71 ns.

Table 4. Delay (ns) of each multiplier in the final DSP block (columns: Multiplier BW, Our DSP, Stratix-II).

To evaluate the multi-input addition feature, we compared the synthesis results of the multi-input addition parts of some real arithmetic, multimedia and signal processing applications on Stratix-II FPGA soft logic [11], FPCT [2] and our DSP block. Note that this feature is not supported by the DSP blocks of current FPGAs. Table 5 shows the delay results. Compared to FPCT, our DSP has a lower delay for the FIR benchmarks. On average, our DSP is around 4% slower than FPCT, which only performs multi-input addition. Compared to the soft logic, our DSP has a lower delay for all the benchmarks and is 54% faster on average.

Table 5. Delays (ns) of multi-input addition benchmarks on different hardware (columns: Benchmark, Soft Logic [11], FPCT [2], Our DSP; rows: DCT, Motion Es, G72x, ADPCM, Fir, Fir, Hpoly, Average).

Area comparison of these three methods is not as straightforward. Table 6 shows the area of each benchmark in terms of the basic blocks that are used. For the soft logic, the area is given as the number of LABs; for FPCT and our DSP block, the numbers represent the number of DSPs. The area of FPCT is µm², while ours is less than µm² with all mentioned features.

Table 6. Areas of multi-input addition benchmarks on different hardware (columns: Benchmark, Soft Logic (LAB), FPCT (DSP), Ours (DSP); rows: DCT, Motion Es, G72x, ADPCM, Fir, Fir, Hpoly; the only surviving entry is 3.5 in the ADPCM row).

8. Conclusion

In this paper, we proposed a base DSP block architecture for FPGAs to which various multiplier bit-widths and multi-input addition can be added without considerable overhead. This was achieved by employing a number of techniques that simplify the reconfigurable PPG unit. We studied the effects of adding these features on top of the base architecture and designed a sample DSP block with 9/12/18/24/36 multiplier bit-widths and multi-input addition. Moreover, the novel DSP block uses the limited available input/output bandwidth much more efficiently.

9. References

[1] H. Parandeh-Afshar, P. Brisk, and P. Ienne, "Integrating generalized parallel counters into FPGAs for improved arithmetic performance," to appear in IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[2] A. Cevrero, et al., "Field programmable compressor trees: acceleration of multi-input addition of FPGAs," ACM Trans. Reconfigurable Technology and Systems, vol. 2, no. 2, article no. 13, June.
[3] H. Parandeh-Afshar, et al., "A flexible DSP block to enhance FPGA arithmetic performance," in Proc. of the IEEE International Conference on Field Programmable Technology, FPT 09.
[4] M. J. Beauchamp, S. Hauck, K. D. Underwood, and K. S. Hemmert, "Architectural modifications to enhance the floating-point performance of FPGAs," IEEE Trans. VLSI, vol. 16, no. 2, Feb.
[5] Y. J. Chong and S. Parameswaran, "Flexible multi-mode embedded floating-point unit for field programmable gate arrays," in Proc. Int. Symp. FPGAs 09, 2009.
[6] C. H. Ho, et al., "Virtual embedded blocks: a methodology for evaluating embedded elements in FPGAs," in Proc. IEEE Symp. Field Programmable Custom Computing Machines 06, 2006.
[7] P. Jamieson and J. Rose, "Architecting hard crossbars on FPGAs and increasing their area-efficiency with shadow clusters," in IEEE Int. Conf. Field Programmable Technology 07, 2007.
[8] P. Jamieson and J. Rose, "Enhancing the area-efficiency of FPGAs with hard circuits using shadow clusters," in IEEE Int. Conf. Field Programmable Technology 06, 2006.
[9] M. D. Ercegovac and T. Lang, Digital Arithmetic, Morgan Kaufmann Publishers, San Francisco.
[10] C. R. Baugh and B. A. Wooley, "A two's complement parallel array multiplication algorithm," IEEE Trans. Computers, vol. C-22, 1973.
[11] H. Parandeh-Afshar, P. Brisk, and P. Ienne, "Exploiting fast carry-chains of FPGAs for designing compressor trees," in Proc. 19th International Conference on Field Programmable Logic and Applications, FPL 09, 2009.
[12] Altera Corporation, "DSP Blocks in Stratix-III Devices," in Stratix-III Device Handbook, Volume 1, May.


A Survey on A High Performance Approximate Adder And Two High Performance Approximate Multipliers IOSR Journal of Business and Management (IOSR-JBM) e-issn: 2278-487X, p-issn: 2319-7668 PP 43-50 www.iosrjournals.org A Survey on A High Performance Approximate Adder And Two High Performance Approximate

More information

PERFORMANCE COMPARISON OF HIGHER RADIX BOOTH MULTIPLIER USING 45nm TECHNOLOGY

PERFORMANCE COMPARISON OF HIGHER RADIX BOOTH MULTIPLIER USING 45nm TECHNOLOGY PERFORMANCE COMPARISON OF HIGHER RADIX BOOTH MULTIPLIER USING 45nm TECHNOLOGY JasbirKaur 1, Sumit Kumar 2 Asst. Professor, Department of E & CE, PEC University of Technology, Chandigarh, India 1 P.G. Student,

More information

Design A Redundant Binary Multiplier Using Dual Logic Level Technique

Design A Redundant Binary Multiplier Using Dual Logic Level Technique Design A Redundant Binary Multiplier Using Dual Logic Level Technique Sreenivasa Rao Assistant Professor, Department of ECE, Santhiram Engineering College, Nandyala, A.P. Jayanthi M.Tech Scholar in VLSI,

More information

Modified Booth Encoding Multiplier for both Signed and Unsigned Radix Based Multi-Modulus Multiplier

Modified Booth Encoding Multiplier for both Signed and Unsigned Radix Based Multi-Modulus Multiplier Modified Booth Encoding Multiplier for both Signed and Unsigned Radix Based Multi-Modulus Multiplier M.Shiva Krushna M.Tech, VLSI Design, Holy Mary Institute of Technology And Science, Hyderabad, T.S,

More information

Digital Integrated CircuitDesign

Digital Integrated CircuitDesign Digital Integrated CircuitDesign Lecture 13 Building Blocks (Multipliers) Register Adder Shift Register Adib Abrishamifar EE Department IUST Acknowledgement This lecture note has been summarized and categorized

More information

Design and Characterization of 16 Bit Multiplier Accumulator Based on Radix-2 Modified Booth Algorithm

Design and Characterization of 16 Bit Multiplier Accumulator Based on Radix-2 Modified Booth Algorithm Design and Characterization of 16 Bit Multiplier Accumulator Based on Radix-2 Modified Booth Algorithm Vijay Dhar Maurya 1, Imran Ullah Khan 2 1 M.Tech Scholar, 2 Associate Professor (J), Department of

More information

High Speed Vedic Multiplier Designs Using Novel Carry Select Adder

High Speed Vedic Multiplier Designs Using Novel Carry Select Adder High Speed Vedic Multiplier Designs Using Novel Carry Select Adder 1 chintakrindi Saikumar & 2 sk.sahir 1 (M.Tech) VLSI, Dept. of ECE Priyadarshini Institute of Technology & Management 2 Associate Professor,

More information

2 Assoc Prof, Dept of ECE, George Institute of Engineering & Technology, Markapur, AP, India,

2 Assoc Prof, Dept of ECE, George Institute of Engineering & Technology, Markapur, AP, India, ISSN 2319-8885 Vol.03,Issue.30 October-2014, Pages:5968-5972 www.ijsetr.com Low Power and Area-Efficient Carry Select Adder THANNEERU DHURGARAO 1, P.PRASANNA MURALI KRISHNA 2 1 PG Scholar, Dept of DECS,

More information

Modified Booth Multiplier Based Low-Cost FIR Filter Design Shelja Jose, Shereena Mytheen

Modified Booth Multiplier Based Low-Cost FIR Filter Design Shelja Jose, Shereena Mytheen Modified Booth Multiplier Based Low-Cost FIR Filter Design Shelja Jose, Shereena Mytheen Abstract A new low area-cost FIR filter design is proposed using a modified Booth multiplier based on direct form

More information

ISSN Vol.07,Issue.08, July-2015, Pages:

ISSN Vol.07,Issue.08, July-2015, Pages: ISSN 2348 2370 Vol.07,Issue.08, July-2015, Pages:1397-1402 www.ijatir.org Implementation of 64-Bit Modified Wallace MAC Based On Multi-Operand Adders MIDDE SHEKAR 1, M. SWETHA 2 1 PG Scholar, Siddartha

More information

High Performance Low-Power Signed Multiplier

High Performance Low-Power Signed Multiplier High Performance Low-Power Signed Multiplier Amir R. Attarha Mehrdad Nourani VLSI Circuits & Systems Laboratory Department of Electrical and Computer Engineering University of Tehran, IRAN Email: attarha@khorshid.ece.ut.ac.ir

More information

A New Architecture for Signed Radix-2 m Pure Array Multipliers

A New Architecture for Signed Radix-2 m Pure Array Multipliers A New Architecture for Signed Radi-2 m Pure Array Multipliers Eduardo Costa Sergio Bampi José Monteiro UCPel, Pelotas, Brazil UFRGS, P. Alegre, Brazil IST/INESC, Lisboa, Portugal ecosta@atlas.ucpel.tche.br

More information

Faster and Low Power Twin Precision Multiplier

Faster and Low Power Twin Precision Multiplier Faster and Low Twin Precision V. Sreedeep, B. Ramkumar and Harish M Kittur Abstract- In this work faster unsigned multiplication has been achieved by using a combination High Performance Multiplication

More information

An FPGA Logic Cell and Carry Chain Configurable as a 6:2 or 7:2 Compressor

An FPGA Logic Cell and Carry Chain Configurable as a 6:2 or 7:2 Compressor An FPGA Logic Cell and Carry Chain Configurable as a 6:2 or 7:2 Compressor HADI PARANDEH-AFSHAR, PHILIP BRISK, and PAOLO IENNE Ecole Polytechnique Federale de Lausanne (EPFL) To improve FPGA performance

More information

JDT LOW POWER FIR FILTER ARCHITECTURE USING ACCUMULATOR BASED RADIX-2 MULTIPLIER

JDT LOW POWER FIR FILTER ARCHITECTURE USING ACCUMULATOR BASED RADIX-2 MULTIPLIER JDT-003-2013 LOW POWER FIR FILTER ARCHITECTURE USING ACCUMULATOR BASED RADIX-2 MULTIPLIER 1 Geetha.R, II M Tech, 2 Mrs.P.Thamarai, 3 Dr.T.V.Kirankumar 1 Dept of ECE, Bharath Institute of Science and Technology

More information

AN EFFICIENT APPROACH TO MINIMIZE POWER AND AREA IN CARRY SELECT ADDER USING BINARY TO EXCESS ONE CONVERTER

AN EFFICIENT APPROACH TO MINIMIZE POWER AND AREA IN CARRY SELECT ADDER USING BINARY TO EXCESS ONE CONVERTER AN EFFICIENT APPROACH TO MINIMIZE POWER AND AREA IN CARRY SELECT ADDER USING BINARY TO EXCESS ONE CONVERTER K. RAMAMOORTHY 1 T. CHELLADURAI 2 V. MANIKANDAN 3 1 Department of Electronics and Communication

More information

Review of Booth Algorithm for Design of Multiplier

Review of Booth Algorithm for Design of Multiplier Review of Booth Algorithm for Design of Multiplier N.VEDA KUMAR, THEEGALA DHIVYA Assistant Professor, M.TECH STUDENT Dept of ECE,Megha Institute of Engineering & Technology For womens,edulabad,ghatkesar

More information

Area Power and Delay Efficient Carry Select Adder (CSLA) Using Bit Excess Technique

Area Power and Delay Efficient Carry Select Adder (CSLA) Using Bit Excess Technique Area Power and Delay Efficient Carry Select Adder (CSLA) Using Bit Excess Technique G. Sai Krishna Master of Technology VLSI Design, Abstract: In electronics, an adder or summer is digital circuits that

More information

Implementation of Parallel Multiplier-Accumulator using Radix- 2 Modified Booth Algorithm and SPST

Implementation of Parallel Multiplier-Accumulator using Radix- 2 Modified Booth Algorithm and SPST ǁ Volume 02 - Issue 01 ǁ January 2017 ǁ PP. 06-14 Implementation of Parallel Multiplier-Accumulator using Radix- 2 Modified Booth Algorithm and SPST Ms. Deepali P. Sukhdeve Assistant Professor Department

More information

Design of a Power Optimal Reversible FIR Filter ASIC Speech Signal Processing

Design of a Power Optimal Reversible FIR Filter ASIC Speech Signal Processing Design of a Power Optimal Reversible FIR Filter ASIC Speech Signal Processing Yelle Harika M.Tech, Joginpally B.R.Engineering College. P.N.V.M.Sastry M.S(ECE)(A.U), M.Tech(ECE), (Ph.D)ECE(JNTUH), PG DIP

More information

Design of an optimized multiplier based on approximation logic

Design of an optimized multiplier based on approximation logic ISSN:2348-2079 Volume-6 Issue-1 International Journal of Intellectual Advancements and Research in Engineering Computations Design of an optimized multiplier based on approximation logic Dhivya Bharathi

More information

Mahendra Engineering College, Namakkal, Tamilnadu, India.

Mahendra Engineering College, Namakkal, Tamilnadu, India. Implementation of Modified Booth Algorithm for Parallel MAC Stephen 1, Ravikumar. M 2 1 PG Scholar, ME (VLSI DESIGN), 2 Assistant Professor, Department ECE Mahendra Engineering College, Namakkal, Tamilnadu,

More information

A Novel Approach of an Efficient Booth Encoder for Signal Processing Applications

A Novel Approach of an Efficient Booth Encoder for Signal Processing Applications International Conference on Systems, Science, Control, Communication, Engineering and Technology 406 International Conference on Systems, Science, Control, Communication, Engineering and Technology 2016

More information

An Optimized Wallace Tree Multiplier using Parallel Prefix Han-Carlson Adder for DSP Processors

An Optimized Wallace Tree Multiplier using Parallel Prefix Han-Carlson Adder for DSP Processors An Optimized Wallace Tree Multiplier using Parallel Prefix Han-Carlson Adder for DSP Processors T.N.Priyatharshne Prof. L. Raja, M.E, (Ph.D) A. Vinodhini ME VLSI DESIGN Professor, ECE DEPT ME VLSI DESIGN

More information

A Review on Different Multiplier Techniques

A Review on Different Multiplier Techniques A Review on Different Multiplier Techniques B.Sudharani Research Scholar, Department of ECE S.V.U.College of Engineering Sri Venkateswara University Tirupati, Andhra Pradesh, India Dr.G.Sreenivasulu Professor

More information

A MODIFIED ARCHITECTURE OF MULTIPLIER AND ACCUMULATOR USING SPURIOUS POWER SUPPRESSION TECHNIQUE

A MODIFIED ARCHITECTURE OF MULTIPLIER AND ACCUMULATOR USING SPURIOUS POWER SUPPRESSION TECHNIQUE A MODIFIED ARCHITECTURE OF MULTIPLIER AND ACCUMULATOR USING SPURIOUS POWER SUPPRESSION TECHNIQUE R.Mohanapriya #1, K. Rajesh*² # PG Scholar (VLSI Design), Knowledge Institute of Technology, Salem * Assistant

More information

A New High Speed Low Power Performance of 8- Bit Parallel Multiplier-Accumulator Using Modified Radix-2 Booth Encoded Algorithm

A New High Speed Low Power Performance of 8- Bit Parallel Multiplier-Accumulator Using Modified Radix-2 Booth Encoded Algorithm A New High Speed Low Power Performance of 8- Bit Parallel Multiplier-Accumulator Using Modified Radix-2 Booth Encoded Algorithm V.Sandeep Kumar Assistant Professor, Indur Institute Of Engineering & Technology,Siddipet

More information

NOWADAYS, many Digital Signal Processing (DSP) applications,

NOWADAYS, many Digital Signal Processing (DSP) applications, 1 HUB-Floating-Point for improving FPGA implementations of DSP Applications Javier Hormigo, and Julio Villalba, Member, IEEE Abstract The increasing complexity of new digital signalprocessing applications

More information

Performance Analysis of an Efficient Reconfigurable Multiplier for Multirate Systems

Performance Analysis of an Efficient Reconfigurable Multiplier for Multirate Systems Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 5.258 IJCSMC,

More information

REALIAZATION OF LOW POWER VLSI ARCHITECTURE FOR RECONFIGURABLE FIR FILTER USING DYNAMIC SWITCHING ACITIVITY OF MULTIPLIERS

REALIAZATION OF LOW POWER VLSI ARCHITECTURE FOR RECONFIGURABLE FIR FILTER USING DYNAMIC SWITCHING ACITIVITY OF MULTIPLIERS REALIAZATION OF LOW POWER VLSI ARCHITECTURE FOR RECONFIGURABLE FIR FILTER USING DYNAMIC SWITCHING ACITIVITY OF MULTIPLIERS M. Sai Sri 1, K. Padma Vasavi 2 1 M. Tech -VLSID Student, Department of Electronics

More information

A Novel Approach For Designing A Low Power Parallel Prefix Adders

A Novel Approach For Designing A Low Power Parallel Prefix Adders A Novel Approach For Designing A Low Power Parallel Prefix Adders R.Chaitanyakumar M Tech student, Pragati Engineering College, Surampalem (A.P, IND). P.Sunitha Assistant Professor, Dept.of ECE Pragati

More information

High Speed Binary Counters Based on Wallace Tree Multiplier in VHDL

High Speed Binary Counters Based on Wallace Tree Multiplier in VHDL High Speed Binary Counters Based on Wallace Tree Multiplier in VHDL E.Sangeetha 1 ASP and D.Tharaliga 2 Department of Electronics and Communication Engineering, Tagore College of Engineering and Technology,

More information

Design and Analysis of Row Bypass Multiplier using various logic Full Adders

Design and Analysis of Row Bypass Multiplier using various logic Full Adders Design and Analysis of Row Bypass Multiplier using various logic Full Adders Dr.R.Naveen 1, S.A.Sivakumar 2, K.U.Abhinaya 3, N.Akilandeeswari 4, S.Anushya 5, M.A.Asuvanti 6 1 Associate Professor, 2 Assistant

More information

COMPARISION OF LOW POWER AND DELAY USING BAUGH WOOLEY AND WALLACE TREE MULTIPLIERS

COMPARISION OF LOW POWER AND DELAY USING BAUGH WOOLEY AND WALLACE TREE MULTIPLIERS COMPARISION OF LOW POWER AND DELAY USING BAUGH WOOLEY AND WALLACE TREE MULTIPLIERS ( 1 Dr.V.Malleswara rao, 2 K.V.Ganesh, 3 P.Pavan Kumar) 1 Professor &HOD of ECE,GITAM University,Visakhapatnam. 2 Ph.D

More information

An Design of Radix-4 Modified Booth Encoded Multiplier and Optimised Carry Select Adder Design for Efficient Area and Delay

An Design of Radix-4 Modified Booth Encoded Multiplier and Optimised Carry Select Adder Design for Efficient Area and Delay An Design of Radix-4 Modified Booth Encoded Multiplier and Optimised Carry Select Adder Design for Efficient Area and Delay 1. K. Nivetha, PG Scholar, Dept of ECE, Nandha Engineering College, Erode. 2.

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION CHAPTER 1 INTRODUCTION 1.1 Project Background High speed multiplication is another critical function in a range of very large scale integration (VLSI) applications. Multiplications are expensive and slow

More information

Performance Analysis of a 64-bit signed Multiplier with a Carry Select Adder Using VHDL

Performance Analysis of a 64-bit signed Multiplier with a Carry Select Adder Using VHDL Performance Analysis of a 64-bit signed Multiplier with a Carry Select Adder Using VHDL E.Deepthi, V.M.Rani, O.Manasa Abstract: This paper presents a performance analysis of carrylook-ahead-adder and carry

More information

High Speed Speculative Multiplier Using 3 Step Speculative Carry Save Reduction Tree

High Speed Speculative Multiplier Using 3 Step Speculative Carry Save Reduction Tree High Speed Speculative Multiplier Using 3 Step Speculative Carry Save Reduction Tree Alfiya V M, Meera Thampy Student, Dept. of ECE, Sree Narayana Gurukulam College of Engineering, Kadayiruppu, Ernakulam,

More information

AREA EFFICIENT DISTRIBUTED ARITHMETIC DISCRETE COSINE TRANSFORM USING MODIFIED WALLACE TREE MULTIPLIER

AREA EFFICIENT DISTRIBUTED ARITHMETIC DISCRETE COSINE TRANSFORM USING MODIFIED WALLACE TREE MULTIPLIER American Journal of Applied Sciences 11 (2): 180-188, 2014 ISSN: 1546-9239 2014 Science Publication doi:10.3844/ajassp.2014.180.188 Published Online 11 (2) 2014 (http://www.thescipub.com/ajas.toc) AREA

More information

IMPLEMENTATION OF UNSIGNED MULTIPLIER USING MODIFIED CSLA

IMPLEMENTATION OF UNSIGNED MULTIPLIER USING MODIFIED CSLA IMPLEMENTATION OF UNSIGNED MULTIPLIER USING MODIFIED CSLA Sooraj.N.P. PG Scholar, Electronics & Communication Dept. Hindusthan Institute of Technology, Coimbatore,Anna University ABSTRACT Multiplications

More information

Low Power Approach for Fir Filter Using Modified Booth Multiprecision Multiplier

Low Power Approach for Fir Filter Using Modified Booth Multiprecision Multiplier Low Power Approach for Fir Filter Using Modified Booth Multiprecision Multiplier Gowridevi.B 1, Swamynathan.S.M 2, Gangadevi.B 3 1,2 Department of ECE, Kathir College of Engineering 3 Department of ECE,

More information

HIGH PERFORMANCE BAUGH WOOLEY MULTIPLIER USING CARRY SKIP ADDER STRUCTURE

HIGH PERFORMANCE BAUGH WOOLEY MULTIPLIER USING CARRY SKIP ADDER STRUCTURE HIGH PERFORMANCE BAUGH WOOLEY MULTIPLIER USING CARRY SKIP ADDER STRUCTURE R.ARUN SEKAR 1 B.GOPINATH 2 1Department Of Electronics And Communication Engineering, Assistant Professor, SNS College Of Technology,

More information

IJCSIET-- International Journal of Computer Science information and Engg., Technologies ISSN

IJCSIET-- International Journal of Computer Science information and Engg., Technologies ISSN High throughput Modified Wallace MAC based on Multi operand Adders : 1 Menda Jaganmohanarao, 2 Arikathota Udaykumar 1 Student, 2 Assistant Professor 1,2 Sri Vekateswara College of Engineering and Technology,

More information

An Inversion-Based Synthesis Approach for Area and Power efficient Arithmetic Sum-of-Products

An Inversion-Based Synthesis Approach for Area and Power efficient Arithmetic Sum-of-Products 21st International Conference on VLSI Design An Inversion-Based Synthesis Approach for Area and Power efficient Arithmetic Sum-of-Products Sabyasachi Das Synplicity Inc Sunnyvale, CA, USA Email: sabya@synplicity.com

More information

An Efficient Reconfigurable Fir Filter based on Twin Precision Multiplier and Low Power Adder

An Efficient Reconfigurable Fir Filter based on Twin Precision Multiplier and Low Power Adder An Efficient Reconfigurable Fir Filter based on Twin Precision Multiplier and Low Power Adder Sony Sethukumar, Prajeesh R, Sri Vellappally Natesan College of Engineering SVNCE, Kerala, India. Manukrishna

More information

A NOVEL WALLACE TREE MULTIPLIER FOR USING FAST ADDERS

A NOVEL WALLACE TREE MULTIPLIER FOR USING FAST ADDERS G RAMESH et al, Volume 2, Issue 7, PP:, SEPTEMBER 2014. A NOVEL WALLACE TREE MULTIPLIER FOR USING FAST ADDERS G.Ramesh 1*, K.Naga Lakshmi 2* 1. II. M.Tech (VLSI), Dept of ECE, AM Reddy Memorial College

More information

Design of Low Power Column bypass Multiplier using FPGA

Design of Low Power Column bypass Multiplier using FPGA Design of Low Power Column bypass Multiplier using FPGA J.sudha rani 1,R.N.S.Kalpana 2 Dept. of ECE 1, Assistant Professor,CVSR College of Engineering,Andhra pradesh, India, Assistant Professor 2,Dept.

More information

IMPLEMENTATION OF AREA EFFICIENT MULTIPLIER AND ADDER ARCHITECTURE IN DIGITAL FIR FILTER

IMPLEMENTATION OF AREA EFFICIENT MULTIPLIER AND ADDER ARCHITECTURE IN DIGITAL FIR FILTER ISSN: 0976-3104 Srividya. ARTICLE OPEN ACCESS IMPLEMENTATION OF AREA EFFICIENT MULTIPLIER AND ADDER ARCHITECTURE IN DIGITAL FIR FILTER Srividya Sahyadri College of Engineering & Management, ECE Dept, Mangalore,

More information

Comparison of Conventional Multiplier with Bypass Zero Multiplier

Comparison of Conventional Multiplier with Bypass Zero Multiplier Comparison of Conventional Multiplier with Bypass Zero Multiplier 1 alyani Chetan umar, 2 Shrikant Deshmukh, 3 Prashant Gupta. M.tech VLSI Student SENSE Department, VIT University, Vellore, India. 632014.

More information

DESIGN OF LOW POWER MULTIPLIERS

DESIGN OF LOW POWER MULTIPLIERS DESIGN OF LOW POWER MULTIPLIERS GowthamPavanaskar, RakeshKamath.R, Rashmi, Naveena Guided by: DivyeshDivakar AssistantProfessor EEE department Canaraengineering college, Mangalore Abstract:With advances

More information

Design and Analyse Low Power Wallace Multiplier Using GDI Technique

Design and Analyse Low Power Wallace Multiplier Using GDI Technique IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-issn: 2278-2834,p- ISSN: 2278-8735.Volume 12, Issue 2, Ver. III (Mar.-Apr. 2017), PP 49-54 www.iosrjournals.org Design and Analyse

More information

A Novel FPGA Logic Block for Improved Arithmetic Performance

A Novel FPGA Logic Block for Improved Arithmetic Performance A Novel FPGA Logic Block for Improved Arithmetic Performance Hadi Parandeh-Afshar Philip Brisk Paolo Ienne Ecole Polytechnique Fédérale de Lausanne (EPFL) School of Computer and Communication Sciences

More information

Modified Partial Product Generator for Redundant Binary Multiplier with High Modularity and Carry-Free Addition

Modified Partial Product Generator for Redundant Binary Multiplier with High Modularity and Carry-Free Addition Modified Partial Product Generator for Redundant Binary Multiplier with High Modularity and Carry-Free Addition Thoka. Babu Rao 1, G. Kishore Kumar 2 1, M. Tech in VLSI & ES, Student at Velagapudi Ramakrishna

More information

REVIEW ARTICLE: EFFICIENT MULTIPLIER ARCHITECTURE IN VLSI DESIGN

REVIEW ARTICLE: EFFICIENT MULTIPLIER ARCHITECTURE IN VLSI DESIGN REVIEW ARTICLE: EFFICIENT MULTIPLIER ARCHITECTURE IN VLSI DESIGN M. JEEVITHA 1, R.MUTHAIAH 2, P.SWAMINATHAN 3 1 P.G. Scholar, School of Computing, SASTRA University, Tamilnadu, INDIA 2 Assoc. Prof., School

More information

Low-Power Multipliers with Data Wordlength Reduction

Low-Power Multipliers with Data Wordlength Reduction Low-Power Multipliers with Data Wordlength Reduction Kyungtae Han, Brian L. Evans, and Earl E. Swartzlander, Jr. Dept. of Electrical and Computer Engineering The University of Texas at Austin Austin, TX

More information

Keywords: Column bypassing multiplier, Modified booth algorithm, Spartan-3AN.

Keywords: Column bypassing multiplier, Modified booth algorithm, Spartan-3AN. Volume 4, Issue 5, May 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Empirical Review

More information

High Performance 128 Bits Multiplexer Based MBE Multiplier for Signed-Unsigned Number Operating at 1GHz

High Performance 128 Bits Multiplexer Based MBE Multiplier for Signed-Unsigned Number Operating at 1GHz High Performance 128 Bits Multiplexer Based MBE Multiplier for Signed-Unsigned Number Operating at 1GHz Ravindra P Rajput Department of Electronics and Communication Engineering JSS Research Foundation,

More information

A Novel Approach to 32-Bit Approximate Adder

A Novel Approach to 32-Bit Approximate Adder A Novel Approach to 32-Bit Approximate Adder Shalini Singh 1, Ghanshyam Jangid 2 1 Department of Electronics and Communication, Gyan Vihar University, Jaipur, Rajasthan, India 2 Assistant Professor, Department

More information

Architectural Improvements for Field Programmable Counter Arrays: Enabling Efficient Synthesis of Fast Compressor Trees on FPGAs

Architectural Improvements for Field Programmable Counter Arrays: Enabling Efficient Synthesis of Fast Compressor Trees on FPGAs Architectural Improvements for Field Programmable Counter Arrays: Enabling Efficient Synthesis of Fast Compressor Trees on FPGAs Alessandro Cevrero,2, Panagiotis Athanasopoulos,2, Hadi Parandeh-Afshar

More information

Design of a Power Optimal Reversible FIR Filter for Speech Signal Processing

Design of a Power Optimal Reversible FIR Filter for Speech Signal Processing 2015 International Conference on Computer Communication and Informatics (ICCCI -2015), Jan. 08 10, 2015, Coimbatore, INDIA Design of a Power Optimal Reversible FIR Filter for Speech Signal Processing S.Padmapriya

More information

Using Soft Multipliers with Stratix & Stratix GX

Using Soft Multipliers with Stratix & Stratix GX Using Soft Multipliers with Stratix & Stratix GX Devices November 2002, ver. 2.0 Application Note 246 Introduction Traditionally, designers have been forced to make a tradeoff between the flexibility of

More information

An Efficient SQRT Architecture of Carry Select Adder Design by HA and Common Boolean Logic PinnikaVenkateswarlu 1, Ragutla Kalpana 2

An Efficient SQRT Architecture of Carry Select Adder Design by HA and Common Boolean Logic PinnikaVenkateswarlu 1, Ragutla Kalpana 2 An Efficient SQRT Architecture of Carry Select Adder Design by HA and Common Boolean Logic PinnikaVenkateswarlu 1, Ragutla Kalpana 2 1 M.Tech student, ECE, Sri Indu College of Engineering and Technology,

More information

Modified Design of High Speed Baugh Wooley Multiplier

Modified Design of High Speed Baugh Wooley Multiplier Modified Design of High Speed Baugh Wooley Multiplier 1 Yugvinder Dixit, 2 Amandeep Singh 1 Student, 2 Assistant Professor VLSI Design, Department of Electrical & Electronics Engineering, Lovely Professional

More information

FOR HIGH SPEED LOW POWER APPLICATIONS USING RADIX-4 MODIFIED BOOTH ENCODER

FOR HIGH SPEED LOW POWER APPLICATIONS USING RADIX-4 MODIFIED BOOTH ENCODER International Journal of Advancements in Research & Technology, Volume 4, Issue 6, June -2015 31 A SPST BASED 16x16 MULTIPLIER FOR HIGH SPEED LOW POWER APPLICATIONS USING RADIX-4 MODIFIED BOOTH ENCODER

More information

AN EFFICIENT MAC DESIGN IN DIGITAL FILTERS

AN EFFICIENT MAC DESIGN IN DIGITAL FILTERS AN EFFICIENT MAC DESIGN IN DIGITAL FILTERS THIRUMALASETTY SRIKANTH 1*, GUNGI MANGARAO 2* 1. Dept of ECE, Malineni Lakshmaiah Engineering College, Andhra Pradesh, India. Email Id : srikanthmailid07@gmail.com

More information

[Krishna, 2(9): September, 2013] ISSN: Impact Factor: INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY

[Krishna, 2(9): September, 2013] ISSN: Impact Factor: INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY Design of Wallace Tree Multiplier using Compressors K.Gopi Krishna *1, B.Santhosh 2, V.Sridhar 3 gopikoleti@gmail.com Abstract

More information

6. DSP Blocks in Stratix II and Stratix II GX Devices

6. DSP Blocks in Stratix II and Stratix II GX Devices 6. SP Blocks in Stratix II and Stratix II GX evices SII52006-2.2 Introduction Stratix II and Stratix II GX devices have dedicated digital signal processing (SP) blocks optimized for SP applications requiring

More information

ISSN Vol.03,Issue.02, February-2014, Pages:

ISSN Vol.03,Issue.02, February-2014, Pages: www.semargroup.org, www.ijsetr.com ISSN 2319-8885 Vol.03,Issue.02, February-2014, Pages:0239-0244 Design and Implementation of High Speed Radix 8 Multiplier using 8:2 Compressors A.M.SRINIVASA CHARYULU

More information

Design of Area and Power Efficient FIR Filter Using Truncated Multiplier Technique

Design of Area and Power Efficient FIR Filter Using Truncated Multiplier Technique Design of Area and Power Efficient FIR Filter Using Truncated Multiplier Technique TALLURI ANUSHA *1, and D.DAYAKAR RAO #2 * Student (Dept of ECE-VLSI), Sree Vahini Institute of Science and Technology,

More information

Implementation of 32-Bit Unsigned Multiplier Using CLAA and CSLA

Implementation of 32-Bit Unsigned Multiplier Using CLAA and CSLA Implementation of 32-Bit Unsigned Multiplier Using CLAA and CSLA 1. Vijaya kumar vadladi,m. Tech. Student (VLSID), Holy Mary Institute of Technology and Science, Keesara, R.R. Dt. 2.David Solomon Raju.Y,Associate

More information

Low power and Area Efficient MDC based FFT for Twin Data Streams

Low power and Area Efficient MDC based FFT for Twin Data Streams RESEARCH ARTICLE OPEN ACCESS Low power and Area Efficient MDC based FFT for Twin Data Streams M. Hemalatha 1, R. Ashok Chaitanya Varma 2 1 ( M.Tech -VLSID Student, Department of Electronics and Communications

More information

VHDL Code Generator for Optimized Carry-Save Reduction Strategy in Low Power Computer Arithmetic

VHDL Code Generator for Optimized Carry-Save Reduction Strategy in Low Power Computer Arithmetic VHDL Code Generator for Optimized Carry-Save Reduction Strategy in Low Power Computer Arithmetic DAVID NEUHÄUSER Friedrich Schiller University Department of Computer Science D-07737 Jena GERMANY dn@c3e.de

More information

A Design Approach for Compressor Based Approximate Multipliers

A Design Approach for Compressor Based Approximate Multipliers A Approach for Compressor Based Approximate Multipliers Naman Maheshwari Electrical & Electronics Engineering, Birla Institute of Technology & Science, Pilani, Rajasthan - 333031, India Email: naman.mah1993@gmail.com

More information

Reduced Complexity Wallace Tree Mulplier and Enhanced Carry Look-Ahead Adder for Digital FIR Filter

Reduced Complexity Wallace Tree Mulplier and Enhanced Carry Look-Ahead Adder for Digital FIR Filter Reduced Complexity Wallace Tree Mulplier and Enhanced Carry Look-Ahead Adder for Digital FIR Filter Dr.N.C.sendhilkumar, Assistant Professor Department of Electronics and Communication Engineering Sri

More information

Design of a High Speed FIR Filter on FPGA by Using DA-OBC Algorithm

Design of a High Speed FIR Filter on FPGA by Using DA-OBC Algorithm Design of a High Speed FIR Filter on FPGA by Using DA-OBC Algorithm Vijay Kumar Ch 1, Leelakrishna Muthyala 1, Chitra E 2 1 Research Scholar, VLSI, SRM University, Tamilnadu, India 2 Assistant Professor,

More information

Design and Implementation Radix-8 High Performance Multiplier Using High Speed Compressors

Design and Implementation Radix-8 High Performance Multiplier Using High Speed Compressors Design and Implementation Radix-8 High Performance Multiplier Using High Speed Compressors M.Satheesh, D.Sri Hari Student, Dept of Electronics and Communication Engineering, Siddartha Educational Academy

More information

Verilog Implementation of 64-bit Redundant Binary Product generator using MBE

Verilog Implementation of 64-bit Redundant Binary Product generator using MBE Verilog Implementation of 64-bit Redundant Binary Product generator using MBE Santosh Kumar G.B 1, Mallikarjuna A 2 M.Tech (D.E), Dept. of ECE, BITM, Ballari, India 1 Assistant professor, Dept. of ECE,

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK DESIGN OF LOW POWER MULTIPLIERS USING APPROXIMATE ADDER MR. PAWAN SONWANE 1, DR.

More information

Multiplier Design and Performance Estimation with Distributed Arithmetic Algorithm

Multiplier Design and Performance Estimation with Distributed Arithmetic Algorithm Multiplier Design and Performance Estimation with Distributed Arithmetic Algorithm M. Suhasini, K. Prabhu Kumar & P. Srinivas Department of Electronics & Comm. Engineering, Nimra College of Engineering

More information
