Transactions Briefs. Sorter Based Permutation Units for Media-Enhanced Microprocessors


IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 15, NO. 6, JUNE 2007

Giorgos Dimitrakopoulos, Christos Mavrokefalidis, Kostas Galanopoulos, and Dimitris Nikolos

Abstract: Single- or multibit subword permutations are useful in many multimedia and cryptographic applications, and several specialized instructions have been proposed to handle the required data rearrangements. In this paper, we examine the hardware implementation of the powerful group (GRP) permutation instruction. The design of the proposed permutation unit is based on the functionality of sorting networks. Two variants of the sorter-based GRP unit are introduced and analyzed, and their energy-delay behavior is investigated using static CMOS implementations in a 130-nm CMOS technology.

Index Terms: Cryptography, data-rearrangement instructions, multimedia processors, permutation units, sorting networks.

Fig. 1. Example of the execution of GRP.

I. INTRODUCTION

Multimedia applications constitute a large and increasing percentage of the general-purpose computing workload [1], [2]. One of their main characteristics is that they operate on low-precision data that exhibit high levels of data parallelism. In most cases, multimedia data are packed into subwords of 1 or 2 bytes that are processed in parallel by word-oriented processors according to the single-instruction multiple-data (SIMD) paradigm [3], [4]. New instructions have been added to the instruction sets of modern microprocessors to handle subword operations efficiently and to enhance the performance of software-implemented multimedia algorithms. To fully exploit subword-parallel operations, however, the subwords must also be rearranged efficiently inside the registers.
Efficient handling of permutations is also needed in software implementations of cryptographic algorithms in order to achieve the required throughput. The selection of efficient permutation instructions and the design of fast permutation units have recently attracted considerable interest [5]–[9]. Several instructions have been proposed for speeding up arbitrary bit permutations [5]. Among them, GRP needs $\log_2 n$ instructions to generate an arbitrary permutation of $n$ bits, assuming that at most $n$ control bits are available for each operation. GRP is versatile and achieves large speedups when used in cryptographic algorithms [5], [9].

GRP $R_D, R_S, R_C$ takes two source operands, the data bits stored in $R_S$ and the control bits stored in $R_C$, and produces one result that is stored in the destination register $R_D$. The instruction divides the bits of $R_S$ into two groups based on the control bits in $R_C$: if a control bit is 1, the corresponding data bit of $R_S$ is placed in the first group; otherwise, it is placed in the second group. In the result, the first group occupies the left (most significant) end of $R_D$ and the second group the right end, and the relative position of the bits within each group remains unchanged. An example of the execution of GRP is shown in Fig. 1.

Manuscript received May 9, 2006; revised November 22. This work was supported by the European Social Fund (ESF), Operational Program for Educational and Vocational Training II (EPEAEK II), and particularly the program PYTHAGORAS. The authors are with the Technology and Computer Architecture Laboratory, Computer Engineering and Informatics Department, University of Patras, Patras 26500, Greece (e-mail: dimitrak@ceid.upatras.gr). Digital Object Identifier /TVLSI

Fig. 2. 8-bit sorter designed using (a) a bitonic and (b) a merge-sorting network.

Hardware implementations of GRP have already been presented in [10] and [11]. In this paper, new hardware implementations of the GRP instruction are introduced.
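To make the definition concrete, here is a minimal behavioral model of GRP. It is a sketch under our own conventions, not part of the paper: operands are lists of 0/1 values indexed by bit position (index 0 is the least significant bit, so "left" means higher indices), and the names `grp_reference`, `to_bits`, and `from_bits` are hypothetical. It models only the instruction semantics described above, not any hardware.

```python
from typing import List

Bits = List[int]  # element i holds bit i; index 0 is the least significant (rightmost) bit


def grp_reference(data: Bits, ctrl: Bits) -> Bits:
    """Behavioral GRP: bits whose control bit is 1 go to the left (most significant)
    end, the others to the right end, each group keeping its original relative order."""
    assert len(data) == len(ctrl)
    n = len(data)
    msb_first = range(n - 1, -1, -1)                       # scan from the leftmost bit
    group1 = [data[i] for i in msb_first if ctrl[i] == 1]  # first group (control = 1)
    group0 = [data[i] for i in msb_first if ctrl[i] == 0]  # second group (control = 0)
    result_msb_first = group1 + group0
    return result_msb_first[::-1]                          # back to LSB-first indexing


def to_bits(value: int, n: int) -> Bits:
    """Unpack an n-bit integer into an LSB-first bit list."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits: Bits) -> int:
    """Pack an LSB-first bit list into an integer."""
    return sum(b << i for i, b in enumerate(bits))


if __name__ == "__main__":
    # Hypothetical 8-bit operands (not the values of Fig. 1).
    rs, rc = 0b10110010, 0b01101001
    rd = from_bits(grp_reference(to_bits(rs, 8), to_bits(rc, 8)))
    print(f"{rd:08b}")
```

With these hypothetical operands the script prints 01001101: the data bits at control-1 positions 6, 5, 3, 0 (0100) followed by those at control-0 positions 7, 4, 2, 1 (1101). Because the function only relies on list order, it also works with symbolic data such as tags, which is convenient for checking permutation networks.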
In Section II, we present the main idea behind our work, showing that the proposed permutation units can follow a structure similar to that of sorting networks. Sections III and IV explore the architecture and circuit implementation of two alternative enhanced sorting networks that we propose for the design of the GRP execution unit. The efficiency of the proposed circuits has been validated using static CMOS implementations in a 130-nm CMOS technology. Experimental results are given in Section V, and conclusions are drawn in Section VI.

II. MAIN IDEA

According to GRP, the data bits associated with a control bit equal to one are concentrated at the left side of the output. This resembles sorting the control bits so that the largest bits, i.e., the bits equal to one, are gathered to the left. Therefore, designing a hardware unit that executes GRP is equivalent to designing a sorting network that sorts the control bits and simultaneously moves the corresponding data bits along with them. The main difficulty is that, apart from sorting the control bits, we must also ensure that the relative position of the data bits within each group remains unchanged; in other words, the sort must be stable. We consider two kinds of sorting networks [12], the bitonic and the merge-sorting network, which are shown in Fig. 2 for the case of 8 bits. Each sorting network is built from a single type of compare element, and the direction of its arrow denotes the final position of the maximum input. In Fig. 2, both networks sort the same binary word, which corresponds to the control bits of the GRP instruction. We need to examine whether, at the sorted output, the relative significance of the bits inside the group of ones and inside the group of zeros is preserved.
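The question posed above can be probed with a small simulation. The sketch below is illustrative and uses our own wiring, which need not match Fig. 2(a) exactly: a standard recursive bitonic sorter whose compare elements move the larger control bit toward the higher (more significant) index and never exchange equal bits, with a data tag riding along with each control bit. All names are hypothetical.

```python
from itertools import product
from typing import List, Tuple

Item = Tuple[int, str]  # (control bit, data tag); list index 0 is the least significant position


def compare_exchange(v: List[Item], hi: int, lo: int, max_to_hi: bool) -> None:
    """Route the larger control bit to index hi (to index lo if max_to_hi is False).
    Equal control bits are left in place."""
    out_of_order = v[lo][0] > v[hi][0] if max_to_hi else v[hi][0] > v[lo][0]
    if out_of_order:
        v[hi], v[lo] = v[lo], v[hi]


def bitonic_merge(v: List[Item], start: int, size: int, max_to_hi: bool) -> None:
    """Bitonic sorter (butterfly): sorts the bitonic slice v[start:start + size]."""
    if size < 2:
        return
    half = size // 2
    for i in range(start, start + half):
        compare_exchange(v, i + half, i, max_to_hi)
    bitonic_merge(v, start, half, max_to_hi)
    bitonic_merge(v, start + half, half, max_to_hi)


def bitonic_sort(v: List[Item], start: int, size: int, max_to_hi: bool) -> None:
    """Sort the two halves in opposite directions (giving a bitonic sequence), then merge."""
    if size < 2:
        return
    half = size // 2
    bitonic_sort(v, start, half, not max_to_hi)
    bitonic_sort(v, start + half, half, max_to_hi)
    bitonic_merge(v, start, size, max_to_hi)


def sort_tagged(ctrl: List[int]) -> List[Item]:
    """Sort the control bits (ones toward the most significant end); tag A<i> rides along."""
    items = [(c, f"A{i}") for i, c in enumerate(ctrl)]
    bitonic_sort(items, 0, len(items), max_to_hi=True)
    return items


def is_stable(ctrl: List[int]) -> bool:
    """True if the tags of equal control bits leave the network in their original order."""
    out = sort_tagged(ctrl)
    for bit in (0, 1):
        before = [f"A{i}" for i, c in enumerate(ctrl) if c == bit]
        after = [tag for c, tag in out if c == bit]
        if before != after:
            return False
    return True


if __name__ == "__main__":
    bad = [b for b in product((0, 1), repeat=8) if not is_stable(list(b))]
    print(f"{len(bad)} of 256 8-bit control words lose the order of equal bits")
```

The control bits always come out sorted, yet for some words (for instance, control bits 1, 0, 0, 1 at indices 0 to 3) the two zeros leave the network swapped. Since equal bits are never exchanged, the reordering comes purely from the routes the bits take, which is the effect discussed for the marked bits of Fig. 2(a).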

Fig. 3. Implementation of GRP using two separate extraction units.
Fig. 4. Block diagram of an 8-bit enhanced bitonic-sorting network, showing also the structure of the corresponding 2-, 4-, and 8-bit EBS units.

First, we consider the case of a bitonic-sorting network [see Fig. 2(a)]. The circuit is composed of appropriately connected subnetworks, called bitonic sorters [13]. A bitonic sorter can only sort bitonic sequences. A string of bits is a bitonic sequence when it has the form $0^i 1^j 0^k$ or $1^i 0^j 1^k$, for some $i, j, k \ge 0$. At each stage, two bitonic sequences are merged and a new bitonic sequence of double size is produced. Bitonic-sorting networks cannot preserve at the output the relative significance of equal bits: when two bitonic sequences are merged, the bits are recursively divided into two independent halves, so once a bit is placed in one half, it cannot regain its relative significance with respect to the bits of the same value that were placed in the other half. Consider the two marked ones in Fig. 2(a). Although they are correctly sorted at the output (no zero bit lies in a more significant position), the routes they followed have altered their relative significance. Therefore, if their associated data bits followed them during sorting, they would end up in the wrong order according to the definition of GRP.

The merge-sorting network, shown in Fig. 2(b), follows a different sorting principle. At each stage, two already sorted sequences are merged to form a sorted sequence of double size. The merging of the two sequences is carried out by the merge sorter, which has the same number of compare levels as the bitonic sorter but requires fewer compare elements. A clear explanation of the construction of bitonic and merge-sorting networks can be found in [14]. Again, as shown by the example of Fig. 2(b), the merge-sorting network cannot preserve at the output the relative significance of equal bits.

Since neither the bitonic nor the merge network preserves the relative significance of equal bits, the data bits associated with control bits equal to one and those associated with control bits equal to zero are extracted separately from the input operand and aligned at the output. The general form of this two-datapath architecture is shown in Fig. 3. The left datapath concentrates the data bits whose control bit equals one at the left side of the result register. In the same manner, the right datapath, which receives the complemented control bits, concentrates the remaining data bits at the right side of the result register. The partial results of the two extraction units are combined with a bitwise OR operation. To allow this OR combination, the input data bits are first masked with the corresponding control bits (complemented for the right datapath), so that every position not filled by an extracted bit is zero. The general architecture with two separate extraction units was also followed in previous GRP implementations [10], [11]. However, their extraction units follow a completely different design principle which, as we show in Section V, leads to less efficient circuits than the proposed sorter-based GRP units.

III. ENHANCED BITONIC-SORTING NETWORK

In the following, we derive a new structure, called the enhanced bitonic-sorting network (EBSN), which properly aligns the data bits with control bits equal to one to the left side of the output (left extraction unit). Following the two-datapath architecture of Fig. 3, a complete GRP unit can then be easily derived. The general structure of an 8-bit EBSN is shown in Fig. 4. The EBSN takes two $n$-bit inputs: the data bits that need to be correctly aligned at the left side of the output, and the control bits that denote which data bits should be selected. According to the architecture of Fig. 3, the data bits are first masked with their associated control bits.
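The two-datapath organization of Fig. 3 can be sketched at the word level. In this minimal sketch the two extraction units are modeled behaviorally, not as EBSN or EMSN hardware; the function names are hypothetical, and words are Python integers of width n. Note that the packing loops copy only selected bits, so masking is not strictly needed in this behavioral model; it is kept to mirror Fig. 3.

```python
def extract_left(masked: int, ctrl: int, n: int) -> int:
    """Left extraction unit (behavioral): pack the bits whose control bit is 1 into the
    most significant end, keeping their relative order; all other result bits are 0."""
    out, pos = 0, n - 1
    for i in range(n - 1, -1, -1):          # scan from the most significant bit
        if (ctrl >> i) & 1:
            out |= ((masked >> i) & 1) << pos
            pos -= 1
    return out


def extract_right(masked: int, ctrl_n: int, n: int) -> int:
    """Right extraction unit (behavioral): pack the bits whose complemented control bit
    ctrl_n is 1 into the least significant end, keeping their relative order."""
    out, pos = 0, 0
    for i in range(n):                      # scan from the least significant bit
        if (ctrl_n >> i) & 1:
            out |= ((masked >> i) & 1) << pos
            pos += 1
    return out


def grp_two_datapath(rs: int, rc: int, n: int = 64) -> int:
    """GRP as in Fig. 3: mask, extract both groups separately, then OR the partial results."""
    mask = (1 << n) - 1
    rc_n = ~rc & mask                       # complemented control bits for the right datapath
    left = extract_left(rs & rc, rc, n)     # data bits with control bit 1
    right = extract_right(rs & rc_n, rc_n, n)  # data bits with control bit 0
    # The partial results occupy disjoint bit ranges (popcount(rc) bits at the top,
    # the remaining n - popcount(rc) bits at the bottom), so OR merges them losslessly.
    assert left & right == 0
    return left | right


if __name__ == "__main__":
    print(f"{grp_two_datapath(0b10110010, 0b01101001, 8):08b}")
```

For the hypothetical 8-bit operands in the `__main__` block this prints 01001101. The OR works only because each datapath leaves zeros in every position it does not fill; in a real extraction network, which moves the masked x bits around too, that property is exactly what the input masking provides.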
Since we are not interested in the masked data bits, which are set to zero, they are graphically represented as x values. The EBSN is composed of appropriately connected subnetworks called enhanced bitonic sorters (EBS). Each EBS has two sets of inputs: a set of control bits, which should be in bitonic form, and their associated data bits. The purpose of the EBS is to sort the control bits and move the corresponding data bits accordingly. Every two neighboring EBS units of Fig. 4 are of equal size and have opposite converging directions. This guarantees that the double-size sequence of control bits formed by combining the outputs of two neighboring EBS units is in bitonic form.

An EBS is effectively a butterfly network. The structure of an 8-bit EBS unit that sorts the maximum elements to the left is shown in Fig. 4. The first role of the EBS unit is to sort the bitonic sequence of control bits. When two control bits are compared, their maximum is given by their Boolean OR, which appears at the output of the HB cell, and their minimum by their Boolean AND, which is produced at the output of the LB cell. The second role of the EBS unit is to move the data bits with control bits equal to one to the appropriate positions. Since the control bits are in bitonic form, only a subset of them is needed to align the data bits. To see this, assume for the 8-bit case that the data and control bits are xxCDExxx and 00111000, respectively. An EBS unit that correctly moves the data bits C, D, and E, whose control bits equal one, to the left side of the output would produce CDExxxxx. We do not care about the relative significance of the x bits, since they are all equal to zero. The same result is obtained if all $n/2$ less significant control bits are assumed equal to one. Therefore, only the $n/2$ more significant input control bits are required to move the data bits appropriately.
The remaining control bits can safely be assumed equal to one. With this simplification of the control bits, we now describe how the data bits are moved to the output of the EBS, using the 8-bit example shown in Fig. 5.

Fig. 5. Example of the functionality of the 8-bit EBS.

The input data and control bits are shown in Fig. 5(a). Data bits $A_7$, $A_6$, $A_5$, and $A_0$ are x values, but we keep their names for clarity. Also, as explained earlier, the control bit of $A_0$ can safely be assumed equal to one. The connections shown in Fig. 5(b)–(d) represent the connections of the first, second, and third levels, respectively, of the 8-bit EBS shown in Fig. 4.

The functionality of the first stage of the 8-bit EBS is shown in Fig. 5(b). The control bits equal to zero at the left end of the control word represent empty positions that must be filled by the most significant data bits of the right half. Data bits $A_3$, $A_2$, and $A_1$ should therefore be exchanged with $A_7$, $A_6$, and $A_5$, whose control bits equal zero, in order to fill the left half of the data word. In this way, although $A_3$, $A_2$, and $A_1$ have moved to the correct half of the output, their relative significance with respect to $A_4$, which also has a control bit equal to one, has been violated. At the output of the first stage, a new control bit, called the swap bit, is generated for each data bit. Each swap bit indicates whether the corresponding data bit changed its position in the previous stage. Hence, swap bits equal to one are assigned to data bits $A_3$, $A_2$, $A_1$ and $A_7$, $A_6$, $A_5$. The swap bits associated with $A_4$ and $A_0$ are set to zero, because these bits did not change position and are already in the correct half of the result. In general, in the first stage, an exchange between data bits $A_{j+n/2}$ and $A_j$, $0 \le j < n/2$, takes place only when the corresponding control bits $C_{j+n/2}$ of the upper half and $C_j$ of the lower half are equal to 0 and 1, respectively. Since all the control bits of the lower half are assumed equal to one, the condition for an exchange reduces to $C_{j+n/2} = 0$.
Hence, the two newly generated swap bits of each pair are both set equal to $\overline{C}_{j+n/2}$. The implementation of the HB and LB cells used in the first level of every EBS is shown in Fig. 6. The AND/OR gates sort the initial control bits, and the multiplexers that select between the corresponding data bits are driven by the generated swap signals.

Fig. 6. Implementation of the HB and LB cells.

Each following stage uses the swap bits generated at the output of the previous stage to correct the relative position of the data bits. When the swap bits of two data bits differ, their relative position is not correct and they should be exchanged. For instance, at the second stage [see Fig. 5(c)], the pairs $A_2$, $A_4$ and $A_6$, $A_0$ have different swap bits. $A_2$ came from the lower half in the previous stage and passed over $A_4$, which did not move in the first stage. Hence, $A_4$ and $A_2$ should be exchanged in order to restore their relative significance. In the right half of the second stage, $A_6$ and $A_0$ also have different swap bits. $A_6$ came from the left half in the previous stage and was placed in a more significant position than $A_0$. Since $A_6$ was swapped in the previous stage, it was compared with a data bit with a larger control bit, whereas $A_0$ was compared with a data bit with an equal control bit (equal to one). Thus, $A_0$ should be in a more significant position than $A_6$ in the final result, and an exchange between $A_6$ and $A_0$ should take place to correct their relative position. At the output of the second stage, swap bits equal to one are assigned to the data bits that changed position at this stage. The remaining stages work in the same way, owing to the recursive nature of the butterfly network. Fig. 5(d) shows the last stage of the example, where data bits with different swap bits are exchanged. Since the last four data bits $A_0 A_7 A_6 A_5$ are zero from the beginning [see Fig. 5(a)], the output is correct according to the functionality of the left extraction unit.

In general, when the swap bits of two data bits differ, the bits are exchanged and both of their new swap bits are set to one; otherwise, both new swap bits are zero. Therefore, every new swap bit is the exclusive-OR of the two swap bits of the previous stage. The HB/LB cells that implement the remaining levels of the EBS are also shown in Fig. 6.

The complete enhanced bitonic-sorting network [see Fig. 4] also requires EBS units that sort the data and control bits in the right direction. Only two simple changes are needed to configure an EBS to gather the corresponding data bits at the right end of the result. First, in the first stage of each EBS, the control signals of the multiplexers should be changed from $C_{j+n/2}$ to $C_j$, which also changes the generation of the corresponding swap bits. Second, in all stages of the EBS, the OR gates of the HB cells should be replaced by AND gates, and the AND gates of the LB cells by OR gates.

IV. ENHANCED MERGE-SORTING NETWORK

In the case of merge-sorting networks, each extraction unit is implemented using the proposed enhanced merge-sorting network (EMSN). The block diagram of an 8-bit EMSN-based left extraction unit is shown in Fig. 7. As in the EBSN, the merge-sorting network is composed of appropriately connected subnetworks, called in this case enhanced merge sorters (EMS). The purpose of the EMS unit is analogous to that of the EBS unit; however, the different structure of the merge-sort approach requires new circuit modifications. The structure of an 8-bit EMS unit is shown in Fig. 7. In the EMS, the cells that perform the necessary data rearrangement are denoted HM and LM, respectively. Also, as in bitonic sorting, only the $n/2$ more significant control bits are required to guide the rearrangement of the data bits, and the remaining control bits can be assumed equal to one.
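Returning briefly to Section III, the EBS data path can be modeled at the bit level. The sketch below is our reading of that description, not the authors' netlist, and every name in it is hypothetical. Assumptions: positions are LSB-first list indices (the "upper half" is indices n/2 to n-1); masked data bits (the x values) are represented by None; the first level exchanges a pair when its steering control bit is 0 ($C_{j+n/2}$ for left-converging units, $C_j$ for right-converging ones); later levels exchange a pair when its swap bits differ; and, in the full network, the less significant half of each EBS is fed by a left-converging unit and the more significant half by a right-converging one, which is the arrangement consistent with the control words used in the examples (00111000 and, in Fig. 5, 00011111). The script reproduces the Fig. 5 walkthrough and checks a complete EBSN-based GRP unit against the GRP definition for small widths.

```python
from itertools import product
from typing import List, Optional, Tuple

Data = List[Optional[str]]  # None marks a masked (x) data bit; index 0 = least significant


def ebs(data: Data, ctrl: List[int], to_left: bool) -> Tuple[Data, List[int]]:
    """Enhanced bitonic sorter modeled as a butterfly over a bitonic control word.

    to_left=True gathers control-1 data toward the high (left) indices, steered only by
    the upper-half control bits; to_left=False is the mirrored, right-converging unit.
    """
    n, half = len(data), len(data) // 2
    d, c = list(data), list(ctrl)
    swap = [0] * n
    dist = half
    while dist >= 1:
        new_swap = [0] * n
        for lo in range(n):
            if (lo // dist) % 2:            # only visit the lower index of each pair
                continue
            hi = lo + dist
            if dist == half:                # first level: steered by one control bit
                exchange = (ctrl[hi] if to_left else ctrl[lo]) == 0
            else:                           # later levels: steered by the swap bits
                exchange = swap[hi] != swap[lo]
            if exchange:
                d[hi], d[lo] = d[lo], d[hi]
                # 1 iff exchanged; after the first level this is the XOR of the old swap bits
                new_swap[hi] = new_swap[lo] = 1
            # Control path: OR (max) toward the converging end, AND (min) away from it.
            big, small = c[hi] | c[lo], c[hi] & c[lo]
            c[hi], c[lo] = (big, small) if to_left else (small, big)
        swap = new_swap
        dist //= 2
    return d, c


def ebsn(data: Data, ctrl: List[int], to_left: bool) -> Tuple[Data, List[int]]:
    """EBSN: the less significant half converges left and the more significant half
    converges right, so their control words join into a bitonic word for the final EBS."""
    n = len(data)
    if n == 1:
        return list(data), list(ctrl)
    h = n // 2
    lo_d, lo_c = ebsn(data[:h], ctrl[:h], to_left=True)
    hi_d, hi_c = ebsn(data[h:], ctrl[h:], to_left=False)
    return ebs(lo_d + hi_d, lo_c + hi_c, to_left)


def extract(tags: List[str], ctrl: List[int], to_left: bool) -> Data:
    """One extraction unit of Fig. 3: mask unselected bits (x = None), then run the EBSN."""
    masked = [t if c else None for t, c in zip(tags, ctrl)]
    return ebsn(masked, ctrl, to_left)[0]


def grp_ebsn(tags: List[str], ctrl: List[int]) -> List[str]:
    """GRP unit: left unit on ctrl, right unit on complemented ctrl, merged like the OR."""
    left = extract(tags, ctrl, to_left=True)
    right = extract(tags, [1 - c for c in ctrl], to_left=False)
    return [a if a is not None else b for a, b in zip(left, right)]


if __name__ == "__main__":
    # Fig. 5: C7 = C6 = C5 = 0, C4..C1 = 1, and C0 treated as 1 as in the text.
    out, _ = ebs([f"A{i}" for i in range(8)], [1, 1, 1, 1, 1, 0, 0, 0], to_left=True)
    print(" ".join(reversed(out)))          # most significant position first
    for n in (2, 4, 8):
        tags = [f"A{i}" for i in range(n)]
        for bits in product((0, 1), repeat=n):
            ctrl = list(bits)
            zeros = [t for t, c in zip(tags, ctrl) if not c]
            ones = [t for t, c in zip(tags, ctrl) if c]
            assert grp_ebsn(tags, ctrl) == zeros + ones   # LSB-first GRP result
    print("EBSN-based GRP matches the definition for n = 2, 4, 8")
```

The first line printed is A4 A3 A2 A1 A0 A7 A6 A5, the order reached in Fig. 5(d). Under these assumptions, a left-converging EBS behaves as a left rotation of the data word by the number of leading zeros in its upper-half control bits, which is why the x bits may wrap around harmlessly.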
The functionality of the EMS unit is clarified below using the 8-bit example of Fig. 8. The connections illustrated in Fig. 8(b)–(d) represent the connections of the 8-bit EMS unit shown in Fig. 7.
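The claim that the merge sorter needs fewer compare elements than the bitonic sorter can be put in numbers. The sketch below counts compare-and-exchange cells in one n-bit extraction unit, assuming the EBSN uses the standard butterfly (bitonic-merger) connections and the EMSN uses Batcher's odd-even merge connections, as the 8-bit walkthroughs of Figs. 5 and 8 suggest. It counts cells only; it does not model wiring, fanout, or the swap-signal logic that the EMS omits at some positions. Function names are hypothetical.

```python
from typing import Callable, List, Tuple

Pair = Tuple[int, int]  # (higher index, lower index) of one compare-and-exchange cell


def butterfly_levels(n: int) -> List[List[Pair]]:
    """Connections of an n-input bitonic merger (as in the EBS): distances n/2, ..., 1."""
    levels, dist = [], n // 2
    while dist >= 1:
        levels.append([(lo + dist, lo) for lo in range(n) if (lo // dist) % 2 == 0])
        dist //= 2
    return levels


def odd_even_merge_levels(n: int) -> List[List[Pair]]:
    """Connections of an n-input Batcher odd-even merger (as in the EMS)."""
    levels, half, k = [], n // 2, n // 2
    while k >= 1:
        level, j = [], k % half
        while j + k < n:
            level.extend((i + j + k, i + j) for i in range(k))
            j += 2 * k
        levels.append(level)
        k //= 2
    return levels


def sorter_cells(n: int, merger: Callable[[int], List[List[Pair]]]) -> int:
    """Cells in a full n-input sorting network built from mergers of size 2, 4, ..., n."""
    total, size = 0, 2
    while size <= n:
        total += (n // size) * sum(len(level) for level in merger(size))
        size *= 2
    return total


if __name__ == "__main__":
    for n in (8, 64):
        print(n, sorter_cells(n, butterfly_levels), sorter_cells(n, odd_even_merge_levels))
```

Both networks have the same number of levels, but for n = 8 the counts are 24 versus 19 cells, and for n = 64 they are 672 versus 543 cells per extraction unit. The 8-bit odd-even merger produced here has the pairs (4,0), (5,1), (6,2), (7,3), then (4,2), (5,3), then (2,1), (4,3), (6,5), which are the connections followed in the Fig. 8 example.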

Fig. 7. 8-bit enhanced merge-sorting network and the structure of the 2-, 4-, and 8-bit EMS units.
Fig. 8. Example of the functionality of the 8-bit EMS.
Fig. 9. Implementation of the HM/LM cell used in all stages of the EMS following the first.

In Fig. 8(a), the most significant control bits of the upper part that are equal to one indicate that no exchange should be performed at those positions. The corresponding data bits $A_7$ and $A_6$ are already in the correct positions and should remain there at the output. This constraint holds for all compare levels of the EMS unit. Thus, in all levels, the HM and LM compare-and-exchange cells that receive an input from the left part of the EMS (from bit position $j$ with $j \ge n/2$) should not perform any exchange when the input control bit $C_j = 1$ (Barrier Constraint I). On the other hand, the zero control bits of the upper part denote empty positions that should be filled by the data bits of the lower part with control bits equal to one. Following the connections of the first level of the 8-bit EMS, shown also in Fig. 8(b), the only exchanges allowed at the first level are between data bits $A_1$, $A_0$ and $A_5$, $A_4$, respectively. After this operation, two things occur: data bits $A_1$ and $A_0$ have violated their relative significance with $A_3$ and $A_2$, which have control bits equal to one, while $A_5$ and $A_4$ have been correctly placed at the tail of the result. In general, an exchange between data bits $A_{j+n/2}$ and $A_j$, $0 \le j < n/2$, is performed at the first stage only when $C_{j+n/2} = 0$. Following this principle, we guarantee that the most significant data bits of the upper part never lose their position, and that the useless data bits with a zero control bit are correctly moved to the right end of the result. In our example, all other data bits should be rearranged between $A_7 A_6$ and $A_5 A_4$.
No other exchange operation should be allowed at bit positions 0 and 1, to which $A_4$ and $A_5$ have moved. In general, the number of bit positions that must be blocked in the right part of the EMS equals the number of zero control bits in the upper part. Therefore, in all remaining levels, the HM and LM cells that receive an input from bit position $j$, $j < n/2$, which belongs to the right part of the EMS, should not perform any exchange when the input control bit $C_{j+n/2} = 0$ (Barrier Constraint II). The barrier constraints are set only by the $n/2$ more significant input control bits, and they block the compare-and-exchange operations at the left and the right end of the EMS, respectively.

As in the bitonic case, the data bits that exchanged their positions at the first stage are assigned a swap signal equal to one, while the rest get a swap signal equal to zero [see Fig. 8(b)]. Note that swap-signal generation is only required for the positions where HM and LM cells are present in the next level; for the remaining positions it is implicitly zero. For example, owing to the structure of an 8-bit EMS, bit positions 7, 6, 1, and 0 directly get a zero swap signal even if an exchange has been performed at those positions. The behavior of the remaining stages is determined by the swap signals, which act exactly as in the EBS. However, for the EMS, Barrier Constraints I and II must also be taken into account: an exchange is performed at an HM/LM cell only when its input swap signals differ and the barrier signals of the corresponding bit positions are both deasserted. If either barrier signal is asserted, no exchange is performed. Note also that each swap signal is directly associated with a data bit and follows the same route through the network.
In contrast, the barrier signals, which are produced after the first level, describe a property of specific bit positions of the network and remain unchanged for all compare levels of the EMS. The barrier $B_j$ associated with the $j$th bit position is $B_j = C_j$ for $j \ge n/2$ (left part), and $B_j = \overline{C}_{j+n/2}$ for $j < n/2$ (right part). Following this rule and returning to the example of Fig. 8(c), data bits $A_3$ and $A_2$ are exchanged with $A_1$ and $A_0$, respectively, since they have different swap signals and the operation is not blocked by any barrier constraint. The last level of the EMS network is shown in Fig. 8(d). Data bits $A_2$ and $A_1$ are correctly placed and do not move, since they have equal swap signals. Also, although $A_6$ and $A_3$ have different swap signals, they are not exchanged, because the barrier associated with bit position 6 is equal to one (Barrier Constraint I, $B_6 = C_6 = 1$). The same holds for $A_0$ and $A_5$; in this case, Barrier Constraint II applies, since $B_1 = \overline{C}_5 = 1$.

The connections and functionality of the first level of the EMS are the same as those of the first level of the EBS. Therefore, the HM/LM cells of the first level are implemented in the same way as in the EBS, as shown in Fig. 6. For the remaining levels of the EMS, the implementation of the HM/LM cells is shown in Fig. 9. The AND/OR gates that sort the input control bits are not blocked by any constraint and do not interfere with the data-exchange operation.
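A bit-level sketch of the EMS may help connect the barrier rules to the Fig. 8 walkthrough. It is our reading of Section IV, not the authors' circuit, and the names are hypothetical. Assumptions: positions are LSB-first indices; masked data bits are None; the cell pairs are Batcher's odd-even merge connections (which match the 8-bit pairs used in Fig. 8); a swap bit is 1 exactly when its position was exchanged in the previous level, so positions without a cell get 0; the left-part barrier is asserted when $C_j = 1$ (Constraint I) and the right-part barrier when $C_{j+n/2} = 0$ (Constraint II). The exhaustive check is limited to n = 2, 4, 8.

```python
from itertools import product
from typing import List, Optional, Tuple

Data = List[Optional[str]]  # None marks a masked (x) data bit; index 0 = least significant


def odd_even_merge_levels(n: int) -> List[List[Tuple[int, int]]]:
    """(hi, lo) cell positions of an n-input Batcher odd-even merger, level by level."""
    levels, half, k = [], n // 2, n // 2
    while k >= 1:
        level, j = [], k % half
        while j + k < n:
            level.extend((i + j + k, i + j) for i in range(k))
            j += 2 * k
        levels.append(level)
        k //= 2
    return levels


def ems(data: Data, ctrl: List[int]) -> Tuple[Data, List[int]]:
    """Left-converging enhanced merge sorter; both input halves are already packed left."""
    n, h = len(data), len(data) // 2
    d, c = list(data), list(ctrl)
    # Position barriers, fixed by the upper-half input control bits:
    # Constraint I  (j >= n/2): blocked when C_j = 1.
    # Constraint II (j <  n/2): blocked when C_{j+n/2} = 0.
    barrier = [ctrl[j] if j >= h else 1 - ctrl[j + h] for j in range(n)]
    swap = [0] * n
    for level_no, level in enumerate(odd_even_merge_levels(n)):
        new_swap = [0] * n                  # 0 unless the position is exchanged below
        for hi, lo in level:
            if level_no == 0:               # same first level as the EBS
                exchange = ctrl[hi] == 0
            else:
                exchange = (swap[hi] != swap[lo]
                            and not barrier[hi] and not barrier[lo])
            if exchange:
                d[hi], d[lo] = d[lo], d[hi]
                new_swap[hi] = new_swap[lo] = 1
            c[hi], c[lo] = c[hi] | c[lo], c[hi] & c[lo]  # control sorting is never blocked
        swap = new_swap
    return d, c


def emsn(data: Data, ctrl: List[int]) -> Tuple[Data, List[int]]:
    """EMSN left extraction network: pack both halves recursively, then merge them."""
    n = len(data)
    if n == 1:
        return list(data), list(ctrl)
    h = n // 2
    lo_d, lo_c = emsn(data[:h], ctrl[:h])
    hi_d, hi_c = emsn(data[h:], ctrl[h:])
    return ems(lo_d + hi_d, lo_c + hi_c)


def left_extract(tags: List[str], ctrl: List[int]) -> Data:
    """Mask unselected data bits (x = None) and run the EMSN."""
    return emsn([t if c else None for t, c in zip(tags, ctrl)], ctrl)[0]


if __name__ == "__main__":
    # Fig. 8: C7 = C6 = 1, C5 = C4 = 0; lower-half control bits treated as 1.
    out, _ = ems([f"A{i}" for i in range(8)], [1, 1, 1, 1, 0, 0, 1, 1])
    print(" ".join(reversed(out)))          # most significant position first
```

Running it prints A7 A6 A3 A2 A1 A0 A5 A4, the order described for Fig. 8(d): the pair $A_6$, $A_3$ is held by the barrier at position 6 and the pair $A_0$, $A_5$ by the barrier at position 1. Unlike the EBS, the data already packed at the top of the upper half never moves; only the lower-half block slides up into the gap.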

Fig. 10. Energy-delay curves for the 64-bit GRP units implemented with the EBSN and the EMSN, for (a) $C_{in}/C_{out}$ = 25/100 fF and (b) $C_{in}/C_{out}$ = 5/25 fF. (c) Energy-delay curve for a subword-oriented GRP unit that supports a minimum subword size of 1 byte.

V. EXPERIMENTAL RESULTS

The proposed circuits have been evaluated using static CMOS implementations in a 130-nm CMOS technology. The delay measurements for all examined designs are reported in fanout-of-4 inverter delays (FO4). The FO4 delay is the delay of an inverter driving four identical inverters; it is used because it expresses circuit delay in a roughly technology-independent way. To explore the energy-delay space of each design, we performed gate sizing for several delay targets, starting from the circuit's minimum achievable delay. Circuit sizing is performed using geometric-programming-based optimization [15]. For the derived gate sizes, the energy and the delay of each circuit were measured in HSPICE. For the energy measurements, we assumed random inputs that caused, on average, 30% switching activity. Interstage wiring loads were also taken into account; the RC contribution of each wire was estimated from its length, assuming a bit pitch of 16 metal-1 tracks.

First, 64-bit GRP units were evaluated. The energy-delay curves of the EBSN- and EMSN-based GRP units under two different loading conditions, for the typical process corner, are shown in Fig. 10(a) and (b). For the data of Fig. 10(a), we assumed during optimization and measurement that the outputs of the circuit are loaded with a capacitance of 100 fF and that the input capacitance of each circuit is at most 25 fF, which corresponds to an output-to-input capacitance ratio of 4. In Fig. 10(b), both circuits are optimized for a lighter load, with input and output capacitances of 5 and 25 fF, respectively. In both cases, the GRP unit designed using the EMSN is faster and requires less energy per operation for the same delay than the EBSN-based unit. This is explained by the smaller number of compare-and-exchange cells required by the EMSN, which leads to less fanout at the cell outputs and simpler wiring. From Fig. 10(a) and (b), the energy required by the EMSN is 10%–75% lower than that of the EBSN for equal delay. The delay of the EMSN-based GRP unit reported in Fig. 10(a) and (b) is roughly between 27 and 35 FO4. Based on the analysis presented in [11], the most efficient previous GRP implementation achieves a minimum delay of around 38 FO4 assuming equal input and output capacitances, an assumption that is rather unrealistic under heavy output loading. Thus, the best of the proposed 64-bit GRP units achieves significant delay reductions over previous solutions, ranging between 8% and 28%.

The energy-delay behavior of the most practical case, a GRP unit that supports a minimum subword of 1 byte on a 64-bit word, is shown in Fig. 10(c) for both variants of the enhanced sorting network. The circuits are sized for the 25/100 fF input/output capacitance case. The EMSN-based GRP unit is again the fastest solution, achieving a minimum delay of 11 FO4. This delay is roughly equal to that of a 64-bit static CMOS radix-2 adder implemented in the same technology. However, the energy spent per addition is significantly smaller. Again, the EBSN approach is the less efficient solution, since it requires more energy than the EMSN for every delay target.

VI. CONCLUSION

A novel framework for the design of GRP permutation units has been presented in this paper. The functionality of GRP has been transformed into a sorting problem, and two enhanced sorting networks have been derived.
Each of the proposed circuits is designed using a single type of processing cell, and the connections between the cells are regular and well suited to a dense datapath-style layout. Moreover, the speed and energy savings of the proposed solutions do not rely on any special circuit technique that might not remain viable in future technologies; they result from a new algorithmic and logic-level approach. Therefore, we believe that the proposed designs are scalable to future technologies.

REFERENCES
[1] K. Diefendorff and P. K. Dubey, "How multimedia will change processor design," IEEE Computer, vol. 30, no. 9, Sep.
[2] N. T. Slingerland and A. J. Smith, "Multimedia extensions for general purpose microprocessors: A survey," Microprocess. Microsyst., vol. 29, no. 5.
[3] T. Conte et al., "Challenges to combining general-purpose and multimedia processors," IEEE Computer, vol. 30, no. 12, Dec.
[4] I. Kuroda and T. Nishitani, "Multimedia processors," Proc. IEEE, vol. 86, no. 6, Jun.
[5] R. B. Lee, Z. Shi, and X. Yang, "Efficient permutation instructions for fast software cryptography," IEEE Micro, vol. 21, no. 6, Nov./Dec.
[6] X. Yang and R. B. Lee, "Fast subword permutation instructions using omega and flip network stages," in Proc. IEEE Int. Conf. Comput. Design, 2000.
[7] J. P. McGregor and R. B. Lee, "Architectural enhancements for fast subword permutations with repetitions in cryptographic applications," in Proc. IEEE Int. Conf. Comput. Design, 2001.
[8] Z. Shi and R. B. Lee, "Bit permutation instructions for accelerating software cryptography," in Proc. IEEE Conf. Application-Specific Syst., Arch. Process., 2000.
[9] Z. J. Shi, "Bit permutation instructions: Architecture, implementation and cryptographic properties," Ph.D. dissertation, Electr. Eng. Dept., Princeton Univ., Princeton, NJ.
[10] Z. J. Shi and R. B. Lee, "Implementation complexity of bit permutation instructions," in Proc. Asilomar Conf. Signals, Syst. Comput., 2003.
[11] Y. Hilewitz, Z. J. Shi, and R. B. Lee, "Comparing fast implementations of bit permutation instructions," in Proc. Asilomar Conf. Signals, Syst. Comput., 2004.
[12] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, 2nd ed. Cambridge, MA: MIT Press.
[13] K. E. Batcher, "Sorting networks and their applications," in Proc. AFIPS Joint Comput. Conf., 1968.
[14] Z. Hong and R. Sedgewick, "Notes on merging networks," in Proc. ACM Symp. Theory Comput., 1982.
[15] S. P. Boyd, S. J. Kim, D. Patil, and M. A. Horowitz, "Digital circuit optimization via geometric programming," Oper. Res., vol. 53, no. 6, Nov./Dec.


More information

EFFICIENT VLSI IMPLEMENTATION OF A SEQUENTIAL FINITE FIELD MULTIPLIER USING REORDERED NORMAL BASIS IN DOMINO LOGIC

EFFICIENT VLSI IMPLEMENTATION OF A SEQUENTIAL FINITE FIELD MULTIPLIER USING REORDERED NORMAL BASIS IN DOMINO LOGIC EFFICIENT VLSI IMPLEMENTATION OF A SEQUENTIAL FINITE FIELD MULTIPLIER USING REORDERED NORMAL BASIS IN DOMINO LOGIC P.NAGA SUDHAKAR 1, S.NAZMA 2 1 Assistant Professor, Dept of ECE, CBIT, Proddutur, AP,

More information

An Design of Radix-4 Modified Booth Encoded Multiplier and Optimised Carry Select Adder Design for Efficient Area and Delay

An Design of Radix-4 Modified Booth Encoded Multiplier and Optimised Carry Select Adder Design for Efficient Area and Delay An Design of Radix-4 Modified Booth Encoded Multiplier and Optimised Carry Select Adder Design for Efficient Area and Delay 1. K. Nivetha, PG Scholar, Dept of ECE, Nandha Engineering College, Erode. 2.

More information

On Permutation Operations in Cipher Design

On Permutation Operations in Cipher Design On Permutation Operations in Cipher Design Ruby B. Lee, Z. J. Shi and Y. L. Yin Princeton University Department of Electrical Engineering B-218, Engineering Quadrangle Princeton, NJ 08544, U.S.A. Email:

More information

PERFORMANCE COMPARISON OF HIGHER RADIX BOOTH MULTIPLIER USING 45nm TECHNOLOGY

PERFORMANCE COMPARISON OF HIGHER RADIX BOOTH MULTIPLIER USING 45nm TECHNOLOGY PERFORMANCE COMPARISON OF HIGHER RADIX BOOTH MULTIPLIER USING 45nm TECHNOLOGY JasbirKaur 1, Sumit Kumar 2 Asst. Professor, Department of E & CE, PEC University of Technology, Chandigarh, India 1 P.G. Student,

More information

A New network multiplier using modified high order encoder and optimized hybrid adder in CMOS technology

A New network multiplier using modified high order encoder and optimized hybrid adder in CMOS technology Inf. Sci. Lett. 2, No. 3, 159-164 (2013) 159 Information Sciences Letters An International Journal http://dx.doi.org/10.12785/isl/020305 A New network multiplier using modified high order encoder and optimized

More information

Permutation Operations in Block Ciphers

Permutation Operations in Block Ciphers Chapter I Permutation Operations in Block Ciphers R. B. Lee I.1, I.2,R.L.Rivest I.3,M.J.B.Robshaw I.4, Z. J. Shi I.2,Y.L.Yin I.2 New and emerging applications can change the mix of operations commonly

More information

FPGA Implementation of Area-Delay and Power Efficient Carry Select Adder

FPGA Implementation of Area-Delay and Power Efficient Carry Select Adder International Journal of Innovative Research in Electronics and Communications (IJIREC) Volume 2, Issue 8, 2015, PP 37-49 ISSN 2349-4042 (Print) & ISSN 2349-4050 (Online) www.arcjournals.org FPGA Implementation

More information

Design and Implementation of Complex Multiplier Using Compressors

Design and Implementation of Complex Multiplier Using Compressors Design and Implementation of Complex Multiplier Using Compressors Abstract: In this paper, a low-power high speed Complex Multiplier using compressor circuit is proposed for fast digital arithmetic integrated

More information

[Krishna, 2(9): September, 2013] ISSN: Impact Factor: INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY

[Krishna, 2(9): September, 2013] ISSN: Impact Factor: INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY Design of Wallace Tree Multiplier using Compressors K.Gopi Krishna *1, B.Santhosh 2, V.Sridhar 3 gopikoleti@gmail.com Abstract

More information

An Area Efficient Decomposed Approximate Multiplier for DCT Applications

An Area Efficient Decomposed Approximate Multiplier for DCT Applications An Area Efficient Decomposed Approximate Multiplier for DCT Applications K.Mohammed Rafi 1, M.P.Venkatesh 2 P.G. Student, Department of ECE, Shree Institute of Technical Education, Tirupati, India 1 Assistant

More information

Computer Architecture and Organization:

Computer Architecture and Organization: Computer Architecture and Organization: L03: Register transfer and System Bus By: A. H. Abdul Hafez Abdul.hafez@hku.edu.tr, ah.abdulhafez@gmail.com 1 CAO, by Dr. A.H. Abdul Hafez, CE Dept. HKU Outlines

More information

A Taxonomy of Parallel Prefix Networks

A Taxonomy of Parallel Prefix Networks A Taxonomy of Parallel Prefix Networks David Harris Harvey Mudd College / Sun Microsystems Laboratories 31 E. Twelfth St. Claremont, CA 91711 David_Harris@hmc.edu Abstract - Parallel prefix networks are

More information

High performance Radix-16 Booth Partial Product Generator for 64-bit Binary Multipliers

High performance Radix-16 Booth Partial Product Generator for 64-bit Binary Multipliers High performance Radix-16 Booth Partial Product Generator for 64-bit Binary Multipliers Dharmapuri Ranga Rajini 1 M.Ramana Reddy 2 rangarajini.d@gmail.com 1 ramanareddy055@gmail.com 2 1 PG Scholar, Dept

More information

An Optimized Design for Parallel MAC based on Radix-4 MBA

An Optimized Design for Parallel MAC based on Radix-4 MBA An Optimized Design for Parallel MAC based on Radix-4 MBA R.M.N.M.Varaprasad, M.Satyanarayana Dept. of ECE, MVGR College of Engineering, Andhra Pradesh, India Abstract In this paper a novel architecture

More information

An Efficent Real Time Analysis of Carry Select Adder

An Efficent Real Time Analysis of Carry Select Adder An Efficent Real Time Analysis of Carry Select Adder Geetika Gesu Department of Electronics Engineering Abha Gaikwad-Patil College of Engineering Nagpur, Maharashtra, India E-mail: geetikagesu@gmail.com

More information

Data Word Length Reduction for Low-Power DSP Software

Data Word Length Reduction for Low-Power DSP Software EE382C: LITERATURE SURVEY, APRIL 2, 2004 1 Data Word Length Reduction for Low-Power DSP Software Kyungtae Han Abstract The increasing demand for portable computing accelerates the study of minimizing power

More information

A VLSI Implementation of Fast Addition Using an Efficient CSLAs Architecture

A VLSI Implementation of Fast Addition Using an Efficient CSLAs Architecture A VLSI Implementation of Fast Addition Using an Efficient CSLAs Architecture N.SALMASULTHANA 1, R.PURUSHOTHAM NAIK 2 1Asst.Prof, Electronics & Communication Engineering, Princeton College of engineering

More information

A Survey on A High Performance Approximate Adder And Two High Performance Approximate Multipliers

A Survey on A High Performance Approximate Adder And Two High Performance Approximate Multipliers IOSR Journal of Business and Management (IOSR-JBM) e-issn: 2278-487X, p-issn: 2319-7668 PP 43-50 www.iosrjournals.org A Survey on A High Performance Approximate Adder And Two High Performance Approximate

More information

SUCCESSIVE approximation register (SAR) analog-todigital

SUCCESSIVE approximation register (SAR) analog-todigital 426 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 62, NO. 5, MAY 2015 A Novel Hybrid Radix-/Radix-2 SAR ADC With Fast Convergence and Low Hardware Complexity Manzur Rahman, Arindam

More information

AN EFFICIENT DESIGN OF ROBA MULTIPLIERS 1 BADDI. MOUNIKA, 2 V. RAMA RAO M.Tech, Assistant professor

AN EFFICIENT DESIGN OF ROBA MULTIPLIERS 1 BADDI. MOUNIKA, 2 V. RAMA RAO M.Tech, Assistant professor AN EFFICIENT DESIGN OF ROBA MULTIPLIERS 1 BADDI. MOUNIKA, 2 V. RAMA RAO M.Tech, Assistant professor 1,2 Eluru College of Engineering and Technology, Duggirala, Pedavegi, West Godavari, Andhra Pradesh,

More information

PRIORITY encoder (PE) is a particular circuit that resolves

PRIORITY encoder (PE) is a particular circuit that resolves 1102 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 64, NO. 9, SEPTEMBER 2017 A Scalable High-Performance Priority Encoder Using 1D-Array to 2D-Array Conversion Xuan-Thuan Nguyen, Student

More information

Performance Analysis of a 64-bit signed Multiplier with a Carry Select Adder Using VHDL

Performance Analysis of a 64-bit signed Multiplier with a Carry Select Adder Using VHDL Performance Analysis of a 64-bit signed Multiplier with a Carry Select Adder Using VHDL E.Deepthi, V.M.Rani, O.Manasa Abstract: This paper presents a performance analysis of carrylook-ahead-adder and carry

More information

An Optimized Design of High-Speed and Energy- Efficient Carry Skip Adder with Variable Latency Extension

An Optimized Design of High-Speed and Energy- Efficient Carry Skip Adder with Variable Latency Extension An Optimized Design of High-Speed and Energy- Efficient Carry Skip Adder with Variable Latency Extension Monisha.T.S 1, Senthil Prakash.K 2 1 PG Student, ECE, Velalar College of Engineering and Technology

More information

A Novel Design of High-Speed Carry Skip Adder Operating Under a Wide Range of Supply Voltages

A Novel Design of High-Speed Carry Skip Adder Operating Under a Wide Range of Supply Voltages A Novel Design of High-Speed Carry Skip Adder Operating Under a Wide Range of Supply Voltages Jalluri srinivisu,(m.tech),email Id: jsvasu494@gmail.com Ch.Prabhakar,M.tech,Assoc.Prof,Email Id: skytechsolutions2015@gmail.com

More information

Design of Parallel Algorithms. Communication Algorithms

Design of Parallel Algorithms. Communication Algorithms + Design of Parallel Algorithms Communication Algorithms + Topic Overview n One-to-All Broadcast and All-to-One Reduction n All-to-All Broadcast and Reduction n All-Reduce and Prefix-Sum Operations n Scatter

More information

Techniques for Generating Sudoku Instances

Techniques for Generating Sudoku Instances Chapter Techniques for Generating Sudoku Instances Overview Sudoku puzzles become worldwide popular among many players in different intellectual levels. In this chapter, we are going to discuss different

More information

A New Configurable Full Adder For Low Power Applications

A New Configurable Full Adder For Low Power Applications A New Configurable Full Adder For Low Power Applications Astha Sharma 1, Zoonubiya Ali 2 PG Student, Department of Electronics & Telecommunication Engineering, Disha Institute of Management & Technology

More information

Design of Parallel Prefix Tree Based High Speed Scalable CMOS Comparator for converters

Design of Parallel Prefix Tree Based High Speed Scalable CMOS Comparator for converters Design of Parallel Prefix Tree Based High Speed Scalable CMOS Comparator for converters 1 M. Gokilavani PG Scholar, Department of ECE, Indus College of Engineering, Coimbatore, India. 2 P. Niranjana Devi

More information

DESIGN AND IMPLEMENTATION OF 128-BIT QUANTUM-DOT CELLULAR AUTOMATA ADDER

DESIGN AND IMPLEMENTATION OF 128-BIT QUANTUM-DOT CELLULAR AUTOMATA ADDER DESIGN AND IMPLEMENTATION OF 128-BIT QUANTUM-DOT CELLULAR AUTOMATA ADDER 1 K.RAVITHEJA, 2 G.VASANTHA, 3 I.SUNEETHA 1 student, Dept of Electronics & Communication Engineering, Annamacharya Institute of

More information

RECENT technology trends have lead to an increase in

RECENT technology trends have lead to an increase in IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 39, NO. 9, SEPTEMBER 2004 1581 Noise Analysis Methodology for Partially Depleted SOI Circuits Mini Nanua and David Blaauw Abstract In partially depleted silicon-on-insulator

More information

Efficient Carry Select Adder Using VLSI Techniques With Advantages of Area, Delay And Power

Efficient Carry Select Adder Using VLSI Techniques With Advantages of Area, Delay And Power Efficient Carry Select Adder Using VLSI Techniques With Advantages of Area, Delay And Power Abstract: Carry Select Adder (CSLA) is one of the high speed adders used in many computational systems to perform

More information

A New High Speed Low Power Performance of 8- Bit Parallel Multiplier-Accumulator Using Modified Radix-2 Booth Encoded Algorithm

A New High Speed Low Power Performance of 8- Bit Parallel Multiplier-Accumulator Using Modified Radix-2 Booth Encoded Algorithm A New High Speed Low Power Performance of 8- Bit Parallel Multiplier-Accumulator Using Modified Radix-2 Booth Encoded Algorithm V.Sandeep Kumar Assistant Professor, Indur Institute Of Engineering & Technology,Siddipet

More information

FOR HIGH SPEED LOW POWER APPLICATIONS USING RADIX-4 MODIFIED BOOTH ENCODER

FOR HIGH SPEED LOW POWER APPLICATIONS USING RADIX-4 MODIFIED BOOTH ENCODER International Journal of Advancements in Research & Technology, Volume 4, Issue 6, June -2015 31 A SPST BASED 16x16 MULTIPLIER FOR HIGH SPEED LOW POWER APPLICATIONS USING RADIX-4 MODIFIED BOOTH ENCODER

More information

SURVEY AND EVALUATION OF LOW-POWER FULL-ADDER CELLS

SURVEY AND EVALUATION OF LOW-POWER FULL-ADDER CELLS SURVEY ND EVLUTION OF LOW-POWER FULL-DDER CELLS hmed Sayed and Hussain l-saad Department of Electrical & Computer Engineering University of California Davis, C, U.S.. STRCT In this paper, we survey various

More information

DESIGN AND IMPLEMENTATION OF 64- BIT CARRY SELECT ADDER IN FPGA

DESIGN AND IMPLEMENTATION OF 64- BIT CARRY SELECT ADDER IN FPGA DESIGN AND IMPLEMENTATION OF 64- BIT CARRY SELECT ADDER IN FPGA Shaik Magbul Basha 1 L. Srinivas Reddy 2 magbul1000@gmail.com 1 lsr.ngi@gmail.com 2 1 UG Scholar, Dept of ECE, Nalanda Group of Institutions,

More information

CENTRALIZED BUFFERING AND LOOKAHEAD WAVELENGTH CONVERSION IN MULTISTAGE INTERCONNECTION NETWORKS

CENTRALIZED BUFFERING AND LOOKAHEAD WAVELENGTH CONVERSION IN MULTISTAGE INTERCONNECTION NETWORKS CENTRALIZED BUFFERING AND LOOKAHEAD WAVELENGTH CONVERSION IN MULTISTAGE INTERCONNECTION NETWORKS Mohammed Amer Arafah, Nasir Hussain, Victor O. K. Li, Department of Computer Engineering, College of Computer

More information

Comparative Analysis of Multiplier in Quaternary logic

Comparative Analysis of Multiplier in Quaternary logic IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 5, Issue 3, Ver. I (May - Jun. 2015), PP 06-11 e-issn: 2319 4200, p-issn No. : 2319 4197 www.iosrjournals.org Comparative Analysis of Multiplier

More information

High Speed, Low power and Area Efficient Processor Design Using Square Root Carry Select Adder

High Speed, Low power and Area Efficient Processor Design Using Square Root Carry Select Adder IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-issn: 2278-2834,p- ISSN: 2278-8735.Volume 9, Issue 2, Ver. VII (Mar - Apr. 2014), PP 14-18 High Speed, Low power and Area Efficient

More information

Performance Comparison of VLSI Adders Using Logical Effort 1

Performance Comparison of VLSI Adders Using Logical Effort 1 Performance Comparison of VLSI Adders Using Logical Effort 1 Hoang Q. Dao and Vojin G. Oklobdzija Advanced Computer System Engineering Laboratory Department of Electrical and Computer Engineering University

More information

Implementation of Memory Less Based Low-Complexity CODECS

Implementation of Memory Less Based Low-Complexity CODECS Implementation of Memory Less Based Low-Complexity CODECS K.Vijayalakshmi, I.V.G Manohar & L. Srinivas Department of Electronics and Communication Engineering, Nalanda Institute Of Engineering And Technology,

More information

Mixed Synchronous/Asynchronous State Memory for Low Power FSM Design

Mixed Synchronous/Asynchronous State Memory for Low Power FSM Design Mixed Synchronous/Asynchronous State Memory for Low Power FSM Design Cao Cao and Bengt Oelmann Department of Information Technology and Media, Mid-Sweden University S-851 70 Sundsvall, Sweden {cao.cao@mh.se}

More information

Implementation of 256-bit High Speed and Area Efficient Carry Select Adder

Implementation of 256-bit High Speed and Area Efficient Carry Select Adder Implementation of 5-bit High Speed and Area Efficient Carry Select Adder C. Sudarshan Babu, Dr. P. Ramana Reddy, Dept. of ECE, Jawaharlal Nehru Technological University, Anantapur, AP, India Abstract Implementation

More information

A Novel 128-Bit QCA Adder

A Novel 128-Bit QCA Adder International Journal of Emerging Engineering Research and Technology Volume 2, Issue 5, August 2014, PP 81-88 ISSN 2349-4395 (Print) & ISSN 2349-4409 (Online) A Novel 128-Bit QCA Adder V Ravichandran

More information

JDT EFFECTIVE METHOD FOR IMPLEMENTATION OF WALLACE TREE MULTIPLIER USING FAST ADDERS

JDT EFFECTIVE METHOD FOR IMPLEMENTATION OF WALLACE TREE MULTIPLIER USING FAST ADDERS JDT-002-2013 EFFECTIVE METHOD FOR IMPLEMENTATION OF WALLACE TREE MULTIPLIER USING FAST ADDERS E. Prakash 1, R. Raju 2, Dr.R. Varatharajan 3 1 PG Student, Department of Electronics and Communication Engineeering

More information

Design and Implementation of 128-bit SQRT-CSLA using Area-delaypower efficient CSLA

Design and Implementation of 128-bit SQRT-CSLA using Area-delaypower efficient CSLA International Research Journal of Engineering and Technology (IRJET) e-issn: 2395-56 Volume: 3 Issue: 8 Aug-26 www.irjet.net p-issn: 2395-72 Design and Implementation of 28-bit SQRT-CSLA using Area-delaypower

More information

Design A Redundant Binary Multiplier Using Dual Logic Level Technique

Design A Redundant Binary Multiplier Using Dual Logic Level Technique Design A Redundant Binary Multiplier Using Dual Logic Level Technique Sreenivasa Rao Assistant Professor, Department of ECE, Santhiram Engineering College, Nandyala, A.P. Jayanthi M.Tech Scholar in VLSI,

More information

Design and Implementation of High Speed Carry Select Adder

Design and Implementation of High Speed Carry Select Adder Design and Implementation of High Speed Carry Select Adder P.Prashanti Digital Systems Engineering (M.E) ECE Department University College of Engineering Osmania University, Hyderabad, Andhra Pradesh -500

More information

Modified Partial Product Generator for Redundant Binary Multiplier with High Modularity and Carry-Free Addition

Modified Partial Product Generator for Redundant Binary Multiplier with High Modularity and Carry-Free Addition Modified Partial Product Generator for Redundant Binary Multiplier with High Modularity and Carry-Free Addition Thoka. Babu Rao 1, G. Kishore Kumar 2 1, M. Tech in VLSI & ES, Student at Velagapudi Ramakrishna

More information

Modified Booth Encoding Multiplier for both Signed and Unsigned Radix Based Multi-Modulus Multiplier

Modified Booth Encoding Multiplier for both Signed and Unsigned Radix Based Multi-Modulus Multiplier Modified Booth Encoding Multiplier for both Signed and Unsigned Radix Based Multi-Modulus Multiplier M.Shiva Krushna M.Tech, VLSI Design, Holy Mary Institute of Technology And Science, Hyderabad, T.S,

More information

Published by: PIONEER RESEARCH & DEVELOPMENT GROUP (www.prdg.org) 1

Published by: PIONEER RESEARCH & DEVELOPMENT GROUP (www.prdg.org) 1 Design Of Low Power Approximate Mirror Adder Sasikala.M 1, Dr.G.K.D.Prasanna Venkatesan 2 ME VLSI student 1, Vice Principal, Professor and Head/ECE 2 PGP college of Engineering and Technology Nammakkal,

More information

PROCESS and environment parameter variations in scaled

PROCESS and environment parameter variations in scaled 1078 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 53, NO. 10, OCTOBER 2006 Reversed Temperature-Dependent Propagation Delay Characteristics in Nanometer CMOS Circuits Ranjith Kumar

More information

Implementation of 32-Bit Unsigned Multiplier Using CLAA and CSLA

Implementation of 32-Bit Unsigned Multiplier Using CLAA and CSLA Implementation of 32-Bit Unsigned Multiplier Using CLAA and CSLA 1. Vijaya kumar vadladi,m. Tech. Student (VLSID), Holy Mary Institute of Technology and Science, Keesara, R.R. Dt. 2.David Solomon Raju.Y,Associate

More information

Low-Power Multipliers with Data Wordlength Reduction

Low-Power Multipliers with Data Wordlength Reduction Low-Power Multipliers with Data Wordlength Reduction Kyungtae Han, Brian L. Evans, and Earl E. Swartzlander, Jr. Dept. of Electrical and Computer Engineering The University of Texas at Austin Austin, TX

More information

CS256 Applied Theory of Computation

CS256 Applied Theory of Computation CS256 Applied Theory of Computation Parallel Computation III John E Savage Overview Mapping normal algorithms to meshes Shuffle operations on linear arrays Shuffle operations on two-dimensional arrays

More information

A Highly Efficient Carry Select Adder

A Highly Efficient Carry Select Adder IJSTE - International Journal of Science Technology & Engineering Volume 2 Issue 4 October 2015 ISSN (online): 2349-784X A Highly Efficient Carry Select Adder Shiya Andrews V PG Student Department of Electronics

More information

DESIGN OF RING OSCILLATOR USING CS-CMOS FOR MIXED SIGNAL SOCS

DESIGN OF RING OSCILLATOR USING CS-CMOS FOR MIXED SIGNAL SOCS International Journal of Electrical and Electronics Engineering (IJEEE) ISSN 2278-9944 Vol. 2, Issue 2, May 2013, 21-26 IASET DESIGN OF RING OSCILLATOR USING CS-CMOS FOR MIXED SIGNAL SOCS VINOD KUMAR &

More information

Design and Analysis of CMOS based Low Power Carry Select Full Adder

Design and Analysis of CMOS based Low Power Carry Select Full Adder Design and Analysis of CMOS based Low Power Carry Select Full Adder Mayank Sharma 1, Himanshu Prakash Rajput 2 1 Department of Electronics & Communication Engineering Hindustan College of Science & Technology,

More information

Exploiting Regularity for Low-Power Design

Exploiting Regularity for Low-Power Design Reprint from Proceedings of the International Conference on Computer-Aided Design, 996 Exploiting Regularity for Low-Power Design Renu Mehra and Jan Rabaey Department of Electrical Engineering and Computer

More information

AN EFFICIENT MAC DESIGN IN DIGITAL FILTERS

AN EFFICIENT MAC DESIGN IN DIGITAL FILTERS AN EFFICIENT MAC DESIGN IN DIGITAL FILTERS THIRUMALASETTY SRIKANTH 1*, GUNGI MANGARAO 2* 1. Dept of ECE, Malineni Lakshmaiah Engineering College, Andhra Pradesh, India. Email Id : srikanthmailid07@gmail.com

More information

Design of an optimized multiplier based on approximation logic

Design of an optimized multiplier based on approximation logic ISSN:2348-2079 Volume-6 Issue-1 International Journal of Intellectual Advancements and Research in Engineering Computations Design of an optimized multiplier based on approximation logic Dhivya Bharathi

More information

AREA AND DELAY EFFICIENT DESIGN FOR PARALLEL PREFIX FINITE FIELD MULTIPLIER

AREA AND DELAY EFFICIENT DESIGN FOR PARALLEL PREFIX FINITE FIELD MULTIPLIER AREA AND DELAY EFFICIENT DESIGN FOR PARALLEL PREFIX FINITE FIELD MULTIPLIER 1 CH.JAYA PRAKASH, 2 P.HAREESH, 3 SK. FARISHMA 1&2 Assistant Professor, Dept. of ECE, 3 M.Tech-Student, Sir CR Reddy College

More information

Design and Characterization of 16 Bit Multiplier Accumulator Based on Radix-2 Modified Booth Algorithm

Design and Characterization of 16 Bit Multiplier Accumulator Based on Radix-2 Modified Booth Algorithm Design and Characterization of 16 Bit Multiplier Accumulator Based on Radix-2 Modified Booth Algorithm Vijay Dhar Maurya 1, Imran Ullah Khan 2 1 M.Tech Scholar, 2 Associate Professor (J), Department of

More information

Low Power and Area EfficientALU Design

Low Power and Area EfficientALU Design Low Power and Area EfficientALU Design A.Sowmya, Dr.B.K.Madhavi ABSTRACT: This project work undertaken, aims at designing 8-bit ALU with carry select adder. An arithmetic logic unit acts as the basic building

More information

Design and Implementation of Carry Select Adder Using Binary to Excess-One Converter

Design and Implementation of Carry Select Adder Using Binary to Excess-One Converter Design and Implementation of Carry Select Adder Using Binary to Excess-One Converter Paluri Nagaraja 1 Kanumuri Koteswara Rao 2 Nagaraja.paluri@gmail.com 1 koti_r@yahoo.com 2 1 PG Scholar, Dept of ECE,

More information

Fast Placement Optimization of Power Supply Pads

Fast Placement Optimization of Power Supply Pads Fast Placement Optimization of Power Supply Pads Yu Zhong Martin D. F. Wong Dept. of Electrical and Computer Engineering Dept. of Electrical and Computer Engineering Univ. of Illinois at Urbana-Champaign

More information

High Performance Low-Power Signed Multiplier

High Performance Low-Power Signed Multiplier High Performance Low-Power Signed Multiplier Amir R. Attarha Mehrdad Nourani VLSI Circuits & Systems Laboratory Department of Electrical and Computer Engineering University of Tehran, IRAN Email: attarha@khorshid.ece.ut.ac.ir

More information

High Speed Vedic Multiplier Designs Using Novel Carry Select Adder

High Speed Vedic Multiplier Designs Using Novel Carry Select Adder High Speed Vedic Multiplier Designs Using Novel Carry Select Adder 1 chintakrindi Saikumar & 2 sk.sahir 1 (M.Tech) VLSI, Dept. of ECE Priyadarshini Institute of Technology & Management 2 Associate Professor,

More information

Combining Multipath and Single-Path Time-Interleaved Delta-Sigma Modulators Ahmed Gharbiya and David A. Johns

Combining Multipath and Single-Path Time-Interleaved Delta-Sigma Modulators Ahmed Gharbiya and David A. Johns 1224 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 55, NO. 12, DECEMBER 2008 Combining Multipath and Single-Path Time-Interleaved Delta-Sigma Modulators Ahmed Gharbiya and David A.

More information

Index Terms: Low Power, CSLA, Area Efficient, BEC.

Index Terms: Low Power, CSLA, Area Efficient, BEC. Modified LowPower and AreaEfficient Carry Select Adder using DLatch Veena V Nair MTech student, ECE Department, Mangalam College of Engineering, Kottayam, India Abstract Carry Select Adder (CSLA) is one

More information

Design and Performance Analysis of a Reconfigurable Fir Filter

Design and Performance Analysis of a Reconfigurable Fir Filter Design and Performance Analysis of a Reconfigurable Fir Filter S.karthick Department of ECE Bannari Amman Institute of Technology Sathyamangalam INDIA Dr.s.valarmathy Department of ECE Bannari Amman Institute

More information

ENHANCING SPEED AND REDUCING POWER OF SHIFT AND ADD MULTIPLIER

ENHANCING SPEED AND REDUCING POWER OF SHIFT AND ADD MULTIPLIER ENHANCING SPEED AND REDUCING POWER OF SHIFT AND ADD MULTIPLIER 1 ZUBER M. PATEL 1 S V National Institute of Technology, Surat, Gujarat, Inida E-mail: zuber_patel@rediffmail.com Abstract- This paper presents

More information

NOWADAYS, many Digital Signal Processing (DSP) applications,

NOWADAYS, many Digital Signal Processing (DSP) applications, 1 HUB-Floating-Point for improving FPGA implementations of DSP Applications Javier Hormigo, and Julio Villalba, Member, IEEE Abstract The increasing complexity of new digital signalprocessing applications

More information

LOW-POWER FFT VIA REDUCED PRECISION

LOW-POWER FFT VIA REDUCED PRECISION LOW-POWER FFT VIA REDUCED PRECISION REDUNDANCY Srinivasa R. Sridhara and Naresh R. Shanbhag Coordinated Science LaboratoryECE Dcpartmcnt University of Illinois at Urbana-Champaign 1308 West Main Street,

More information

An Inversion-Based Synthesis Approach for Area and Power efficient Arithmetic Sum-of-Products

An Inversion-Based Synthesis Approach for Area and Power efficient Arithmetic Sum-of-Products 21st International Conference on VLSI Design An Inversion-Based Synthesis Approach for Area and Power efficient Arithmetic Sum-of-Products Sabyasachi Das Synplicity Inc Sunnyvale, CA, USA Email: sabya@synplicity.com

More information

Design as You See FIT: System-Level Soft Error Analysis of Sequential Circuits

Design as You See FIT: System-Level Soft Error Analysis of Sequential Circuits Design as You See FIT: System-Level Soft Error Analysis of Sequential Circuits Dan Holcomb Wenchao Li Sanjit A. Seshia Department of EECS University of California, Berkeley Design Automation and Test in

More information

Fixed Point Lms Adaptive Filter Using Partial Product Generator

Fixed Point Lms Adaptive Filter Using Partial Product Generator Fixed Point Lms Adaptive Filter Using Partial Product Generator Vidyamol S M.Tech Vlsi And Embedded System Ma College Of Engineering, Kothamangalam,India vidyas.saji@gmail.com Abstract The area and power

More information

Lecture 11: Clocking
