Optimization and Modeling of FPGA Circuitry in Advanced Process Technology. Charles Chiasson


Optimization and Modeling of FPGA Circuitry in Advanced Process Technology

by Charles Chiasson

A thesis submitted in conformity with the requirements for the degree of Master of Applied Science
Graduate Department of Electrical and Computer Engineering
University of Toronto

Copyright 2013 by Charles Chiasson

Abstract

Optimization and Modeling of FPGA Circuitry in Advanced Process Technology
Charles Chiasson
Master of Applied Science
Graduate Department of Electrical and Computer Engineering
University of Toronto
2013

We develop a new fully-automated transistor sizing tool for FPGAs that features area, delay and wire load modeling enhancements over prior work to improve its accuracy in advanced process nodes. We then use this tool to investigate a number of FPGA circuit design related questions in a 22nm process. We find that building FPGAs out of transmission gates instead of the currently dominant pass-transistors, whose performance and reliability are degrading with technology scaling, yields FPGAs that are 15% larger but are 10-25% faster depending on the allowable level of gate boosting. We also show that transmission gate FPGAs with a separate power supply for their gate terminal enable a low-voltage FPGA with 50% less power and good delay. Finally, we show that, at a possible cost in routability, restricting the portion of a routing channel that can be accessed by a logic block input can improve delay by 17%.

Acknowledgements

First, I would like to express my sincerest gratitude to my supervisor Vaughn Betz for his guidance and motivation, for his technical help and for the tidbits of wisdom that he shared with me, knowingly or unknowingly, over the past two years. I learned so much in so little time and cannot imagine having had a better mentor.

I also extend thanks to the other graduate students in Vaughn Betz's research group for all their help and support. Also, thanks for the lunch outings, the coffee breaks and the squash matches, among other things, that provided those much needed distractions.

I would like to thank the Natural Sciences and Engineering Research Council of Canada, Altera Corporation and the University of Toronto for their financial support. Thanks also go to CMC Microsystems for providing the CAD tools used throughout this research. I would also like to thank David Lewis from Altera Corporation for the insightful discussions.

Finally, thanks must undoubtedly go to my parents for nurturing my inherent desire to know why, for making me love the smell of new books, and simply, for being the best parents a kid could ask for. All of my accomplishments are certainly due to them.

Contents

1 Introduction
    Motivation
    Thesis Organization

2 Background
    FPGA Architecture
    Logic Block Architecture
    Routing Architecture
    Commercial BLE Architectures
    FPGA Architecture Assessment Methodology
    FPGA Circuit Design
    SRAM cells
    Routing Multiplexers
    Lookup Tables
    Flip-Flops
    Modeling of FPGA Circuitry
    Area Modeling
    Delay Modeling
    Automated Transistor Sizing

3 COFFE: Automated Optimization of FPGA Circuitry
    Introduction to COFFE
    Architecture
    Circuit Topologies
    Area Modeling
    Delay Modeling
    Non-Linearity of Transistor Resistance and Capacitance
    Topology Dependence of Transistor Resistance
    Wire Load Modeling
    Transistor Sizing Algorithm
    Divide-and-Conquer
    Pre-Determined P/N Ratios
    Detailed Algorithm
    Impact of Improved Wire Load Modeling
    Base Architecture
    Target Process Technology
    Results
    Integration of COFFE with VPR

4 Efficient FPGA Circuitry
    Fc_out for Single-Driver Routing and Multiple BLE Outputs
    Transmission Gate FPGAs
    Pass-Transistor Scaling Challenges
    Replacing Pass-Transistors with Transmission Gates
    Gate-Boosting Strategy
    Methodology
    Results
    Area and Delay Breakdown
    Separating V_DD and V_G for Low-Power FPGAs
    Track-Access Locality

5 Conclusions and Future Work
    Summary
    Future Work

A N-well Sharing Sample Layout
B FPGA Circuitry Schematics
C Detailed Transistor Sizing Results
D Area and Delay Breakdown

Bibliography

List of Tables

3.1 COFFE's expected input architecture parameters
Resistance of a 4× minimum-width NMOS transistor for different circuit topologies (Figure 3.9) and switching-thresholds
Rise-fall re-balancing and the effect of M on COFFE's transistor sizing solutions (example)
Base architecture parameters
Subcircuit count per tile for base architecture
Metal layer data used by COFFE for all circuit design investigations (ITRS [19])
Impact of wire loading
Effect of Fc_out on channel width and switch block multiplexers
Area and delay for different Fc_out values
Pass-transistor and transmission gate FPGA tile area for different levels of gate boosting
Switch block multiplexer transistor sizes for PT and TG implementations for different levels of gate boosting (see Figure 4.2 for transistor labels). Note that with the exception of P/N ratios, COFFE uses integer granularity
Pass-transistor and transmission gate FPGA critical path delay for different levels of gate boosting (VTR benchmarks)
Pass-transistor and transmission gate FPGA area-delay product for different levels of gate boosting (VTR benchmarks)
Pass-transistor and transmission gate FPGA relative dynamic power for different levels of gate boosting (VTR benchmarks)
Effect of cluster output track-access locality on area and delay. Input track-access span is set to
Effect of cluster input track-access locality on area and delay. Output track-access span is set to
C.1 Lookup table transistor sizes
C.2 Switch block multiplexer transistor sizes
C.3 Connection block multiplexer transistor sizes
C.4 Local routing multiplexer transistor sizes. Note: We don't give a size for buf2 of the local routing multiplexer as it is replaced by the LUT input driver of Figure B
C.5 BLE output to local interconnect
C.6 BLE output to general routing
C.7 Flip-flop and register selection multiplexer transistor sizes
C.8 LUT input driver A
C.9 LUT input driver B
C.10 LUT input driver C with register feedback multiplexer (Figure B.3)
C.11 LUT input driver D
C.12 LUT input driver E
C.13 LUT input driver F
D.1 Tile area breakdown
D.2 Critical path delay breakdown

List of Figures

1.1 Architecture exploration with manual (a) and automated (b) transistor-level design
Tile-based FPGA
Basic logic element (BLE)
Logic cluster architecture
Routing segment lengths
Multi-driver and single-driver routing architectures
FPGA architecture assessment methodology with VPR
Six transistor SRAM cell
Different 8:1 pass-transistor multiplexer topologies
Multiplexer followed by two-stage buffer with PMOS level-restorer
Fully encoded MUX tree 3-LUT
Minimum-width transistor area model
Increasing drive strength with diffusion widening (b) or parallel diffusion regions (c). Note: Although not shown in the figure for simplicity, parallel diffusions must be connected together
FPGA design flow
COFFE's supported tile architecture
COFFE's routing multiplexer circuit topologies
Fully encoded MUX tree 6-LUT with internal re-buffering (partial view)
Static transmission gate-based master-slave register
Transistor area prediction accuracy of original (Eq. 2.2) and improved (Eq. 3.1) area models against TSMC 65nm layouts
Combining diffusion widening and parallel diffusion regions yields denser layouts (c)
A switch-level model
Circuits used to measure transistor resistance
Inverter NMOS and PMOS resistivity vs. transistor width
COFFE's transistor sizing algorithm
V_DD and V_Th scaling trends [12]
Generic two-level routing multiplexer with two-stage buffer implemented with pass-transistors (a) and transmission gates (b)
Effect of different gate boosting strategies on transmission gate switch block multiplexer delay (V_DD = 0.8 V)
4.4 CAD flow for each FPGA
Tile area and critical path delay breakdown
Critical path delay for pass-transistor (PT) and transmission gate (TG) FPGAs for different V_DD and V_G
Dynamic power for pass-transistor (PT) and transmission gate (TG) FPGAs for different V_DD and V_G
Power-delay product for pass-transistor (PT) and transmission gate (TG) FPGAs for different V_DD and V_G
Cluster output wire load for different locality
Cluster input wire load for different locality
A.1 A single-level 4:1 pass-transistor multiplexer with two-stage buffer and level restorer
A.2 Sample multiplexer layout with N-well sharing
B.1 6-LUT
B.2 LUT input driver
B.3 LUT input driver with register feedback multiplexer
B.4 Two-level multiplexer used for switch block, connection block and local routing multiplexers
B.5 2:1 multiplexer used for BLE outputs
B.6 Flip-flop with register input selection multiplexer

Chapter 1. Introduction

1.1 Motivation

The design and fabrication of modern digital integrated circuits costs tens to hundreds of millions of dollars and requires large teams of engineers and years of effort. Indeed, the cost of developing a new 20nm chip has been estimated to be as high as $160 million USD [1]. While this may be acceptable for high-volume applications, it can be a significant burden for lower-volume designs, often preventing them from being fabricated in the latest process technologies.

Instead of being fabricated as a custom chip such as a standard cell-based application-specific integrated circuit (ASIC) or a full custom design, a digital design can be implemented in a field-programmable gate array (FPGA). FPGAs are pre-fabricated, programmable devices into which one can implement any arbitrary digital design in a matter of seconds. Therefore, FPGAs are an attractive alternative to ASICs or full custom designs because they allow the high non-recurring engineering costs and lengthy design times associated with semiconductor manufacturing to be completely avoided.

However, digital designs that require high density, high performance or low power might not find FPGAs as attractive. It has been shown that FPGAs require 35× more silicon area, are 4× slower and consume 14× more dynamic power than ASICs [25]. Accordingly, minimizing the FPGA-to-ASIC gap, that is, making FPGAs as efficient as possible such that they become a competitive implementation medium for all types of applications, is one of the primary drivers of FPGA research for both academic researchers and commercial FPGA manufacturers.

The area, performance and power characteristics of an FPGA can be optimized at two main levels: architecture and transistor-level design. The architecture of an FPGA is defined by a number of parameters that describe the style and flexibility of its soft-logic blocks, dedicated hard-blocks and interconnect.
Finding an architecture that meets specific design goals and constraints involves setting these architectural parameters to specific values. However, these parameters interact in complex ways to produce area, delay and power trade-offs that are very difficult to quantify through analytical methods. For that reason, finding the right architectural parameter values is usually accomplished experimentally with automated architecture exploration tools such as VPR [7].

For any architecture, there are a number of different transistor-level implementations. Transistor-level design consists of choosing circuit topologies for an architecture as well as sizing the transistors of those circuits. Both circuit topology selection and transistor sizing provide opportunities to optimize the area, delay and power of the architecture.

Figure 1.1: Architecture exploration with manual (a) and automated (b) transistor-level design. In both flows, the steps are: initial architecture parameters, transistor-level design (manual in (a), automated in (b)), evaluate architecture, change architecture parameters.

In prior FPGA research work, transistor-level design was often performed manually, making it a task that required a significant amount of time and effort. This often had a negative impact on the architecture exploration flow, which would proceed as follows. Manual transistor-level design would be performed on some initial architecture. Then, this architecture would be assessed with an architecture exploration tool such as VPR. Based on the results of the assessment, the architecture parameters would be adjusted and the evaluation process would be repeated. Ideally, one would then re-optimize the transistor-level design to match the new architecture parameters. However, since manual transistor-level design was such a time and effort intensive task, this step would often be skipped. It was assumed that transistor sizes obtained with a previous architecture still applied to the new architecture, and this new architecture was then evaluated without re-optimizing its transistor-level design. This architecture exploration flow is illustrated in Figure 1.1a. The new architecture could likely be made more efficient if its transistor sizes were re-optimized. As well, the detailed impact of new wire loads as the architecture and its area changed has often not been rigorously modeled, possibly leading to inaccurate architecture conclusions. In an environment where FPGAs need to be as efficient as possible to compete with ASICs, new architectures should be evaluated in their most efficient state.
It follows that re-optimizing the transistor sizes as the FPGA architecture is changed provides a more thorough design space exploration and should yield more efficient FPGAs. Automating the transistor-level design of FPGAs enables such frequent re-optimization (Figure 1.1b). In addition, an automated transistor-level design tool facilitates investigations relating to efficient FPGA circuitry. For example, an automated transistor-level design tool could be used to explore the impact of different circuit topologies or the impact of different layout choices on the area, delay and power of an FPGA. This thesis consists of two parts. In the first, we develop COFFE (Circuit Optimization For FPGA Exploration), a new fully-automated transistor sizing tool for FPGAs. Although an FPGA-specific transistor sizing tool has been developed in prior work [24], we have made significant improvements that are necessary in advanced process nodes. In the second part of this thesis, we use COFFE to investigate a number of circuit design related questions in advanced process technology.

Thesis Organization

This thesis is organized as follows. Chapter 2 provides background information on FPGA architecture, circuit design, modeling and optimization. Chapter 3 describes COFFE, a fully-automated transistor sizing tool for FPGAs developed as part of this thesis, as well as our area and delay modeling enhancements. A number of FPGA circuit design investigations are performed with COFFE in Chapter 4. Finally, Chapter 5 concludes this thesis and suggests future work.

Chapter 2. Background

This thesis is focused on the transistor-level design of SRAM-based FPGAs and related computer-aided design (CAD) tools. We develop a fully-automated transistor sizing tool for FPGAs in Chapter 3 and use it to investigate a number of FPGA circuit design related questions in Chapter 4. This chapter provides relevant background material. First, we review FPGA architecture and the standard FPGA architecture assessment methodology. Then, we describe common practices in FPGA circuit design as well as commonly used area and delay modeling techniques for these circuits. Finally, we review prior work on automated transistor sizing.

2.1 FPGA Architecture

An FPGA consists of an array of tiles that can each implement a small amount of logic and routing. Horizontal and vertical routing channels run on top of the tiles and allow them to be stitched together to perform larger functions. Figure 2.1 illustrates FPGA tile architecture at a high level. A logic block (LB) supplies the tile's logic functionality. Connection blocks (CBs) provide connectivity between logic block inputs and routing channels. A switch block (SB) connects logic block outputs to routing channels and provides connectivity between wires within the routing channels. One replicates this basic tile to obtain a complete FPGA. Although Figure 2.1 shows logic and switching functions as distinct sub-blocks, an interleaved layout is more realistic and is what we assume throughout this work.

The FPGA architecture described in the previous paragraph represents a generic soft-logic-based FPGA. Modern FPGAs are more heterogeneous. That is, in addition to general purpose soft-logic blocks, they also contain dedicated hard-blocks such as multipliers, block memories or even embedded processors [36, 51, 4, 38].
In this work, we focus on the architecture and circuit design of the soft-logic portion as it still forms the backbone of an FPGA and typically accounts for a large fraction of its area¹ and critical path delay as shown in Section However, since hard-blocks are an important part of modern FPGA architectures, all our VPR [7] experiments are performed with architecture files that contain multipliers and block memories along with our soft-logic blocks. We use the same multiplier and block memory designs across all our VPR experiments, and hence they are constant and do not affect the conclusions of our soft-logic investigations.

¹ In [50], it was reported that the core area of the largest Stratix III FPGA consists of 72% soft-logic and associated programmable routing; the other 28% being block memory and multipliers.

Figure 2.1: Tile-based FPGA.

Figure 2.2: Basic logic element (BLE).

Logic Block Architecture

Most FPGAs are built around the idea of using lookup tables (LUTs) to implement logic functions. A K-input LUT can implement any combinational logic function of K inputs. Since digital designs are rarely purely combinational, the basic logic element (BLE) of an FPGA consists of a K-LUT and a flip-flop (FF) that both feed a 2:1 multiplexer, which allows the output of the BLE to be driven by either the LUT output or the FF output as illustrated in Figure 2.2 [7]. Although an FPGA logic block could consist of a single BLE, it is much more common to group several BLEs together in the same logic block to form a locally interconnected logic cluster, as this fast local interconnect can improve performance and save general routing area [7, 2].

The number of inputs to a LUT (K) and the number of BLEs in a logic cluster (N) are two important architectural parameters affecting the area and performance of an FPGA. Ahmed and Rose showed in [2] that K = 4 to 6 and N = 4 to 10 are good choices in terms of area-delay product. Modern commercial architectures use comparable values for N and K (Virtex 7: K=6, N=8 [51] and Stratix V: K=6, N=10 [35]).

As illustrated in Figure 2.3, a logic cluster's local interconnect consists of two types of wires: local feedback wires and cluster input wires. There are typically N local feedback wires in a cluster, one for each BLE. Often, many BLEs in a cluster will share common inputs. Accordingly, the number of inputs to a cluster (I) is less than the number of distinct BLE inputs in a cluster (i.e., N × K). It was shown in [2] that (2.1) is a good estimate of the number of inputs required to achieve 98% LUT utilization.

$$I = \frac{K}{2}\,(N + 1) \qquad (2.1)$$
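As a quick numerical check of estimate (2.1), read here as I = K/2 · (N + 1), the following minimal Python sketch compares the estimated number of cluster inputs with the total number of BLE inputs N × K. The function name `cluster_inputs` is hypothetical, and rounding up when K (N + 1) is odd is an assumption of this sketch; neither comes from [2].

```python
import math


def cluster_inputs(k, n):
    """Estimate of cluster inputs from Eq. (2.1): I = K/2 * (N + 1).

    Rounding up to a whole number of inputs is an assumption of this sketch.
    """
    return math.ceil(k * (n + 1) / 2)


# Example with K = 6 and N = 10, the Stratix V values quoted in the text.
k, n = 6, 10
print("estimated cluster inputs:", cluster_inputs(k, n))
print("distinct BLE inputs (N x K):", n * k)
```

With these values the estimate is well below N × K, which reflects the input sharing between BLEs described in the text.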

Figure 2.3: Logic cluster architecture. A cluster contains N BLEs, with K local routing MUXes per BLE; the local interconnect is made up of local feedback wires from the N BLE outputs and I cluster inputs.

Local routing multiplexers connect multiple local interconnect wires to each BLE input. These multiplexers are generally sparsely populated [29]. That is, BLE inputs can be connected to only a fraction of the wires in the local interconnect; we refer to this fraction as Fc_local. Sparsely populating the local routing multiplexers reduces their size and thus saves area. In [29], it was shown that reducing Fc_local from 1.0 to 0.5 reduces area by 10% with no degradation in critical path delay. However, as recommended by [29], between 2 and 5 spare cluster inputs should be added to (2.1) when sparsely populating the local routing multiplexers to maintain routability.

Routing Architecture

Logic blocks are interconnected by programmable routing channels that run horizontally and vertically on top of a tile (Figure 2.1). The number of tracks in a routing channel is referred to as its width (W). In this work, we assume that the widths of horizontal and vertical routing channels are equal, but it is possible for them to be different. For example, the horizontal routing channels on Stratix FPGAs are wider than the vertical channels due to the rectangular layout of their logic blocks [34].

A routing track is composed of wire segments that span one or more tiles. The length (L) of a routing segment specifies the number of tiles that it spans. For example, Figure 2.4 shows a routing channel that consists of four tracks of L = 2 wire segments and four tracks of L = 4 wire segments. Note that staggering the start point of wire segments as in Figure 2.4 is necessary for a tile-based layout as it ensures that all tiles remain identical [8].

A horizontal and a vertical routing channel intersect at each tile.
The set of programmable switches that allow connections to be made between routing tracks at this intersection is called a switch block (SB in Figure 2.1). Switch block flexibility (Fs) specifies the number of tracks to which any track can connect in a switch block. An Fs of 3, where each horizontal track connects to another horizontal track and two vertical tracks (and vice-versa), is common [49].

Figure 2.4: Routing segment lengths (length 2 and length 4 wire segments spanning FPGA tiles).

The specific tracks to which each track connects are determined by the switch block pattern [7, 37] as well as the routing driver architecture. In a multi-driver routing architecture (Figure 2.5a), a wire can be driven by multiple tri-state drivers at multiple points along its length. In contrast, in a single-driver routing architecture (Figure 2.5b), a wire can only be driven by a single multiplexer-based driver, usually placed at one end of the wire. Figures 2.5a and 2.5b also show that logic block outputs connect to the routing tracks differently based on the routing driver architecture. That is, multi-driver architectures connect logic block outputs directly to the routing wires while single-driver architectures connect logic block outputs to the routing wires through switch-block multiplexers. Although multi-driver routing architectures have been widely used in the past [7, 2], single-driver routing has become the dominant routing architecture style in both academic research [28, 27, 24] and commercial FPGAs [34, 33]. In this work, we focus on single-driver routing architectures. In [28], Lemieux et al. found that FPGAs with single-driver routing had 9% lower delay and were 25% smaller than FPGAs with multi-driver routing.

Connection block multiplexers connect multiple routing tracks to each logic block input (see Figure 2.5). The number of tracks that can connect to each logic block input is called the connection block input flexibility (Fc_in). Similarly, the number of routing wires that each logic block output can connect to is given by the connection block output flexibility (Fc_out). Reducing Fc_in from W to 0.2W as the logic cluster size increases from N = 1 to 20 and using an Fc_out of W/N were found to be good choices in [7].
These interconnect flexibility values have generally been used as rules of thumb in subsequent FPGA research.

Commercial BLE Architectures

The BLEs of modern commercial FPGAs are much more complex than the commonly used academic BLE described in Section (Figure 2.2). Instead of a single K-LUT, some modern FPGA architectures [33, 35, 51] use fracturable LUTs, which are LUTs that can be configured as one large LUT or multiple smaller LUTs. For example, the Stratix V fracturable 6-LUT can be split into two 5-LUTs or four 4-LUTs provided that the functions being mapped to these LUTs meet certain constraints [35]. Modern BLEs also commonly support configuring LUTs as memories (LUTRAM) or shift registers and usually contain hard arithmetic carry logic [35, 52]. However, to keep the scope of this work tractable, we only consider regular K-LUTs, which are still relevant, and we do not consider carry logic as current academic CAD tools do not fully support this functionality.

The commonly used academic BLE shown in Figure 2.2 has a very limited ability to use both the lookup table and flip-flop together. Modern commercial BLEs include additional 2:1 multiplexers to allow the lookup table and flip-flop to be used in concert in many more ways [3, 52]. These extra multiplexers are included in our designs and will be described in more detail in Section 3.2.

Figure 2.5: Multi-driver and single-driver routing architectures. (a) Multi-driver architecture: the LB output connects to the routing wire via a tri-state driver, with drivers at mid-points. (b) Single-driver architecture: the LB output connects to the routing wire via an SB mux, with no drivers at mid-points.

Figure 2.6: FPGA architecture assessment methodology with VPR.

2.2 FPGA Architecture Assessment Methodology

The quality of an FPGA in terms of area, performance and power consumption is a function of the architectural parameters described in Section 2.1. These architecture parameters interact in complex ways; hence determining the best choice for each parameter is a challenging task. Although there has been some work towards developing analytical models to evaluate FPGA architectures [46, 26, 16], the standard architecture assessment procedure used by both commercial FPGA manufacturers and academic researchers is an experimental one that consists of implementing benchmark circuits on a candidate architecture in order to evaluate its area, delay and power.

Figure 2.6 shows the standard academic CAD flow used to evaluate FPGA architectures [7]. The CAD flow proceeds as follows. Benchmark circuits are first synthesized and mapped into lookup tables (LUTs), flip-flops (FFs) and hard-blocks (multipliers and block memories) based on a description of the architecture. LUTs and FFs are then packed into clusters in a manner that attempts to keep related LUTs and FFs in the same cluster such that connections between them can be routed through the logic cluster's fast local interconnect. Next, each cluster is placed into a specific logic block on the FPGA that minimizes both the delay and the wire length of connections between logic clusters as much as possible. Once all logic clusters have been placed, connections between logic blocks are routed through the FPGA's general purpose interconnect.
The routing algorithm tries to minimize the benchmark circuit's critical path delay while using the least amount of routing resources possible. Finally, timing analysis is performed to determine the implemented benchmark circuit's critical path delay, and area is calculated based on tile area and the number of logic blocks required by the placement.

The packing, placement and routing phases of the flow of Figure 2.6 are performed by VPR [7]. Since many of the algorithms used by VPR are timing-based, the VPR architecture file must describe the delays through the lookup tables, routing multiplexers and any other circuitry that makes up the FPGA. The delay of these circuits depends on the circuit topologies used, as well as the transistor sizing of the FPGA circuitry. Consequently, evaluating an FPGA architecture requires first completing its transistor-level design.

2.3 FPGA Circuit Design

As mentioned in Section 2.1, we only consider soft-logic-based FPGAs with single-driver routing architectures in this thesis. Soft-logic FPGA architectures consist entirely of SRAM cells, routing multiplexers, lookup tables and flip-flops. This section describes commonly used circuit topologies and circuit design practices for these structures.

SRAM cells

An FPGA typically contains millions of memory bits used to configure routing multiplexers and store lookup table logic functions. Because there are so many of them, a key design goal for these memory bits is small area. Stability is also important, as state flipping would cause problems such as incorrectly configured routing multiplexers. A six transistor SRAM cell (Figure 2.7) has been the standard implementation in FPGA research [7] as it achieves both design goals reasonably well.

Figure 2.7: Six transistor SRAM cell.

Routing Multiplexers

Routing multiplexers account for a large fraction of the area and delay of an FPGA. Consequently, it is crucial to choose a circuit implementation that is as efficient as possible. There are a number of approaches that can be taken to build a multiplexer, but most commercial FPGAs and almost all academic FPGA studies use an NMOS pass-transistor-based approach because each switch requires only one transistor, minimizing area. Figure 2.8 shows three of the most commonly used pass-transistor multiplexer topologies. Each multiplexer style possesses a different area-delay tradeoff that is a function of the number of multiplexer inputs [27, 9].
For example, since it has just one pass-transistor on the signal path, a 1-level multiplexer is generally faster than a 2-level multiplexer. But, for a large number of inputs, a 1-level multiplexer requires more SRAM cells than a 2-level multiplexer and can thus have larger area. Furthermore, if the number of inputs is large enough, a 1-level multiplexer could even become slower than a 2-level multiplexer due to a greater number of transistors loading the output node.
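The SRAM-cell argument above can be illustrated with a small counting sketch. The counting rules are assumptions made for illustration only and are not taken from [27, 9]: one-hot select signals at every level, first-level groups of roughly the square root of the number of inputs, and first-level select lines shared across all groups. The function names are hypothetical.

```python
import math


def one_level_mux(num_inputs):
    """Resource counts for a 1-level multiplexer.

    Assumption: one pass-transistor and one SRAM cell per input.
    """
    return {"levels": 1, "pass_transistors": num_inputs, "sram_cells": num_inputs}


def two_level_mux(num_inputs):
    """Resource counts for a 2-level multiplexer.

    Assumption: the first level is split into groups of about sqrt(num_inputs)
    inputs, and every group shares the same first-level select lines.
    """
    group_size = math.isqrt(num_inputs)
    if group_size ** 2 != num_inputs:
        group_size += 1  # round the square root up
    num_groups = math.ceil(num_inputs / group_size)
    return {
        "levels": 2,
        "pass_transistors": num_inputs + num_groups,
        "sram_cells": group_size + num_groups,
    }


for n in (4, 8, 16, 32):
    print(n, one_level_mux(n), two_level_mux(n))
```

Under these assumptions the SRAM cell count grows linearly with the number of inputs for the 1-level style and roughly with its square root for the 2-level style, which is consistent with the trend described in the text; the exact crossover depends on the chosen level sizes.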

Figure 2.8: Different 8:1 pass-transistor multiplexer topologies. (a) Tree MUX. (b) 1-level MUX. (c) 2-level MUX.

It was shown in [9] that a 2-level multiplexer generally yields a lower area-delay product than a 1-level or tree multiplexer. Commercial FPGAs also commonly use 2-level multiplexers [33].

Although they are beneficial in terms of area, pass-transistors have an important disadvantage: they are incapable of passing a full logic-high voltage. That is, their output voltage saturates at approximately V_G − V_Th, where V_G is the gate voltage and V_Th is the threshold voltage of the transistor. In FPGA circuitry, the output of a pass-transistor-based routing multiplexer is typically driven by a multi-stage buffer [7, 30, 33]. Static power dissipation in these buffers caused by the reduced voltage swing of pass-transistors has long been a cause for concern [7]. To mitigate this problem, gate boosting [7] (applying a voltage larger than the supply voltage (V_DD) on the pass-transistor gate) and PMOS level-restorers [30, 33] have been used to help pull pass-transistor output voltages up to V_DD. Figure 2.9 shows a routing multiplexer followed by a two-stage buffer equipped with a PMOS level-restorer.

Figure 2.9: Multiplexer followed by two-stage buffer with PMOS level-restorer.
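The reduced logic-high level of a pass-transistor, and the effect of gate boosting on it, can be illustrated with a first-order calculation. This is only a sketch: it assumes the output high level is min(V_DD, V_G − V_Th), ignores the body effect and leakage, and uses a hypothetical threshold voltage of 0.3 V together with an illustrative 0.8 V supply. The function name is also hypothetical.

```python
def pass_transistor_high_level(v_gate, v_th, v_dd):
    """First-order logic-high output of an NMOS pass-transistor.

    Assumption: the output saturates at V_G - V_Th and cannot exceed the
    level driven at the input, taken here to be V_DD.
    """
    return min(v_dd, v_gate - v_th)


V_DD = 0.8  # supply voltage in volts (illustrative value)
V_TH = 0.3  # threshold voltage in volts (hypothetical value)

# Without boosting, the gate is driven at V_DD.
no_boost = pass_transistor_high_level(V_DD, V_TH, V_DD)

# With the gate boosted by one threshold voltage above V_DD.
boosted = pass_transistor_high_level(V_DD + V_TH, V_TH, V_DD)

print(f"output high without gate boosting: {no_boost:.2f} V")
print(f"output high with gate boosting:    {boosted:.2f} V")
```

In this simplified model, boosting the gate by about one threshold voltage is enough to restore the full V_DD swing; a PMOS level-restorer reaches the same end result by actively pulling the buffer input up once the buffer begins to switch.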

2.3.3 Lookup Tables

Like routing multiplexers, lookup tables also use pass-transistor-based multiplexer circuitry, but the multiplexer input and control connectivity is reversed. In a lookup table, SRAM cells connect to the inputs of the multiplexer and hold the logic function's truth table, while the gates of the multiplexer's pass-transistors are controlled by the lookup table inputs. Consequently, lookup tables are generally implemented as fully-encoded multiplexer trees, such that each level of the tree can be connected to a LUT input [7]. Figure 2.10 shows a 3-input fully encoded multiplexer tree lookup table followed by a two-stage buffer.

Figure 2.10: Fully encoded MUX tree 3-LUT.

2.3.4 Flip-Flops

Flip-flops are generally implemented as standard master-slave registers [7]. However, some commercial FPGAs use more advanced flip-flops. For example, Altera's Stratix V FPGAs use flip-flops based on pulse latches and configurable pulse width generators to improve performance [35].

2.4 Modeling of FPGA Circuitry

Evaluating an FPGA architecture with the assessment methodology described in Section 2.2 requires models that allow us to estimate the area and delay of FPGA circuitry, because fabricating an integrated circuit for each architecture to measure area and delay is not practical. In this section, we describe commonly used area and delay modeling approaches for FPGAs. These models are also useful for transistor sizing, which we will discuss in Section 2.5.

2.4.1 Area Modeling

Creating a complete layout is the best way to determine the exact area of an FPGA. However, this process is much too time consuming when multiple designs need to be explored. A variety of approaches have been used to estimate area more quickly, such as counting transistors or counting SRAM cells, but the most widely used in FPGA research is the minimum-width transistor area model introduced in [7]. In this model, layout area is expressed in units of minimum-width transistor areas. A minimum-width transistor is defined as the smallest possible contactable transistor for a specific process technology, and one minimum-width transistor area is the area of this transistor plus the spacing to neighboring transistors, as shown in Figure 2.11.

Figure 2.11: Minimum-width transistor area model.

Unlike area models that simply count transistors or SRAM cells, the minimum-width transistor area model provides an actual estimate of layout area. This is an important distinction because, as well as being more accurate, actual layout area estimates enable better estimates of wire loads since wire lengths are layout dependent.

Transistors in FPGA circuitry often require more drive strength than that provided by a minimum-width transistor. A transistor's drive strength can be increased by either widening its diffusion region (Figure 2.12b) or by adding parallel diffusion regions (Figure 2.12c). Consequently, increasing a transistor's drive strength increases its area. The widely-used area model of [7] estimates the layout area of a transistor with drive strength x, in units of minimum-width transistor areas, with (2.2), which was obtained by averaging the layout areas that result from either widening the diffusion region or adding parallel diffusion regions to increase drive strength.
$$ \mathrm{Area}(x) = 0.5 + 0.5x \tag{2.2} $$

Then, [7] calculates the area of an FPGA subcircuit by simply summing the areas of all the transistors in that subcircuit. Note from (2.2) that doubling a transistor's drive strength does not double its area. This is because increasing a transistor's drive strength only increases certain transistor dimensions while others remain constant. For example, the spacing to neighboring transistors remains the same regardless of a transistor's drive strength.
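As a small worked example of this area model, the sketch below assumes the commonly cited linear form of the model of [7], Area(x) = 0.5 + 0.5x in minimum-width transistor areas, and sums it over the transistors of a subcircuit. The function names and the example drive strengths are hypothetical.

```python
def transistor_area(drive_strength):
    """Layout area, in minimum-width transistor areas, of one transistor.

    Assumes the linear form Area(x) = 0.5 + 0.5 * x for the model of [7].
    """
    return 0.5 + 0.5 * drive_strength


def subcircuit_area(drive_strengths):
    """Area of a subcircuit as the sum of the areas of its transistors."""
    return sum(transistor_area(x) for x in drive_strengths)


# Doubling the drive strength from 1x to 2x does not double the area.
print(transistor_area(1), transistor_area(2))

# Hypothetical subcircuit with three transistors of 1x, 2x and 4x drive.
print(subcircuit_area([1, 2, 4]))
```

The constant term plays the role of the dimensions that do not grow with drive strength, such as the spacing to neighboring transistors.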

Figure 2.12: Increasing drive strength with diffusion widening (b) or parallel diffusion regions (c). Panels: (a) minimum drive strength, (b) 2× minimum drive strength, (c) 2× minimum drive strength. Note: Although not shown in the figure for simplicity, parallel diffusions must be connected together.

2.4.2 Delay Modeling

Time-domain circuit simulators such as HSPICE are generally the most accurate way to estimate the delay of a circuit. However, time-domain simulation can be computationally intensive, making it impractical when a large number of delay measurements need to be obtained. For example, the timing analysis phase of the architecture assessment flow described in Section 2.2 involves measuring delay for the thousands of nets in a benchmark circuit; performing time-domain simulation for each one would lead to prohibitively long runtimes. Instead, previous FPGA research work has typically modeled wires and transistors as linear resistances and capacitances, such that a transistor-based circuit can be modeled as an RC-tree network [22, 7, 24]. The delay of this network can then be estimated with the Elmore [15] or the Penfield-Rubinstein [20] delay models, which are much quicker than time-domain simulations. With the Elmore delay model, the delay $T_D$ of a path is given by:

$$ T_D = \sum_{i \in \text{path}} R_i \cdot C(\text{subtree}_i) \tag{2.3} $$

where $R_i$ is the equivalent resistance of element $i$ along the path and $C(\text{subtree}_i)$ is the total downstream capacitance rooted at element $i$.

An enhanced version of the Elmore delay model was proposed in [39]. Since it is more difficult to model a buffer as a simple RC circuit due to the buffer's intrinsic delay, [39] combines the Elmore delay model with a common model of buffer delay where a buffer is modeled as a constant delay and a resistor.
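The Elmore calculation, together with the constant-delay-plus-resistor buffer model, can be sketched for the simple case of an RC chain. This is an illustrative sketch under stated assumptions, not the implementation used by VPR or any other tool: side-branch capacitance is assumed to be lumped into the node capacitances, the only buffer is the driver at the start of the path (so no capacitance is isolated mid-path), and the element values are made up. Resistances are in ohms and capacitances in femtofarads, so products are in femtoseconds.

```python
def elmore_delay_fs(path):
    """Elmore delay of an RC chain, in femtoseconds.

    `path` lists the elements from source to sink as tuples of
    (resistance_ohm, node_capacitance_fF, intrinsic_delay_fs), where the
    node capacitance sits at the output of the element and the intrinsic
    delay is zero for anything that is not a buffer.
    """
    delay = 0
    downstream_cap = sum(cap for _, cap, _ in path)
    for resistance, cap, intrinsic_delay in path:
        # Each resistance charges all capacitance at and beyond its output.
        delay += resistance * downstream_cap + intrinsic_delay
        downstream_cap -= cap
    return delay


# Hypothetical path: driving buffer, one pass-transistor, one wire segment.
example_path = [
    (1000, 2, 5000),  # buffer: output resistance, output cap, intrinsic delay
    (2000, 1, 0),     # pass-transistor
    (500, 3, 0),      # wire segment plus the load at its far end
]
print(elmore_delay_fs(example_path), "fs")
```

Setting every intrinsic delay to zero reduces the calculation to the plain Elmore sum over the path.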
This combined model maps well to FPGA circuitry, which consists mostly of pass-transistors and buffers, and was adopted as the delay model for VPR in [7]. With this model, the delay $T_D$ of a path is given by:

$$ T_D = \sum_{i \in \text{path}} \left( R_i \cdot C(\text{subtree}_i) + T_{buf,i} \right) \tag{2.4} $$

where $T_{buf,i}$ is the buffer's intrinsic delay if element $i$ is a buffer, or 0 otherwise [7].

2.5 Automated Transistor Sizing

Transistor sizing is a well-studied problem that consists of improving a circuit's performance by increasing the sizes of its transistors. It thus provides yet another level, in addition to architecture and circuit design, at which the area and delay characteristics of an FPGA can be adjusted. The transistor sizing optimization problem is usually formulated in one of three ways:

1. Minimize some function of area and delay.
2. Minimize area subject to a delay constraint.
3. Minimize delay subject to an area constraint.

There has been much prior work on automated transistor sizing for custom circuitry. Fishburn and Dunlop showed in [17] that modeling transistors as linear resistances and capacitances and calculating the delay of the resulting RC circuits with the Elmore [15] or the Penfield-Rubinstein [20] delay model (i.e. (2.3)) allows the transistor sizing problem to be formulated as a convex optimization problem, which guarantees that any local minimum is the global minimum. With this useful property, [17] develops TILOS, a transistor sizing tool for custom circuits based on a heuristic method that iteratively identifies a circuit's critical path and increases transistor sizes on that path until all timing constraints are met. Despite the convexity of the problem, TILOS's heuristic is such that it can terminate with a suboptimal solution [45]. Algorithms guaranteeing the optimal solution through convex optimization [44] or mathematical relaxation techniques [10, 47] have subsequently been proposed, but these algorithms, along with TILOS, all suffer from their reliance on linear device models and the Elmore delay, which have long been known to be inaccurate [40, 21]. To enhance accuracy, at the cost of increased computational complexity, some transistor sizing algorithms have used time-domain simulation to obtain delay estimates [14, 13].

The programmability of FPGAs adds unique features to the transistor sizing problem, which makes FPGA-specific transistor sizing tools valuable. Kuon and Rose proposed such a tool in [24]. Their FPGA transistor sizing approach differs from transistor sizing algorithms for custom circuits because it deals with two features unique to FPGAs. The first of these unique features is repetition. As described in Section 2.1, an FPGA consists of an array of tiles.
Since these tiles are all identical, transistor-level design only needs to be performed for one of them. This design can then be replicated to obtain a complete FPGA. Similar design space reductions can be found within a tile. For example, a switch block can include over 100 logically equivalent multiplexers whose transistor-level design should be kept identical. Consequently, only 80 unique transistors need to be sized when designing an FPGA's soft-logic despite there being billions of transistors on the chip, which is in contrast to transistor sizing for custom circuits where the whole chip must be considered. This reduced design space makes HSPICE-based optimization practical for FPGAs, but as we show in Section 3.7, we must still search this space intelligently to keep runtime reasonable.

The second unique feature of FPGA transistor sizing is the undefined critical path. Because they are programmable, FPGAs have application-dependent critical paths, which implies that at design time, there is no clear critical path to optimize for delay. To deal with this issue, [24] optimizes a representative path that contains one of each type of FPGA subcircuit (LUTs, MUXes, etc.). Delay is taken as a weighted sum of the delay of each subcircuit, and the weighting scheme is chosen based on the frequency with which each subcircuit was found on the critical paths of placed and routed benchmark circuits. Optimizing a representative critical path still presents a huge design space, which Kuon and Rose tackle with a two-phased algorithm. An exploratory phase uses linear device models and a TILOS-like transistor sizing heuristic to keep CPU times reasonable. It is followed by an HSPICE-based fine-tuning phase that adjusts the transistor sizes to account for the inaccuracies of linear models.

In [46], Smith et al. present a method that enables the rapid and concurrent optimization of high-level architecture parameters and transistor sizes for FPGAs through the use of analytic architecture models, linear device models and a convex optimization-based transistor sizing algorithm. They show that this concurrent optimization can have a significant impact on architectural conclusions versus a separate optimization.
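The representative-path delay used by [24] is a weighted sum of subcircuit delays. The sketch below shows only that arithmetic; the subcircuit names, delays and weights are hypothetical placeholders, not values reported in [24] or in this thesis.

```python
def representative_path_delay(subcircuit_delays, weights):
    """Weighted sum of subcircuit delays along a representative path.

    `weights` reflects how often each subcircuit type appears on the
    critical paths of placed and routed benchmark circuits.
    """
    missing = set(weights) - set(subcircuit_delays)
    if missing:
        raise KeyError(f"no delay given for: {sorted(missing)}")
    return sum(weights[name] * subcircuit_delays[name] for name in weights)


# Hypothetical delays in picoseconds and hypothetical weights.
delays_ps = {"lut": 200, "switch_block_mux": 100, "connection_block_mux": 80}
weights = {"lut": 1, "switch_block_mux": 3, "connection_block_mux": 1}

print(representative_path_delay(delays_ps, weights), "ps")
```

Because the weights multiply the delays directly, a subcircuit that appears often on critical paths receives proportionally more attention from the optimizer than one that appears rarely.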

Chapter 3

COFFE: Automated Optimization of FPGA Circuitry

When developing a new chip, FPGA architects are faced with two main tasks: choosing an architecture for their FPGA and performing the transistor-level design of that architecture. As described in Section 2.2, choosing an architecture is typically accomplished experimentally with architecture exploration tools such as VPR [7]. By implementing benchmark circuits on a proposed FPGA, these tools allow architects to evaluate the area, delay and power impact of various architectural choices. Then, based on their observations, architects can select an FPGA architecture that meets their design goals and constraints.

Transistor-level design consists of selecting circuit topologies for the various subcircuits that implement the chosen architecture, as well as sizing the transistors of those subcircuits. Transistor-level design is an essential precursor to the evaluation of an architecture because it provides accurate area, delay and power estimates of the underlying FPGA circuitry; these estimates are required inputs to the architecture exploration tools. Transistor sizing also provides an additional opportunity to tune the area, delay and power of an FPGA. Therefore, developing a new FPGA is an iterative process that involves performing the transistor-level design of various architectures before evaluating them through synthesis, placement and routing experiments. This interdependence between architecture exploration and transistor-level design necessitates automated design tools if high-quality results are to be obtained in reasonable amounts of time.

In this chapter, we describe COFFE (Circuit Optimization For FPGA Exploration), a fully-automated transistor sizing tool for FPGAs that we developed as part of this thesis. COFFE enables the design flow detailed above by providing area, delay and power estimates of properly sized FPGA circuitry.
COFFE also enables design exploration of FPGA circuitry and we will use COFFE in such a capacity in Chapter 4. Although COFFE solves the same problem as Kuon and Rose's FPGA transistor sizing tool [24] (see Section 2.5), we have made significant improvements which are necessary for FPGAs in advanced process nodes; these improvements will be described in the following sections.

3.1 Introduction to COFFE

Figure 3.1 shows the FPGA design flow we wish to enable with COFFE. COFFE is used to perform transistor-level optimization for some architecture of interest, thus producing accurate area and delay estimates for the subcircuits of this architecture (LUTs, routing multiplexers, etc.). These estimates are used by VPR to evaluate the architecture through place and route experiments. Based on the results of the assessment, the architecture parameters are adjusted and sent back to COFFE to begin a new iteration of optimization and evaluation.

Figure 3.1: FPGA design flow.

COFFE's circuit optimizer makes area and performance trade-offs through transistor sizing. Like [24], COFFE's optimization objective is of the form $\mathrm{Area}^b \cdot \mathrm{Delay}^c$, thus allowing for different area and performance trade-offs by varying $b$ and $c$. Creating a complete layout is the most accurate way to obtain the area and delay measurements needed during transistor sizing. However, for the iterative design flow of Figure 3.1, this approach is impractical as layout is a very time consuming task. Instead, COFFE estimates area with an improved version of the minimum-width transistor area model (see Section 3.4) and measures delay with HSPICE simulations. Although previous FPGA transistor sizing tools have used linearized models of transistors to measure delay during certain phases of the optimization, we show in Section 3.5 that such models are highly inaccurate for the fine-grained transistor-level design we wish to undertake in advanced process nodes such as the 22nm process we use in this work. COFFE automatically generates the SPICE netlists required for delay measurement based on the input architecture parameters and the circuit topologies described in Sections 3.2 and 3.3 respectively.
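To show how the exponents of an area-delay objective of this form steer the outcome, the sketch below evaluates the cost for two candidate sizings. The candidate names and their area and delay values are invented for illustration, and the selection step is a plain minimum over candidates rather than COFFE's actual search procedure.

```python
def cost(area, delay, b=1, c=1):
    """Objective of the form Area^b * Delay^c."""
    return (area ** b) * (delay ** c)


def best_candidate(candidates, b, c):
    """Name of the candidate sizing with the lowest cost."""
    return min(candidates, key=lambda name: cost(*candidates[name], b=b, c=c))


# Hypothetical sizings as (area, delay) pairs in arbitrary units.
candidates = {
    "small": (100, 10),
    "fast": (130, 8),
}

print(best_candidate(candidates, b=1, c=1))  # area-delay product
print(best_candidate(candidates, b=1, c=2))  # delay weighted more heavily
```

With these made-up numbers the area-delay product favors the smaller sizing, while squaring the delay term tips the choice toward the faster one.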
The generated SPICE netlists are parametrized such that COFFE's circuit optimizer can change the transistor sizes by simply changing a transistor size parameter list. To obtain meaningful delays, COFFE is careful to ensure that these netlists include realistic transistor and wire loading. Transistor loads are relatively easy to determine based on architectural parameters and circuit topologies. Wire loads, on the other hand, are layout dependent, making them more difficult to determine since the exact layout is not known. COFFE estimates wire loads with a wire load model described later in this chapter.

3.2 Architecture

Figure 3.2 shows the tile architecture that COFFE supports in its designs and Table 3.1 lists the architecture parameters that COFFE expects as inputs. Parameters listed in the top portion of Table 3.1

COFFE: Fully-Automated Transistor Sizing for FPGAs

COFFE: Fully-Automated Transistor Sizing for FPGAs COFFE: Fully-Automated Transistor Sizing for FPGAs Charles Chiasson and Vaughn Betz Department of Electrical and Computer Engineering University of Toronto, Toronto, ON, Canada {charlesc,vaughn}@eecg.utoronto.ca

More information

SHOULD FPGAS ABANDON THE PASS-GATE? Charles Chiasson and Vaughn Betz

SHOULD FPGAS ABANDON THE PASS-GATE? Charles Chiasson and Vaughn Betz SHOULD FPGAS ABANDON THE PASS-GATE? Charles Chiasson and Vaughn Betz Department of Electrical and Computer Engineering University of Toronto, Toronto, ON, Canada {charlesc,vaughn}@eecg.utoronto.ca ABSTRACT

More information

UNIT-II LOW POWER VLSI DESIGN APPROACHES

UNIT-II LOW POWER VLSI DESIGN APPROACHES UNIT-II LOW POWER VLSI DESIGN APPROACHES Low power Design through Voltage Scaling: The switching power dissipation in CMOS digital integrated circuits is a strong function of the power supply voltage.

More information

AUTOMATING TRANSISTOR RESIZING DESIGN OF FIELD-PROGRAMMABLE GATE ARRAYS IN THE. By Anthony Bing-Yan Chan. Supervisor: Jonathan Rose

AUTOMATING TRANSISTOR RESIZING DESIGN OF FIELD-PROGRAMMABLE GATE ARRAYS IN THE. By Anthony Bing-Yan Chan. Supervisor: Jonathan Rose AUTOMATING TRANSISTOR RESIZING IN THE DESIGN OF FIELD-PROGRAMMABLE GATE ARRAYS By Anthony Bing-Yan Chan Supervisor: Jonathan Rose April 2003 AUTOMATING TRANSISTOR RESIZING IN THE DESIGN OF FIELD-PROGRAMMABLE

More information

Lecture 3, Handouts Page 1. Introduction. EECE 353: Digital Systems Design Lecture 3: Digital Design Flows, Simulation Techniques.

Lecture 3, Handouts Page 1. Introduction. EECE 353: Digital Systems Design Lecture 3: Digital Design Flows, Simulation Techniques. Introduction EECE 353: Digital Systems Design Lecture 3: Digital Design Flows, Techniques Cristian Grecu grecuc@ece.ubc.ca Course web site: http://courses.ece.ubc.ca/353/ What have you learned so far?

More information

Low Power, Area Efficient FinFET Circuit Design

Low Power, Area Efficient FinFET Circuit Design Low Power, Area Efficient FinFET Circuit Design Michael C. Wang, Princeton University Abstract FinFET, which is a double-gate field effect transistor (DGFET), is more versatile than traditional single-gate

More information

A Case Study of Nanoscale FPGA Programmable Switches with Low Power

A Case Study of Nanoscale FPGA Programmable Switches with Low Power A Case Study of Nanoscale FPGA Programmable Switches with Low Power V.Elamaran 1, Har Narayan Upadhyay 2 1 Assistant Professor, Department of ECE, School of EEE SASTRA University, Tamilnadu - 613401, India

More information

UNIT-III POWER ESTIMATION AND ANALYSIS

UNIT-III POWER ESTIMATION AND ANALYSIS UNIT-III POWER ESTIMATION AND ANALYSIS In VLSI design implementation simulation software operating at various levels of design abstraction. In general simulation at a lower-level design abstraction offers

More information

Power Optimization of FPGA Interconnect Via Circuit and CAD Techniques

Power Optimization of FPGA Interconnect Via Circuit and CAD Techniques Power Optimization of FPGA Interconnect Via Circuit and CAD Techniques Safeen Huda and Jason Anderson International Symposium on Physical Design Santa Rosa, CA, April 6, 2016 1 Motivation FPGA power increasingly

More information

Towards PVT-Tolerant Glitch-Free Operation in FPGAs

Towards PVT-Tolerant Glitch-Free Operation in FPGAs Towards PVT-Tolerant Glitch-Free Operation in FPGAs Safeen Huda and Jason H. Anderson ECE Department, University of Toronto, Canada 24 th ACM/SIGDA International Symposium on FPGAs February 22, 2016 Motivation

More information

White Paper Stratix III Programmable Power

White Paper Stratix III Programmable Power Introduction White Paper Stratix III Programmable Power Traditionally, digital logic has not consumed significant static power, but this has changed with very small process nodes. Leakage current in digital

More information

A Survey of the Low Power Design Techniques at the Circuit Level

A Survey of the Low Power Design Techniques at the Circuit Level A Survey of the Low Power Design Techniques at the Circuit Level Hari Krishna B Assistant Professor, Department of Electronics and Communication Engineering, Vagdevi Engineering College, Warangal, India

More information

A Dual-V DD Low Power FPGA Architecture

A Dual-V DD Low Power FPGA Architecture A Dual-V DD Low Power FPGA Architecture A. Gayasen 1, K. Lee 1, N. Vijaykrishnan 1, M. Kandemir 1, M.J. Irwin 1, and T. Tuan 2 1 Dept. of Computer Science and Engineering Pennsylvania State University

More information

Static Power and the Importance of Realistic Junction Temperature Analysis

Static Power and the Importance of Realistic Junction Temperature Analysis White Paper: Virtex-4 Family R WP221 (v1.0) March 23, 2005 Static Power and the Importance of Realistic Junction Temperature Analysis By: Matt Klein Total power consumption of a board or system is important;

More information

II. Previous Work. III. New 8T Adder Design

II. Previous Work. III. New 8T Adder Design ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: High Performance Circuit Level Design For Multiplier Arun Kumar

More information

PE713 FPGA Based System Design

PE713 FPGA Based System Design PE713 FPGA Based System Design Why VLSI? Dept. of EEE, Amrita School of Engineering Why ICs? Dept. of EEE, Amrita School of Engineering IC Classification ANALOG (OR LINEAR) ICs produce, amplify, or respond

More information

Modeling the Effect of Wire Resistance in Deep Submicron Coupled Interconnects for Accurate Crosstalk Based Net Sorting

Modeling the Effect of Wire Resistance in Deep Submicron Coupled Interconnects for Accurate Crosstalk Based Net Sorting Modeling the Effect of Wire Resistance in Deep Submicron Coupled Interconnects for Accurate Crosstalk Based Net Sorting C. Guardiani, C. Forzan, B. Franzini, D. Pandini Adanced Research, Central R&D, DAIS,

More information

Lecture 9: Cell Design Issues

Lecture 9: Cell Design Issues Lecture 9: Cell Design Issues MAH, AEN EE271 Lecture 9 1 Overview Reading W&E 6.3 to 6.3.6 - FPGA, Gate Array, and Std Cell design W&E 5.3 - Cell design Introduction This lecture will look at some of the

More information

2009 Spring CS211 Digital Systems & Lab 1 CHAPTER 3: TECHNOLOGY (PART 2)

2009 Spring CS211 Digital Systems & Lab 1 CHAPTER 3: TECHNOLOGY (PART 2) 1 CHAPTER 3: IMPLEMENTATION TECHNOLOGY (PART 2) Whatwillwelearninthischapter? we learn in this 2 How transistors operate and form simple switches CMOS logic gates IC technology FPGAs and other PLDs Basic

More information

Reference. Wayne Wolf, FPGA-Based System Design Pearson Education, N Krishna Prakash,, Amrita School of Engineering

Reference. Wayne Wolf, FPGA-Based System Design Pearson Education, N Krishna Prakash,, Amrita School of Engineering FPGA Fabrics Reference Wayne Wolf, FPGA-Based System Design Pearson Education, 2004 CPLD / FPGA CPLD Interconnection of several PLD blocks with Programmable interconnect on a single chip Logic blocks executes

More information

CHAPTER 3 ANALYSIS OF LOW POWER, AREA EFFICIENT AND HIGH SPEED ADDER TOPOLOGIES

CHAPTER 3 ANALYSIS OF LOW POWER, AREA EFFICIENT AND HIGH SPEED ADDER TOPOLOGIES 44 CHAPTER 3 ANALYSIS OF LOW POWER, AREA EFFICIENT AND HIGH SPEED ADDER TOPOLOGIES 3.1 INTRODUCTION The design of high-speed and low-power VLSI architectures needs efficient arithmetic processing units,

More information

TRENDS in technology scaling make leakage power an

TRENDS in technology scaling make leakage power an IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 25, NO. 3, MARCH 2006 423 Active Leakage Power Optimization for FPGAs Jason H. Anderson, Student Member, IEEE, and Farid

More information

Lecture #2 Solving the Interconnect Problems in VLSI

Lecture #2 Solving the Interconnect Problems in VLSI Lecture #2 Solving the Interconnect Problems in VLSI C.P. Ravikumar IIT Madras - C.P. Ravikumar 1 Interconnect Problems Interconnect delay has become more important than gate delays after 130nm technology

More information

Yet, many signal processing systems require both digital and analog circuits. To enable

Yet, many signal processing systems require both digital and analog circuits. To enable Introduction Field-Programmable Gate Arrays (FPGAs) have been a superb solution for rapid and reliable prototyping of digital logic systems at low cost for more than twenty years. Yet, many signal processing

More information

Low Power Design of Schmitt Trigger Based SRAM Cell Using NBTI Technique

Low Power Design of Schmitt Trigger Based SRAM Cell Using NBTI Technique Low Power Design of Schmitt Trigger Based SRAM Cell Using NBTI Technique M.Padmaja 1, N.V.Maheswara Rao 2 Post Graduate Scholar, Gayatri Vidya Parishad College of Engineering for Women, Affiliated to JNTU,

More information

Introduction to CMOS VLSI Design (E158) Lecture 9: Cell Design

Introduction to CMOS VLSI Design (E158) Lecture 9: Cell Design Harris Introduction to CMOS VLSI Design (E158) Lecture 9: Cell Design David Harris Harvey Mudd College David_Harris@hmc.edu Based on EE271 developed by Mark Horowitz, Stanford University MAH E158 Lecture

More information

Power-Area trade-off for Different CMOS Design Technologies

Power-Area trade-off for Different CMOS Design Technologies Power-Area trade-off for Different CMOS Design Technologies Priyadarshini.V Department of ECE Sri Vishnu Engineering College for Women, Bhimavaram dpriya69@gmail.com Prof.G.R.L.V.N.Srinivasa Raju Head

More information

Department of Electrical and Computer Systems Engineering

Department of Electrical and Computer Systems Engineering Department of Electrical and Computer Systems Engineering Technical Report MECSE-31-2005 Asynchronous Self Timed Processing: Improving Performance and Design Practicality D. Browne and L. Kleeman Asynchronous

More information

DESIGN & IMPLEMENTATION OF SELF TIME DUMMY REPLICA TECHNIQUE IN 128X128 LOW VOLTAGE SRAM

DESIGN & IMPLEMENTATION OF SELF TIME DUMMY REPLICA TECHNIQUE IN 128X128 LOW VOLTAGE SRAM DESIGN & IMPLEMENTATION OF SELF TIME DUMMY REPLICA TECHNIQUE IN 128X128 LOW VOLTAGE SRAM 1 Mitali Agarwal, 2 Taru Tevatia 1 Research Scholar, 2 Associate Professor 1 Department of Electronics & Communication

More information

Design and Implementation of Complex Multiplier Using Compressors

Design and Implementation of Complex Multiplier Using Compressors Design and Implementation of Complex Multiplier Using Compressors Abstract: In this paper, a low-power high speed Complex Multiplier using compressor circuit is proposed for fast digital arithmetic integrated

More information

CHAPTER 3 NEW SLEEPY- PASS GATE

CHAPTER 3 NEW SLEEPY- PASS GATE 56 CHAPTER 3 NEW SLEEPY- PASS GATE 3.1 INTRODUCTION A circuit level design technique is presented in this chapter to reduce the overall leakage power in conventional CMOS cells. The new leakage po leepy-

More information

POWER GATING. Power-gating parameters

POWER GATING. Power-gating parameters POWER GATING Power Gating is effective for reducing leakage power [3]. Power gating is the technique wherein circuit blocks that are not in use are temporarily turned off to reduce the overall leakage

More information

EECS 427 Lecture 22: Low and Multiple-Vdd Design

EECS 427 Lecture 22: Low and Multiple-Vdd Design EECS 427 Lecture 22: Low and Multiple-Vdd Design Reading: 11.7.1 EECS 427 W07 Lecture 22 1 Last Time Low power ALUs Glitch power Clock gating Bus recoding The low power design space Dynamic vs static EECS

More information

LOW-POWER SOFTWARE-DEFINED RADIO DESIGN USING FPGAS

LOW-POWER SOFTWARE-DEFINED RADIO DESIGN USING FPGAS LOW-POWER SOFTWARE-DEFINED RADIO DESIGN USING FPGAS Charlie Jenkins, (Altera Corporation San Jose, California, USA; chjenkin@altera.com) Paul Ekas, (Altera Corporation San Jose, California, USA; pekas@altera.com)

More information

Engr354: Digital Logic Circuits

Engr354: Digital Logic Circuits Engr354: Digital Logic Circuits Chapter 3: Implementation Technology Curtis Nelson Chapter 3 Overview In this chapter you will learn about: How transistors are used as switches; Integrated circuit technology;

More information

Novel Low-Overhead Operand Isolation Techniques for Low-Power Datapath Synthesis

Novel Low-Overhead Operand Isolation Techniques for Low-Power Datapath Synthesis Novel Low-Overhead Operand Isolation Techniques for Low-Power Datapath Synthesis N. Banerjee, A. Raychowdhury, S. Bhunia, H. Mahmoodi, and K. Roy School of Electrical and Computer Engineering, Purdue University,

More information

Memory Basics. historically defined as memory array with individual bit access refers to memory with both Read and Write capabilities

Memory Basics. historically defined as memory array with individual bit access refers to memory with both Read and Write capabilities Memory Basics RAM: Random Access Memory historically defined as memory array with individual bit access refers to memory with both Read and Write capabilities ROM: Read Only Memory no capabilities for

More information

FPGA Based System Design

FPGA Based System Design FPGA Based System Design Reference Wayne Wolf, FPGA-Based System Design Pearson Education, 2004 Why VLSI? Integration improves the design: higher speed; lower power; physically smaller. Integration reduces

More information

Leakage Power Modeling and Reduction Techniques for Field Programmable Gate Arrays

Leakage Power Modeling and Reduction Techniques for Field Programmable Gate Arrays Leakage Power Modeling and Reduction Techniques for Field Programmable Gate Arrays by Akhilesh Kumar A thesis presented to the University of Waterloo in fulfilment of the thesis requirement for the degree

More information

AN EFFICIENT APPROACH TO MINIMIZE POWER AND AREA IN CARRY SELECT ADDER USING BINARY TO EXCESS ONE CONVERTER

AN EFFICIENT APPROACH TO MINIMIZE POWER AND AREA IN CARRY SELECT ADDER USING BINARY TO EXCESS ONE CONVERTER AN EFFICIENT APPROACH TO MINIMIZE POWER AND AREA IN CARRY SELECT ADDER USING BINARY TO EXCESS ONE CONVERTER K. RAMAMOORTHY 1 T. CHELLADURAI 2 V. MANIKANDAN 3 1 Department of Electronics and Communication

More information

ALPS: An Automatic Layouter for Pass-Transistor Cell Synthesis

ALPS: An Automatic Layouter for Pass-Transistor Cell Synthesis ALPS: An Automatic Layouter for Pass-Transistor Cell Synthesis Yasuhiko Sasaki Central Research Laboratory Hitachi, Ltd. Kokubunji, Tokyo, 185, Japan Kunihito Rikino Hitachi Device Engineering Kokubunji,

More information

High Performance Low-Power Signed Multiplier

High Performance Low-Power Signed Multiplier High Performance Low-Power Signed Multiplier Amir R. Attarha Mehrdad Nourani VLSI Circuits & Systems Laboratory Department of Electrical and Computer Engineering University of Tehran, IRAN Email: attarha@khorshid.ece.ut.ac.ir

More information

Policy-Based RTL Design

Policy-Based RTL Design Policy-Based RTL Design Bhanu Kapoor and Bernard Murphy bkapoor@atrenta.com Atrenta, Inc., 2001 Gateway Pl. 440W San Jose, CA 95110 Abstract achieving the desired goals. We present a new methodology to

More information

International Journal of Advance Engineering and Research Development

International Journal of Advance Engineering and Research Development Scientific Journal of Impact Factor (SJIF): 5.71 International Journal of Advance Engineering and Research Development Volume 5, Issue 05, May -2018 e-issn (O): 2348-4470 p-issn (P): 2348-6406 COMPARATIVE

More information

Sophisticated design of low power high speed full adder by using SR-CPL and Transmission Gate logic

Scientific Journal of Impact Factor (SJIF): 3.134 International Journal of Advance Engineering and Research Development Volume 2, Issue 3, March 2015 e-ISSN (O): 2348-4470 p-ISSN (P): 2348-6406 Sophisticated

BASIC PHYSICAL DESIGN AN OVERVIEW The VLSI design flow for any IC design is as follows

Unit 3 BASIC PHYSICAL DESIGN AN OVERVIEW The VLSI design flow for any IC design is as follows 1. Specification (problem definition) 2. Schematic (gate level design) (equivalence check) 3. Layout (equivalence

CMPEN 411 VLSI Digital Circuits Spring Lecture 24: Peripheral Memory Circuits

CMPEN 411 VLSI Digital Circuits Spring 2011 Lecture 24: Peripheral Memory Circuits [Adapted from Rabaey's Digital Integrated Circuits, Second Edition, 2003 J. Rabaey, A. Chandrakasan, B. Nikolic] Sp11

DIGITAL INTEGRATED CIRCUITS A DESIGN PERSPECTIVE 2ND EDITION

DIGITAL INTEGRATED CIRCUITS A DESIGN PERSPECTIVE 2ND EDITION Jan M. Rabaey, Anantha Chandrakasan, and Borivoje Nikolic CONTENTS PART I: THE FABRICS Chapter 1: Introduction (32 pages) 1.1 A Historical

CHAPTER 6 GDI BASED LOW POWER FULL ADDER CELL FOR DSP DATA PATH BLOCKS

CHAPTER 6 GDI BASED LOW POWER FULL ADDER CELL FOR DSP DATA PATH BLOCKS 6.1 INTRODUCTION In this approach, the four types of full adders conventional, 16T, 14T and 10T have been analyzed in terms of

FIELD-PROGRAMMABLE gate array (FPGA) chips

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS, VOL. 54, NO. 11, NOVEMBER 2007 2489 3-D nFPGA: A Reconfigurable Architecture for 3-D CMOS/Nanomaterial Hybrid Digital Circuits Chen Dong, Deming

TECHNOLOGY scaling, aided by innovative circuit techniques,

122 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 14, NO. 2, FEBRUARY 2006 Energy Optimization of Pipelined Digital Systems Using Circuit Sizing and Supply Scaling Hoang Q. Dao,

Keywords: VLSI; CMOS; Pass Transistor Logic (PTL); Gate Diffusion Input (GDI); Parallel In Parallel Out (PIPO); RAM. I.

Comparison and analysis of sequential circuits using different logic styles Shofia Ram 1, Rooha Razmid Ahamed 2 1 M. Tech. Student, Dept of ECE, Rajagiri School of Engg and Technology, Cochin, Kerala 2

Digital Systems Design

Digital Systems Design and Test Dr. D. J. Jackson Lecture 1-1 Introduction Traditional digital design Manual process of designing and capturing circuits Schematic entry System-level

A 0.9 V Low-power 16-bit DSP Based on a Top-down Design Methodology

UDC 621.3.049.771.14:621.396.949 A 0.9 V Low-power 16-bit DSP Based on a Top-down Design Methodology Atsushi Tsuchiya, Tetsuyoshi Shiota, Shoichiro Kawashima (Manuscript received December 8, 1999) A 0.9

Low Power Design of Successive Approximation Registers

Low Power Design of Successive Approximation Registers Rabeeh Majidi ECE Department, Worcester Polytechnic Institute, Worcester MA USA rabeehm@ece.wpi.edu Abstract: This paper presents low power design

A Novel Radiation Tolerant SRAM Design Based on Synergetic Functional Component Separation for Nanoscale CMOS.

A Novel Radiation Tolerant SRAM Design Based on Synergetic Functional Component Separation for Nanoscale CMOS. Abstract This paper presents a novel SRAM design for nanoscale CMOS. The new design addresses

On Chip Active Decoupling Capacitors for Supply Noise Reduction for Power Gating and Dynamic Dual Vdd Circuits in Digital VLSI

ELEN 689 606 Techniques for Layout Synthesis and Simulation in EDA Project Report On Chip Active Decoupling Capacitors for Supply Noise Reduction for Power Gating and Dynamic Dual Vdd Circuits in Digital

A new 6-T multiplexer based full-adder for low power and leakage current optimization

A new 6-T multiplexer based full-adder for low power and leakage current optimization G. Ramana Murthy a), C. Senthilpari, P. Velrajkumar, and T. S. Lim Faculty of Engineering and Technology, Multimedia

Preface to Third Edition Deep Submicron Digital IC Design p. 1 Introduction p. 1 Brief History of IC Industry p. 3 Review of Digital Logic Gate

Preface to Third Edition p. xiii Deep Submicron Digital IC Design p. 1 Introduction p. 1 Brief History of IC Industry p. 3 Review of Digital Logic Gate Design p. 6 Basic Logic Functions p. 6 Implementation

A Novel Low-Power Scan Design Technique Using Supply Gating

A Novel Low-Power Scan Design Technique Using Supply Gating S. Bhunia, H. Mahmoodi, S. Mukhopadhyay, D. Ghosh, and K. Roy School of Electrical and Computer Engineering, Purdue University, West Lafayette,

Domino Static Gates Final Design Report

Domino Static Gates Final Design Report Krishna Santhanam Abstract Static circuit gates are the standard circuit devices used to build the major parts of digital circuits. Dynamic gates, such as domino

Design of Adders with Less number of Transistor

Design of Adders with Less number of Transistor Mohammed Azeem Gafoor 1 and Dr. A R Abdul Rajak 2 1 Master of Engineering (Microelectronics), Birla Institute of Technology and Science Pilani, Dubai Campus,

Mapping Multiplexers onto Hard Multipliers in FPGAs

Mapping Multiplexers onto Hard Multipliers in FPGAs Peter Jamieson and Jonathan Rose The Edward S. Rogers Sr. Department of Electrical and Computer Engineering University of Toronto Modern FPGAs Consist

Lecture 11: Clocking

High Speed CMOS VLSI Design Lecture 11: Clocking (c) 1997 David Harris 1.0 Introduction We have seen that generating and distributing clocks with little skew is essential to high speed circuit design.

Energy Efficiency of Power-Gating in Low-Power Clocked Storage Elements

Energy Efficiency of Power-Gating in Low-Power Clocked Storage Elements Christophe Giacomotto 1, Mandeep Singh 1, Milena Vratonjic 1, Vojin G. Oklobdzija 1 1 Advanced Computer systems Engineering Laboratory,

DFT for Testing High-Performance Pipelined Circuits with Slow-Speed Testers

DFT for Testing High-Performance Pipelined Circuits with Slow-Speed Testers Muhammad Nummer and Manoj Sachdev University of Waterloo, Ontario, Canada mnummer@vlsi.uwaterloo.ca, msachdev@ece.uwaterloo.ca

An Interconnect-Centric Approach to Cyclic Shifter Design

An Interconnect-Centric Approach to Cyclic Shifter Design Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College. David M. Harris Harvey Mudd College. 1 Outline Motivation Previous Work Approaches Fanout-Splitting

THERE is a growing need for high-performance and. Static Leakage Reduction Through Simultaneous V t /T ox and State Assignment

1014 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 24, NO. 7, JULY 2005 Static Leakage Reduction Through Simultaneous Vt/Tox and State Assignment Dongwoo Lee, Student

PROGRAMMABLE ASIC INTERCONNECT

PROGRAMMABLE ASIC INTERCONNECT The structure and complexity of the interconnect is largely determined by the programming technology and the architecture of the basic logic cell The first programmable ASICs

FPGA-SPICE: A Simulation-based Power Estimation Framework for FPGAs

FPGA-SPICE: A Simulation-based Power Estimation Framework for FPGAs Xifan Tang, Pierre-Emmanuel Gaillardon and Giovanni De Micheli Integrated Systems Laboratory (LSI), École Polytechnique Fédérale de Lausanne

Very Large Scale Integration (VLSI)

Very Large Scale Integration (VLSI) Lecture 6 Dr. Ahmed H. Madian Ah_madian@hotmail.com Dr. Ahmed H. Madian-VLSI 1 Contents Array subsystems Gate arrays technology Sea-of-gates Standard cell Macrocell

Leakage Power Minimization in Deep-Submicron CMOS circuits

Outline Leakage Power Minimization in Deep-Submicron circuits Politecnico di Torino Dip. di Automatica e Informatica 1019 Torino, Italy enrico.macii@polito.it Introduction. Design for low leakage: Basics.

Digital Microelectronic Circuits ( ) Pass Transistor Logic. Lecture 9: Presented by: Adam Teman

Digital Microelectronic Circuits (361-1-3021) Presented by: Adam Teman Lecture 9: Pass Transistor Logic 1 Motivation In the previous lectures, we learned about Standard CMOS Digital Logic design. CMOS

DESIGNING powerful and versatile computing systems is

560 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 15, NO. 5, MAY 2007 Variation-Aware Adaptive Voltage Scaling System Mohamed Elgebaly, Member, IEEE, and Manoj Sachdev, Senior

A Low-Power High-speed Pipelined Accumulator Design Using CMOS Logic for DSP Applications

International Journal of Research Studies in Computer Science and Engineering (IJRSCSE) Volume 1, Issue 5, September 2014, PP 30-42 ISSN 2349-4840 (Print) & ISSN 2349-4859 (Online) www.arcjournals.org

Fast Placement Optimization of Power Supply Pads

Fast Placement Optimization of Power Supply Pads Yu Zhong Martin D. F. Wong Dept. of Electrical and Computer Engineering Dept. of Electrical and Computer Engineering Univ. of Illinois at Urbana-Champaign

CPE/EE 427, CPE 527 VLSI Design I: Homeworks 3 & 4

CPE/EE 427, CPE 527 VLSI Design I: Homeworks 3 & 4 1 2 3 4 5 6 7 8 9 10 Sum 30 10 25 10 30 40 10 15 15 15 200 1. (30 points) Misc, Short questions (a) (2 points) Postponing the introduction of signals

Designing Information Devices and Systems II Fall 2017 Note 1

EECS 16B Designing Information Devices and Systems II Fall 2017 Note 1 1 Digital Information Processing Electrical circuits manipulate voltages (V) and currents (I) in order to: 1. Process information

Electronic Circuits EE359A

Electronic Circuits EE359A Bruce McNair B206 bmcnair@stevens.edu 201-216-5549 1 Memory and Advanced Digital Circuits - 2 Chapter 11 2 Figure 11.1 (a) Basic latch. (b) The latch with the feedback loop opened.

Ramon Canal NCD Master MIRI. NCD Master MIRI 1

Wattch, Hotspot, Hotleakage, McPAT http://www.eecs.harvard.edu/~dbrooks/wattch-form.html http://lava.cs.virginia.edu/hotspot http://lava.cs.virginia.edu/hotleakage http://www.hpl.hp.com/research/mcpat/

NanoFabrics: Spatial Computing Using Molecular Electronics

NanoFabrics: Spatial Computing Using Molecular Electronics Seth Copen Goldstein and Mihai Budiu Computer Architecture, 2001. Proceedings. 28th Annual International Symposium on 30 June-4 July 2001

Implementation of Efficient 5:3 & 7:3 Compressors for High Speed and Low-Power Operations

Volume-7, Issue-3, May-June 2017 International Journal of Engineering and Management Research Page Number: 42-47 Implementation of Efficient 5:3 & 7:3 Compressors for High Speed and Low-Power Operations

Ruixing Yang

Design of the Power Switching Network Ruixing Yang 15.01.2009 Outline Power Gating implementation styles Sleep transistor power network synthesis Wakeup in-rush current control Wakeup and sleep latency

Andrew Clinton, Matt Liberty, Ian Kuon

Andrew Clinton, Matt Liberty, Ian Kuon FPGA Routing (Interconnect) FPGA routing consists of a network of wires and programmable switches Wire is modeled with a reduced RC network Drivers are modeled as

Low Power 32-bit Improved Carry Select Adder based on MTCMOS Technique

Low Power 32-bit Improved Carry Select Adder based on MTCMOS Technique Ch. Mohammad Arif 1, J. Syamuel John 2 M. Tech student, Department of Electronics Engineering, VR Siddhartha Engineering College,

Low-Power Digital CMOS Design: A Survey

Low-Power Digital CMOS Design: A Survey Krister Landernäs June 4, 2005 Department of Computer Science and Electronics, Mälardalen University Abstract The aim of this document is to provide the reader with

EC 1354-Principles of VLSI Design

EC 1354-Principles of VLSI Design UNIT I MOS TRANSISTOR THEORY AND PROCESS TECHNOLOGY PART-A 1. What are the four generations of integrated circuits? 2. Give the advantages of IC. 3. Give the variety of

A Novel Flipflop Topology for High Speed and Area Efficient Logic Structure Design

IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-ISSN: 2278-2834, p-ISSN: 2278-8735. Volume 6, Issue 2 (May. - Jun. 2013), PP 72-80 A Novel Flipflop Topology for High Speed and Area

EE241 - Spring 2004 Advanced Digital Integrated Circuits. Announcements. Borivoje Nikolic. Lecture 15 Low-Power Design: Supply Voltage Scaling

EE241 - Spring 2004 Advanced Digital Integrated Circuits Borivoje Nikolic Lecture 15 Low-Power Design: Supply Voltage Scaling Announcements Homework #2 due today Midterm project reports due next Thursday

The challenges of low power design Karen Yorav

The challenges of low power design Karen Yorav The challenges of low power design What this tutorial is NOT about: Electrical engineering CMOS technology but also not Hand waving nonsense about trends

Geared Oscillator Project Final Design Review. Nick Edwards Richard Wright

Geared Oscillator Project Final Design Review Nick Edwards Richard Wright This paper outlines the implementation and results of a variable-rate oscillating clock supply. The circuit is designed using a

IMPLEMANTATION OF D FLIP FLOP BASED ON DIFFERENT XOR /XNOR GATE DESIGNS

IMPLEMANTATION OF D FLIP FLOP BASED ON DIFFERENT XOR /XNOR GATE DESIGNS 1 MADHUR KULSHRESTHA, 2 VIPIN KUMAR GUPTA 1 M. Tech. Scholar, Department of Electronics & Communication Engineering, Suresh Gyan

POWER ESTIMATION FOR FIELD PROGRAMMABLE GATE ARRAYS. Kara Ka Wing Poon B.A.Sc, University of British Columbia, 1999

POWER ESTIMATION FOR FIELD PROGRAMMABLE GATE ARRAYS by Kara Ka Wing Poon B.A.Sc, University of British Columbia, 1999 A thesis submitted in partial fulfillment of the requirements for the degree of Master

Learning Outcomes. Spiral 2 8. Digital Design Overview LAYOUT

Spiral 2-8 Cell Mark Redekopp Learning Outcomes I understand how a digital circuit is composed of layers of materials forming transistors and wires I understand how each layer is expressed as

A Low Power Array Multiplier Design using Modified Gate Diffusion Input (GDI)

A Low Power Array Multiplier Design using Modified Gate Diffusion Input (GDI) Mahendra Kumar Lariya 1, D. K. Mishra 2 1 M.Tech, Electronics and instrumentation Engineering, Shri G. S. Institute of Technology

UMAINE ECE Morse Code ROM and Transmitter at ISM Band Frequency

UMAINE ECE Morse Code ROM and Transmitter at ISM Band Frequency Jamie E. Reinhold December 15, 2011 Abstract The design, simulation and layout of a UMAINE ECE Morse code Read Only Memory and transmitter

Lecture 4&5 CMOS Circuits

Lecture 4&5 CMOS Circuits Xuan Silvia Zhang Washington University in St. Louis http://classes.engineering.wustl.edu/ese566/ Worst-Case V OL 2 3 Outline Combinational Logic (Delay Analysis) Sequential Circuits

PROCESS-VOLTAGE-TEMPERATURE (PVT) VARIATIONS AND STATIC TIMING ANALYSIS

PROCESS-VOLTAGE-TEMPERATURE (PVT) VARIATIONS AND STATIC TIMING ANALYSIS The major design challenges of ASIC design consist of microscopic issues and macroscopic issues [1]. The microscopic issues are ultra-high

MS Project :Trading Accuracy for Power with an Under-designed Multiplier Architecture Parag Kulkarni Adviser : Prof. Puneet Gupta Electrical Eng.

MS Project: Trading Accuracy for Power with an Under-designed Multiplier Architecture Parag Kulkarni Adviser: Prof. Puneet Gupta Electrical Eng., UCLA - http://nanocad.ee.ucla.edu/ 1 Outline Introduction

Reduction of Peak Input Currents during Charge Pump Boosting in Monolithically Integrated High-Voltage Generators

Reduction of Peak Input Currents during Charge Pump Boosting in Monolithically Integrated High-Voltage Generators Jan Doutreloigne Abstract This paper describes two methods for the reduction of the peak