Recovery-Based Design for Variation-Tolerant SoCs


Vivek Kozhikkottu, Sujit Dey and Anand Raghunathan
School of Electrical and Computer Engineering, Purdue University
School of Electrical and Computer Engineering, UC San Diego

ABSTRACT

Parameter variations have emerged as a significant threat to continued CMOS scaling in the nanometer regime. Due to increasing performance penalties associated with worst-case design, recovery based design has emerged as a promising approach for dealing with the impact of variations. Previous work has applied recovery based design at the circuit and micro-architecture levels of abstraction. In this work, we address the problem of designing variation-tolerant SoCs using the recovery based design paradigm. We demonstrate that a monolithic implementation of recovery based design fails to scale for large SoCs. We propose the concept of recovery islands, wherein each island consists of one or more SoC components that can recover independently of the rest of the SoC, and demonstrate how our proposal can be easily realized via minor changes to a traditional SoC design flow. We study the trade-offs involved in applying recovery based design at the system level. We demonstrate that it is critical to account for (i) the inherent diversity of the error-voltage profiles among various components in an SoC, and (ii) the impact of error recovery in a component on overall system performance. We then propose a systematic recovery-based SoC design methodology that partitions a given SoC into recovery islands and also computes the optimal operating points for each island, taking into account the various system level trade-offs involved. We evaluate our framework on three different SoC designs (an 802.11b MAC processor, an MPEG encoder and a Wireless Video Capture system) and demonstrate an average of 32% energy savings over conventional designs.
Categories and Subject Descriptors
B.7.1 [INTEGRATED CIRCUITS]: VLSI (Very large scale integration)

General Terms
Algorithms, Design

Keywords
System-on-chip, Variation Aware Design, Variation Tolerance, Low Power Design

This material is based upon work supported in part by the National Science Foundation under Grant No. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. DAC 2012, June 3-7, 2012, San Francisco, California, USA. Copyright 2012 ACM ACM /12/06...$

1. INTRODUCTION

Continued scaling of CMOS technologies has resulted in parameter variations emerging as a critical design concern. Parameter variations can be broadly classified as process variations, caused by the inherent nature of the manufacturing process, and environmental variations, caused by fluctuations in temperature and supply voltage. These parameter variations manifest as statistical behavior in the delay and power consumption of circuits, and have traditionally been dealt with by over-design. However, with continued scaling into the nanometer regime, the gap between typical-case and worst-case design is growing too large, and the performance and energy cost of worst-case design can no longer be ignored.

To overcome the problems with worst-case design, recovery based design techniques such as Razor [1] and EDS [2] have been proposed. These techniques employ embedded error detection and recovery circuitry to help detect and recover from timing errors induced by variations. They help eliminate conservative voltage guard bands by dynamically controlling the supply voltage in response to the occurrence of timing errors.
Moreover, components can be voltage overscaled even beyond their zero error operating points to achieve considerable energy reductions for a negligible loss in performance [1]. These recovery based design techniques have hitherto been applied only at the circuit and micro-architecture levels [3,4]. We believe that ours is the first effort to explore the application of recovery-based design in a systematic manner to entire SoCs.

1.1 Paper Overview and Contributions

In this work, we address the problem of designing variation-tolerant SoCs using the recovery based design paradigm. The significant contributions of our work are as follows:

We demonstrate that applying recovery based design in a monolithic fashion is not scalable for large SoC designs. We propose a new design approach in which SoCs are divided into multiple recovery islands, each of which can detect and recover from errors independently of the rest of the SoC. We also demonstrate that the communication architecture serves as an ideal variable latency interface for partitioning the SoC into recovery islands.

We study the trade-offs involved in applying recovery based design at the system level. We demonstrate that each component's distinct error-voltage characteristics, as well as its impact on overall system performance, need to be considered while clustering components into recovery islands and computing their operating points.

We propose a methodology that systematically partitions a given SoC into recovery islands and also computes the optimal operating point for each island. The framework takes into account the above trade-offs, as well as the complex interactions between different islands, using an emulation based performance analysis framework.

We apply recovery based SoC design to three different SoC designs (an 802.11b MAC processor, an MPEG encoder and a Wireless Video Capture system) and obtain an average of 32% energy savings over conventional designs.

The rest of this paper is organized as follows.
Section 2 summarizes prior work on variation-aware system design. Section 3 describes the challenges involved in applying recovery based design to SoCs. Section 4 gives an overview of the proposed concept of recovery islands and the various interfaces needed to enable it. Section 5 analyzes the various system-level trade-offs involved in recovery island based SoC design with the help of an example. Section 6 describes our systematic recovery based SoC design methodology. Section 7 describes our experimental setup and presents the results obtained by applying the proposed framework to three example SoC designs.

2. RELATED WORK

In the context of SoCs, several previous efforts have demonstrated the strong potential of addressing variations at the system level. In the context of multiple voltage-frequency island based SoC design, several efforts [5, 6] have exploited the inherent flexibility of the multi-island design paradigm to mitigate the impact of variations. Techniques for analyzing the impact of process variations on system performance and power were developed in [7] and [8]. A variation tolerant on-chip communication architecture was discussed in [9]. In [10], the authors develop techniques to optimize system-level power management policies under the impact of variations. [11] proposes partitioning an SoC into fine grained body bias islands to help mitigate the impact of within-die leakage variations. However, most of these techniques only deal with manufacturing induced process variations and do not deal with workload, voltage and temperature based variations. Recovery based design, due to its dynamic and adaptive nature [1] [2], deals with all sources of variations and thus eliminates the need for conservative design margins.

Due to the various power-performance penalties associated with worst-case design, researchers have started actively developing recovery based design techniques. Razor [1] and EDS [2] propose circuit level mechanisms to detect and correct timing based errors, providing a safety net that allows the elimination of guard bands and design margins.
Furthermore, these mechanisms achieve substantial energy savings by facilitating voltage overscaling, a technique of scaling the supply voltage beyond the circuit's critical operating point, resulting in timing errors. In this context, [12] and [13] have proposed using cell sizing and dual threshold voltage cells to modify the timing slack of the frequently-occurring, near-critical timing paths to facilitate further voltage overscaling, thereby achieving additional energy savings. Similarly, at the architecture level, [14] and [15] have suggested architectural modifications to reshape the error-voltage profiles of underlying micro-architectural blocks so as to increase their potential for voltage overscaling. In [3] and [4] the authors argue that fine-grained adaptive biasing and voltage interpolation based techniques can be applied to processors instrumented with recovery mechanisms to help mitigate the impact of within-die parameter variations. However, as noted earlier, these techniques focus on the circuit and micro-architecture level trade-offs involved in applying recovery based design. In this paper, we focus on identifying the key system level characteristics and trade-offs that must be taken into account for applying recovery based design in the context of SoCs.

3. MOTIVATION

In this section, we motivate the need for a new approach to recovery based design for SoCs, by outlining two major scalability concerns associated with applying recovery based techniques in a monolithic fashion. We utilize an example SoC design to help quantify these concerns. Figure 1 shows the block diagram of a Wireless Video Capture Device (WVCD) SoC consisting of ten components connected to a system bus. The SoC performs two main functions, namely, i) it encodes video frames stored in an on-chip frame buffer, and ii) it packetizes the frames using the 802.11b protocol and sends the packets out to a wireless interface for transmission.
The four important compute-intensive functions, i) Checksum Computation (CRC), ii) Wired Equivalent Privacy encryption (WEP), iii) Motion Estimation (ME), and iv) DCT compression (DCT), are all implemented as hardware accelerators.

Figure 1: Wireless video capture SoC

The first major factor limiting the scalability of a monolithic recovery based scheme is the impact of within-die parameter variations [16]. Within-die variations cause components within a given instance of the SoC to have differing performance-power characteristics. Recovery based design techniques typically try to operate a component at its optimal operating voltage point so as to eliminate the conservative voltage guard bands needed to deal with variations. However, in a monolithic implementation, the operating voltage of the entire SoC would be determined by the voltage of its slowest component, the one that has been impacted most negatively by variations. As a consequence, a large number of components would be forced to operate at sub-optimal voltages, leading to reduced energy benefits. Figure 2 shows the mean energy savings (for WVCD SoC chips) obtained by a monolithic implementation of recovery based design, for increasing values of within-die process variations. The figure shows that for higher values of within-die variations, the energy savings attained by monolithic recovery based design decrease significantly. Moreover, increased within-die variations in other important parameters such as voltage, temperature and workload would only exacerbate this effect. In summary, within-die parameter variations pose a severe challenge to scaling recovery based design to large SoCs.

Figure 2: Mean energy savings vs. within-die variations

The second major concern affecting the scalability of monolithic recovery based design is the strict timing constraint required for performing error detection and correction.
The timing constraint can be expressed as follows:

$$T_{clk\_tree} + T_{delay\_sample} + T_{clk\_error} + T_{error\_agg} < T_{clk\_period} \qquad (1)$$

where $T_{clk\_tree}$ is the clock to flip-flop delay, $T_{delay\_sample}$ and $T_{clk\_error}$ represent the delays associated with generation of the error signal by the shadow flip-flop, and $T_{error\_agg}$ refers to the delay required for aggregating all the error signals back to gate the clock source. All the above delays must add up to less than the system's clock period ($T_{clk\_period}$) so as to successfully perform clock gating before the start of the next clock cycle, when an error is detected. However, with increasing SoC sizes and the poor scaling of interconnect delays [17], the global delay components ($T_{clk\_tree}$ and $T_{error\_agg}$) restrict the applicability of monolithic implementations of recovery based design to small systems. For example, for an operating frequency of 1 GHz at the 45nm technology node, our analysis suggests that the monolithic scheme would be feasible only for circuits of size up to 0.9 mm². Thus, interconnect delays enforce a strict limit on the size of SoC designs for which monolithic recovery based design is applicable.

Due to the above mentioned limitations associated with monolithic recovery based design, we propose applying recovery based techniques to SoCs in a more fine grained manner. There are two key issues that must be addressed in order to realize this proposal. First, we need to allow components to recover independently of the rest of the SoC, while maintaining correct operation. Second, we must explore the design space of possible partitions, ranging from a monolithic recovery based design on one extreme to a fine-grained partitioning where each SoC component is in its own recovery island. In doing so, it is necessary to consider the area and energy overheads associated with creation of recovery islands. We address these issues in the following sections.

4. RECOVERY ISLANDS: SCALING RECOVERY BASED DESIGNS TO SOCS

Traditionally, SoCs are partitioned into coarse-grained voltage and frequency islands. We propose partitioning these voltage-frequency islands further into more fine grained recovery islands.
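Because every recovery island has to fit within the timing budget of Equation 1, a candidate island can be screened with a simple sum-and-compare check. The following is a minimal sketch under stated assumptions: the class name, function name and all delay values are hypothetical illustrations, not figures or tools from the paper.

```python
from dataclasses import dataclass


@dataclass
class IslandTiming:
    """Delay terms of Equation 1 for one candidate island, in nanoseconds.

    All numbers used with this class below are made-up illustrative values.
    """
    t_clk_tree: float      # clock to flip-flop delay
    t_delay_sample: float  # shadow flip-flop sampling delay
    t_clk_error: float     # error signal generation delay
    t_error_agg: float     # aggregation of error signals back to the clock gate

    def total(self) -> float:
        """Left-hand side of Equation 1."""
        return (self.t_clk_tree + self.t_delay_sample
                + self.t_clk_error + self.t_error_agg)


def satisfies_equation_1(timing: IslandTiming, clk_period_ns: float) -> bool:
    """True when the summed delays are strictly below the clock period."""
    return timing.total() < clk_period_ns


# Hypothetical islands evaluated against a 1 GHz clock (1.0 ns period).
small_island = IslandTiming(0.25, 0.125, 0.125, 0.25)  # sums to 0.75 ns
large_island = IslandTiming(0.5, 0.125, 0.125, 0.5)    # sums to 1.25 ns

print(satisfies_equation_1(small_island, 1.0))  # True
print(satisfies_equation_1(large_island, 1.0))  # False
```

In this sketch only the two global terms differ between the islands, mirroring the observation that the clock tree and error aggregation delays are what grow with physical size.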
Each recovery island consists of one or more SoC components and must possess the following key properties: i) the ability to detect and recover from errors in any of its components independently of the rest of the SoC, and ii) dimensions that allow the timing constraints imposed by Equation 1 to be satisfied.

One of the challenges associated with enabling recovery islands to recover independently is to find suitable points for partitioning a given SoC into islands. Creating recovery islands from arbitrary partitions of logic would require extensive re-design of the underlying components and their respective interfaces. From an external perspective, each recovery island can take a variable number of cycles to respond to a transaction, depending on whether a component in the island has encountered an error or not. We note that the system-level communication architecture (bus or network-on-chip) used in most SoCs is already designed to tolerate variable latencies, be it due to bus contention or a component being busy. The communication architecture thus serves as an ideal variable latency interface to partition the SoC into recovery islands. However, one cannot simply connect the recovery islands directly to the interconnect fabric, as they would violate the established interface protocols. Appropriate cross recovery-island interfaces need to be designed to interface recovery islands with the rest of the system.

Figure 3 presents the structure of a generic recovery island based SoC. The SoC shown in the figure has been partitioned into three recovery islands, each consisting of one or more SoC components. Each of the recovery islands is connected to the system interconnect with the help of cross-recovery island interfaces. We note that the system interconnect fabric itself is excluded from the recovery island partitioning process and therefore needs to be designed conservatively so as to avoid timing errors.
A more detailed description of these interfaces is provided in Section A of the supplementary material.

Figure 3: Recovery island based design

For each recovery island, the timing critical flip-flops are instrumented with error detection and recovery circuitry [1,2]. When an error is detected, the clock is gated for the next cycle to allow the correct values to be restored to all the flip-flops. All the error signals are then aggregated and fed into the operating point controller [1], which is responsible for dynamically controlling the supply voltage of the island to maintain a desired error rate. We achieve supply voltage scaling by utilizing voltage interpolation [18] so as to avoid the significant overheads associated with voltage regulators and converters. Voltage interpolation provides the ability for different groups of logic gates within a block to select between two static supply voltages VDDH and VDDL. The scheme enables dynamic modulation of a circuit's delay by choosing an appropriate combination of logic segments within a block to be connected to VDDH and VDDL, respectively.

The recovery island based design methodology incurs area and energy penalties associated with additional cross-island interfaces and operating point control mechanisms. As a consequence, partitioning the SoC at the granularity of individual components can lead to considerable energy overheads and may significantly diminish the system level energy benefits obtained by the framework. Thus, it is imperative to find an energy-optimal partitioning of the SoC into recovery islands.

5. DESIGN TRADEOFFS

In this section, we explore the various system level design trade-offs involved in recovery island based SoC design. We utilize the previously described WVCD SoC (Figure 1) for illustrating some of these trade-offs.
We first describe the component level characteristics to be considered when partitioning an SoC into recovery islands. The operating voltage of a component in recovery based design depends on its inherent error-voltage profile, which is in turn determined by various factors such as circuit structure, component size, application workload (path activation probabilities) as well as process, voltage and temperature variations. For the example WVCD SoC, Figure 4 shows the error versus voltage profiles for the largest five components. We also plot the total error versus voltage profile for the entire SoC. As the figure shows, the total system error is mostly dominated by errors in the CPU. In a scheme in which the entire SoC is treated as a single recovery island (monolithic implementation of recovery based design), each component would be operated at a voltage mostly determined by the CPU's error-voltage profile. On the other hand, if we perform recovery at a component level, each component would be operated at its own optimum voltage based on its error-voltage profile, which would lead to substantial energy savings. However, this scheme would also involve excessive overheads associated with implementing recovery islands.

Figure 4: Error rate versus voltage profile (error rate (%) against voltage for the CPU, WEP, CRC, ME and DCT components and for the total SoC)

In general, SoCs have components that tend to differ greatly in their structure, complexity, size and workload, leading to diverse error-voltage profiles across components. This diversity is further amplified by intra-die variations in process parameters, temperature gradients across the chip, and local voltage fluctuations. The partitioning scheme thus needs to incorporate this inherent diversity in the error-voltage profiles among various components, and the overheads associated with recovery islands, in addition to system-level factors as discussed next.

Figure 5: System performance loss versus error rate (system performance loss (%) against error rate (%) for the ME, DCT, CPU, WEP and CRC components)

We now motivate the need to consider system level effects while choosing optimal operating points for each recovery island. Errors in a component force it to spend clock cycles in recovery and thereby affect system performance.
However, depending on how critical a component is to overall system performance, the same error rate in different components can have different effects on system performance. Also, due to complex inter-dependencies between components (e.g., concurrent execution and synchronization), the system performance impact due to errors in different components need not be additive. Figure 5 plots the system performance loss versus error rate for different components of the WVCD SoC. As can be seen, errors in the ME accelerator have the greatest impact on system performance, followed by the DCT accelerator. Therefore, for a given system level performance target, a configuration in which the rest of the components (CRC, CPU, WEP) operate at higher error rates, and thereby can be voltage scaled more aggressively, is more energy efficient than a configuration in which the ME and DCT components operate at higher error rates.

In complex SoCs, the correlation between error rate and system performance loss can be quite varied across components, and this diversity needs to be considered while selecting the optimal operating points for each SoC component. Recovery islands that consist of components that are more critical to system performance should be operated at lower error rates, whereas those that contain components with a lower impact on system performance should be voltage scaled more aggressively so as to reduce overall system power. Also note that it is beneficial during partitioning to group together components that have similar impact on system performance.

6. RECOVERY BASED SOC DESIGN

In this section, we describe a systematic methodology for recovery based SoC design that considers the issues and trade-offs described in the previous section. The proposed methodology, shown in Figure 6, takes as its input the given SoC architecture, the application software, the desired performance target and component-level clustering constraints derived from the SoC floorplan.
It produces as its output the best SoC partitioning scheme along with optimized operating points for each island. The methodology consists of three main steps. In the component characterization step, we compute the error rate, system performance loss and energy savings for each SoC component at each possible operating voltage. The island partitioning and optimization step partitions the SoC appropriately into recovery islands, and computes the best operating point for each island. Finally, the local search step further tunes the operating points obtained in the previous step while considering the complex performance interactions between different SoC components. We elaborate upon these steps in the rest of this section.

Figure 6: Recovery island based design methodology

6.1 Component Characterization

In this step, we first obtain the error-voltage profile and the error-system performance loss profile for each component. For generating the error-voltage profiles, we first capture bus level input traces for each component by performing cycle-accurate functional simulation of the SoC for representative workloads. We then use the captured traces as input vectors to perform post-synthesis simulations at different operating voltages to obtain the error-voltage profile for each component. For the error vs. system performance loss profile, we use an emulation based performance analysis framework. We instrument each SoC component with error injectors and circuitry that mimics error recovery, and obtain the system performance loss for increasing error rates in each component. More details on the emulation setup are provided in Supplemental Section B. We now combine both the profiles with component level energy estimates to obtain an error, system performance loss and energy tuple for each operating point (voltage).
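The combination of the two profiles with the energy estimates can be sketched as a table join keyed on voltage. This is only an illustration under stated assumptions: the profile values, the dictionary layout, the names `OperatingPoint`, `interpolate` and `build_operating_points`, and the use of piecewise-linear interpolation between sampled error rates are all hypothetical choices, not details given in the paper.

```python
from typing import Dict, List, NamedTuple


class OperatingPoint(NamedTuple):
    voltage: float     # supply voltage (V)
    error_rate: float  # timing error rate (%)
    perf_loss: float   # system performance loss (%)
    energy: float      # normalized component energy


def interpolate(profile: Dict[float, float], x: float) -> float:
    """Piecewise-linear lookup in a sampled profile, clamped at both ends."""
    xs = sorted(profile)
    if x <= xs[0]:
        return profile[xs[0]]
    if x >= xs[-1]:
        return profile[xs[-1]]
    for lo, hi in zip(xs, xs[1:]):
        if lo <= x <= hi:
            frac = (x - lo) / (hi - lo)
            return profile[lo] + frac * (profile[hi] - profile[lo])
    raise ValueError("unreachable for a non-empty profile")


def build_operating_points(error_vs_voltage: Dict[float, float],
                           loss_vs_error: Dict[float, float],
                           energy_vs_voltage: Dict[float, float]
                           ) -> List[OperatingPoint]:
    """One (voltage, error, performance loss, energy) tuple per voltage."""
    points = []
    for v in sorted(error_vs_voltage):
        err = error_vs_voltage[v]
        loss = interpolate(loss_vs_error, err)
        points.append(OperatingPoint(v, err, loss, energy_vs_voltage[v]))
    return points


# Hypothetical profiles for a single component.
error_vs_voltage = {1.0: 0.0, 0.9: 1.0, 0.8: 4.0}   # voltage -> error rate (%)
loss_vs_error = {0.0: 0.0, 2.0: 1.0, 4.0: 3.0}      # error rate (%) -> loss (%)
energy_vs_voltage = {1.0: 1.0, 0.9: 0.75, 0.8: 0.5}  # voltage -> energy

points = build_operating_points(error_vs_voltage, loss_vs_error,
                                energy_vs_voltage)
for p in points:
    print(p)
```

Keeping the result as one tuple per voltage makes the later operating point selection a lookup over a small discrete set per component.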
6.2 Island Partitioning and Optimization

In this step we derive an optimized partition of the SoC into recovery islands and obtain the best operating point for each island. The number of ways an SoC consisting of N components can be partitioned into k recovery islands can be quite large ($N^k$). The search space involved in identifying an optimal operating point for each island further increases the design space by $O^k$, where O is the number of operating points. To efficiently explore this design space, we adopt an iterative procedure wherein we start off with each component in a separate partition, and iteratively apply the operating point selection and island clustering steps until we can no longer find a better partition.

Consider the initial partition where each component is in a separate recovery island. We can compute the best operating point for each component by modeling it as a convex optimization problem as shown in Equation 2.

$$\underset{V_i}{\text{minimize}} \; \sum_{i=1}^{n} E_i(V_i) \quad \text{subject to} \quad \sum_{i=1}^{n} P_i(V_i) \le \zeta; \quad V_{min} \le V_i \le V_{max}, \quad i = 1, \ldots, n \qquad (2)$$

In the above equation, $E_i(V_i)$ and $P_i(V_i)$ refer to the energy and system performance loss, respectively, of the i-th component operating at voltage $V_i$, and $\zeta$ is the constraint on acceptable system performance loss. For this step, we make the simplifying assumption that system performance loss due to N different components is linearly additive. This is not true in general, due to effects such as communication dependencies, shared resources, system-level critical paths, etc. We ignore these effects in the island partitioning and optimization step to make the problem tractable, but account for them in the subsequent local search step.

Once we obtain the optimal operating points for each component, we group together the two components whose operating points are closest, if this clustering is valid based upon the floorplan derived constraints. These constraints are represented as a clustering matrix that specifies which component pairs could be grouped together, based on their proximity in the SoC's floorplan.
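The operating point selection of Equation 2 can be illustrated on a discrete set of voltages. The sketch below deliberately swaps the convex optimization formulation for plain exhaustive enumeration, which is only practical for a handful of components; it keeps the same objective (total energy) and the same linearly additive performance loss constraint. The function name, the tuple layout and all numbers are hypothetical.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

# (voltage, system performance loss in %, energy) for one operating point.
Point = Tuple[float, float, float]


def select_operating_points(components: Sequence[Sequence[Point]],
                            max_total_loss: float
                            ) -> Tuple[float, List[Point]]:
    """Pick one point per component minimizing total energy.

    Assumes performance losses add linearly across components, as in the
    simplified model of Equation 2. Enumerates every combination, so this
    is an illustration only, not a scalable solver.
    """
    best_energy = float("inf")
    best_choice: Optional[List[Point]] = None
    for choice in product(*components):
        total_loss = sum(p[1] for p in choice)
        if total_loss > max_total_loss:
            continue  # violates the performance loss budget
        total_energy = sum(p[2] for p in choice)
        if total_energy < best_energy:
            best_energy, best_choice = total_energy, list(choice)
    if best_choice is None:
        raise ValueError("no combination meets the performance loss budget")
    return best_energy, best_choice


# Hypothetical components: one critical to system performance, one not.
critical = [(1.0, 0.0, 10.0), (0.9, 1.5, 8.0), (0.8, 4.0, 6.0)]
tolerant = [(1.0, 0.0, 4.0), (0.9, 0.25, 3.0), (0.8, 0.5, 2.0)]

energy, chosen = select_operating_points([critical, tolerant],
                                         max_total_loss=2.0)
print(energy, [p[0] for p in chosen])
```

With these made-up numbers the component with the smaller performance impact ends up at the lowest voltage, while the critical one is scaled only as far as the remaining loss budget allows.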
Grouping components reduces the overheads associated with recovery islands, at the cost of forcing the grouped components to operate at the same voltage. The grouping heuristic therefore minimizes the sub-optimality in operating points. We choose the two components j and k that, when grouped together, give the best energy savings $\Delta E = E_{ri} - (E_k(V_j) - E_k(V_k))$. The first term $E_{ri}$ refers to the energy savings due to the reduced overheads of having one less recovery island and the second term refers to the energy loss due to one of the components (in this case, k) going from a lower operating voltage $V_k$ to a higher point $V_j$. We iteratively perform the island partitioning and operating point selection steps until we find no further grouping that can lead to energy savings.

6.3 Local Search

In this step, we tune the operating points for the partitioned SoC obtained from the previous phase, taking into account the inter-dependencies between various SoC components. We achieve this by first performing emulation of the SoC at the operating points obtained from the previous step to compute the actual system performance loss. Next, based on whether the performance loss is larger or smaller than the given target, we increase or decrease the operating voltages for each island by one unit step and measure the resulting system performance loss. We now greedily change the operating voltage of the island with the best energy savings to performance loss ratio, and repeat this process until the specified performance target is just satisfied.

7. EXPERIMENTAL RESULTS

In this section, we first describe our experimental setup and the example SoCs used in our study. We then present the energy savings obtained by utilizing our framework on three different SoC designs. Our experimental methodology to evaluate the proposed concepts consists of various commercial and research tools.
For obtaining the error-voltage profiles, we first perform logic synthesis of each component with Synopsys Design Compiler using the IBM 45nm technology cell library. We utilize VARIUS [19] for modeling the impact of inter-die and intra-die process and temperature variations on component-level error-voltage profiles. For each of our experiments we generate chips, each of which has different intra-die variation profiles for $V_{th}$ and $L_{eff}$ values. To obtain the temperature distribution across the chip, we provide average power consumption values of each SoC component, along with the SoC floorplan, to the HotSpot thermal modeling tool [20]. We use NANOSIM [21], a transistor level simulator, to obtain the power consumption data for each of the SoC components at different operating voltages. The memory energy consumption and access times are modeled using CACTI 5.3 [22]. We use an Altera DE3 board [23] as our emulation platform for obtaining the component level error-rate versus system performance loss profiles.

We evaluate our framework on three example SoC designs: an 802.11b MAC processor, an MPEG encoder, and a Wireless Video Capture Device. The WVCD system was described in detail in Section 3. We now briefly describe the MPEG and MAC systems. MPEG encoding entails two compute-intensive operations, Motion Estimation (ME) and DCT Compression, which are implemented as hardware accelerators. The input frames to be encoded are stored in an on-chip frame buffer, and an embedded processor is in charge of coordinating the transfer of frames between the frame buffer and the two accelerators, and also executes the remaining tasks. The MAC processor implements the key steps of the 802.11b MAC protocol, and consists of a processor, hardware accelerators for CRC and WEP computation, and peripherals connected by a system interconnect.
In order to verify the functional correctness of the various cross island interfaces as well as to accurately model the impact of errors on system performance, each SoC was partitioned into recovery islands using the proposed methodology and emulated on the DE3 platform.

Figure 7: Energy distribution for conventional and recovery based SoC designs

Figure 7 presents a box whisker plot of the normalized energy consumption of distinct chip instances for each of the three example SoCs. For each SoC, we evaluate the energy consumption under three different design schemes. Traditional refers to a guard band based design scheme wherein the voltage is chosen based on timing analysis using the worst case process/temperature corner provided in the cell library. One Island refers to a recovery-based design wherein the entire SoC is treated as a single recovery island, ignoring the feasibility of timing constraints described in Equation 1. Finally, RBD refers to the proposed recovery island based design framework. Both the One Island and RBD cases are designed for a target system performance loss of no more than 2% due to error recovery. As can be seen from the figure, the RBD design achieves the best energy distribution: the median of the energy distribution is reduced by 31%-33% compared to the Traditional design. The One Island design is able to eliminate the overheads associated with die-to-die variations and thereby achieves 18%-21% improvements in the median of the energy distribution. RBD outperforms the One Island case by 11%-14%. These results clearly illustrate that (i) recovery based SoC design can significantly optimize energy consumption under variations, and (ii) the proposed recovery island based SoC design framework maximally leverages the potential of recovery based design.

Figure 8: Energy savings sensitivity to magnitude of WID variations

Figure 8 plots the percentage energy savings offered by the RBD scheme over the One Island scheme with increasing values of within die manufacturing variations. With increasing values of σ/μ, the energy savings offered by the RBD scheme increase, as it is able to locally reconfigure the operating voltage to each island's characteristics. Also note that the RBD scheme performs better for the Wireless Video Capture Device as it has a larger number of components and hence displays more diversity across components.

Figure 9 plots the percentage energy savings obtained for the WVCD SoC as a function of the number of recovery islands. As can be seen from the figure, the optimal energy savings are obtained for an SoC partitioned into three recovery islands.
For larger numbers of recovery islands, the overheads associated with recovery islands begin to dominate over the potential energy savings attainable by performing recovery at a finer granularity. The above example demonstrates the need for performing recovery based design at an optimal granularity, and hence the need for the partitioning methodology presented in Section 6.

Figure 9: Energy savings sensitivity to number of recovery islands for the WVCD system

Table 1 details the number of components, the area overheads and the number of recovery islands in the final clustering, for each of the three example SoC designs. The area and energy overheads of the required cross-recovery island interfaces and operating point controllers were estimated by synthesizing them using the IBM 45nm library. These overheads are added to the overheads reported in [1] to estimate the total overheads of recovery based design. In summary, we believe that our experiments clearly illustrate the potential benefits of recovery based SoC design in optimizing energy consumption under variations.

Table 1: Recovery island design details

SoC Design    No. of Components    Area Overhead (%)    Best No. of Recovery Islands
MAC           8                    3.2%                 2
MPEG          8                    4.7%                 3
WVCD          ...                  ...%                 3

8. CONCLUSION

We explored the concept of recovery based design and demonstrated how one can implement such a paradigm in the context of modern SoC designs. We presented a variation aware framework for partitioning an SoC into recovery islands and also finding the optimal operating points for each island. We applied the proposed framework to three example SoCs and demonstrated substantial energy benefits over traditional guard band based design.

9. REFERENCES

[1] D. Ernst et al., "Razor: a low-power pipeline based on circuit-level timing speculation," in Proc. MICRO, 2003.
[2] K. Bowman, J. Tschanz, C. Wilkerson, S. Lu, T. Karnik, V. De, and S. Borkar, "Circuit techniques for dynamic variation tolerance," in Proc. DAC, 2009.
[3] M. Gupta, J. Rivers, P. Bose, G. Wei, and D. Brooks, "Tribeca: Design for PVT variations with local recovery and fine-grained adaptation," in Proc. MICRO, 2009.
[4] S. Sarangi, B. Greskamp, A. Tiwari, and J. Torrellas, "EVAL: Utilizing processors with variation-induced timing errors," in Proc. MICRO, 2008.
[5] U. Y. Ogras, R. Marculescu, and D. Marculescu, "Variation-adaptive feedback control for networks-on-chip with multiple clock domains," in Proc. DAC, 2008.
[6] S. Garg and D. Marculescu, "System-level throughput analysis for process variation aware multiple voltage-frequency island designs," ACM TODAES, vol. 13, no. 4, pp. 1-25.
[7] V. J. Kozhikkottu, R. Venkatesan, A. Raghunathan, and S. Dey, "VESPA: Variability emulation for System-on-Chip performance analysis," in Proc. DATE, 2011.
[8] S. Chandra, K. Lahiri, A. Raghunathan, and S. Dey, "Considering process variations during system-level power analysis," in Proc. ISLPED, 2006.
[9] S. Pasricha, Y. Park, N. Dutt, and F. J. Kurdahi, "System-level PVT variation-aware power exploration of on-chip communication architectures," ACM TODAES, vol. 14, no. 2, pp. 1-25.
[10] S. Chandra, K. Lahiri, A. Raghunathan, and S. Dey, "Variation-tolerant dynamic power management at the system-level," IEEE TVLSI, vol. 17, no. 9.
[11] S. Garg and D. Marculescu, "System-level mitigation of WID leakage power variability using body-bias islands," in Proc. CODES+ISSS, 2008.
[12] A. Kahng, S. Kang, R. Kumar, and J. Sartori, "Designing a processor from the ground up to allow voltage/reliability tradeoffs," in Proc. HPCA, 2010.
[13] L. Wan and D. Chen, "DynaTune: circuit-level optimization for timing speculation considering dynamic path behavior," in Proc. ICCAD, 2009.
[14] B. Greskamp et al., "Blueshift: Designing processors for timing speculation from the ground up," in Proc. HPCA, 2009.
[15] N. Zea, J. Sartori, B. Ahrens, and R. Kumar, "Optimal power/performance pipelining for error resilient processors," in Proc. ICCD, 2010.
[16] K. Bowman, A. Alameldeen, S. Srinivasan, and C. Wilkerson, "Impact of die-to-die and within-die parameter variations on the clock frequency and throughput of multi-core processors," IEEE TVLSI, vol. 17, no. 12, Dec.
[17] ITRS.
[18] K. Brownell, G. Wei, and D. Brooks, "Evaluation of voltage interpolation to address process variations," in Proc. ICCAD, 2008.
[19] S. R. Sarangi et al., "VARIUS: A Model of Process Variation and Resulting Timing Errors for Microarchitects," IEEE Trans. Semiconductor Manufacturing, vol. 21, no. 1, pp. 3-13, 2008.
[20] K. Skadron et al., "Temperature-aware microarchitecture," in Proc. ISCA, 2003.
[21] Nanosim, Synopsys Inc.
[22] CACTI-5.3.
[23] Altera.

SUPPLEMENTAL SECTION

A. CROSS ISLAND INTERFACES

In this section, we give an overview of the cross island interfaces needed to ensure correct functioning of the recovery island based designs. As noted earlier, each recovery island needs to adhere to the existing system bus protocols and hence appropriate wrappers need to be designed for each island. In this work, we implemented and tested interface wrappers (explained below) for both master and slave interfaces of a commercially available communication architecture, the Avalon Interconnect Fabric from Altera [23]. A similar procedure can be applied to design interface wrappers for any other standard communication architecture.

Figure 10: Cross island interface logic

Figure 10 shows an overview of the wrapper interface for a Read-Write Avalon Master. The wrapper needs to deal with two scenarios. First, it needs to ensure that a read or write request sent out by the component during an error recovery cycle is not interpreted by the communication architecture as two requests. Second, it needs to make sure that any data returned by the communication architecture during an error recovery cycle is always captured and not lost.

As can be seen from the figure, the wrapper consists of two major components: the intervention detection logic, and the selection and sampling logic. The intervention detection logic analyzes the error signal coming from an island, the request signals from the master and the data valid signal from the bus to determine if there is a need to intervene in the current cycle. The selection logic is a set of multiplexers that perform the desired modification to the bus signals.

Consider a scenario wherein the master interface sends out a write request during a cycle in which an error occurred. The intervention detection logic should detect this scenario and deassert the write request signal in the next cycle, so that the system bus does not treat it as two distinct write requests. This functionality is achieved with the help of simple multiplexer logic.

The more complicated scenario arises when the master issues a read request and the system bus responds to it during a recovery cycle. In this case, we need to ensure that the data returned is appropriately captured and is available to the master in the next cycle. This functionality is achieved with the help of sampler logic, which always stores a clock-delayed version of the read data bus signal. The selection logic then sets the data valid signal high on the next cycle, and the sampled read data signal is routed to the master interface.

B. EMULATION AND ERROR INJECTION FRAMEWORK

In this section, we describe in detail the emulation and error injection framework utilized to obtain the error vs. system performance loss profiles for each SoC component. Figure 11 gives an overview of the proposed emulation based error injection framework. To perform the required analysis, we first instrument each SoC component with the cross island interfaces described in the previous section. To analyze the impact of error recovery on system performance, we mimic the error aggregation signal generated by shadow latches using a synthetic error injection module, and clock gate each component using the generated error signal. The error injection circuit consists of a random number generator (LFSR) and a software programmable control register. The error signal is produced by comparing the generated random number to the threshold value programmed into the control register. Thus, the error rate in a component can be controlled by writing the required value into the threshold register through software.
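As an illustration of the synthetic error injection module described in Section B (a pseudo-random number compared against a software-programmed threshold), the following is a minimal software sketch. The paper states only that an LFSR and a threshold register are used; the 16-bit width, the tap positions, the seed, the less-than comparison and all names (`lfsr16_step`, `ErrorInjector`, `count_gated_cycles`) are assumptions made for this example.

```python
LFSR_WIDTH = 16
LFSR_PERIOD = (1 << LFSR_WIDTH) - 1  # number of non-zero 16-bit states


def lfsr16_step(state: int) -> int:
    """Advance a 16-bit Fibonacci LFSR by one cycle.

    Taps at bits 16, 14, 13 and 11 (a commonly used maximal-length choice);
    the tap selection is an assumption, not taken from the paper.
    """
    feedback = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (feedback << 15)


class ErrorInjector:
    """Hypothetical model of the error injection circuit for one component."""

    def __init__(self, threshold: int, seed: int = 0xACE1) -> None:
        if not 0 < seed <= LFSR_PERIOD:
            raise ValueError("seed must be a non-zero 16-bit value")
        self.threshold = threshold  # models the software-programmable register
        self.state = seed

    def cycle(self) -> bool:
        """Return True when the error signal is asserted in this cycle."""
        self.state = lfsr16_step(self.state)
        # Assumed comparison direction: error when the random value is
        # below the programmed threshold.
        return self.state < self.threshold


def count_gated_cycles(injector: ErrorInjector, cycles: int) -> int:
    """Count cycles in which the component would be clock gated."""
    return sum(injector.cycle() for _ in range(cycles))
```

Under these assumptions, programming a threshold of 656 gates the component in 655 of the 65535 cycles of one full LFSR period, that is, an injected error rate of roughly 1%. Raising or lowering the threshold scales the error rate proportionally, which is what allows the software control loop to sweep error rates without changing the hardware.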
Figure 11: Emulation based error injection framework

The application program that runs on the SoC is instrumented with a software control loop that is in charge of programming the error rate for a given component, executing the application and finally measuring the overall system performance with the help of hardware performance counters. We note that the system performance metric is chosen by the system designer and can be, for example, throughput, latency, or a pre-defined performance score over a set of benchmarks. The emulation board used for our experiments is an Altera DE3 board equipped with a Stratix III EP3SL150 FPGA. The proposed methodology can also be applied to any state-of-the-art emulation platform.

C. DISCUSSION

For the recovery based design paradigm to be widely applicable to a large class of SoCs, it needs to be compatible with current design flows. In this section, we discuss key considerations in this regard. We also explore alternative methodologies that could be incorporated into our proposed framework for differing design requirements.

Incorporating recovery based design into current design flows: A key requirement for utilizing the proposed recovery based design paradigm is the ability to partition an SoC into multiple recovery islands. As noted in Section 4, variable latency interfaces in the system serve as ideal points around which the system may be partitioned. Most interfaces that exist in commercial SoCs, such as communication channels, system buses and on-chip networks, utilize latency insensitive protocols and hence can be re-designed or instrumented with interface wrappers to ensure correct functionality even when a component is unavailable during the error recovery process.

Most commercial SoCs include components that are equipped with recovery mechanisms such as pipeline flushes and state machine rollback. These mechanisms are essential for correcting errors from sources such as speculative execution and soft errors. Although in this study we chose to utilize a single-cycle clock gating based recovery scheme for each island, such components can instead utilize their own built-in recovery mechanisms for dealing with timing violations. The proposed framework is not restricted to any specific error recovery scheme and can easily be adapted to deal with multiple recovery mechanisms that can exist within an SoC.

Current SoC platforms make use of various power management schemes to dynamically adapt to an application's time varying power-performance requirements. Dynamic voltage-frequency scaling (DVFS) is one such widely used mechanism, which modulates the voltage and frequency of individual SoC components/islands based on workload characteristics. The voltage interpolation scheme utilized by the framework requires two supply voltage rails, VDDH and VDDL. One possible integration scheme with DVFS would involve utilizing existing DVFS controllers to decide the VDDH and VDDL operating voltages. The recovery based design scheme's operating point controller can then perform more fine grained voltage interpolation based on the current error rate.
Thus, the proposed framework can be integrated with DVFS with minimal changes to the overall design flow.

Another common practice in current commercial SoC design involves using IP modules procured from external vendors. These components are often non-modifiable and cannot be instrumented with the circuitry needed to detect and recover from errors. In such a scenario, these components alone may be operated with design margins, and only the other system components are considered in the recovery based design process.

Alternative design methodologies: In this work, we made several choices, such as floorplan driven clustering, emulation based local search for eliminating non-linearities associated with component inter-dependencies, the number of operating points under consideration, and a static error-voltage profile based evaluation methodology. We now analyze the alternative choices that could be adopted for achieving different end design objectives, such as improved energy benefits or reduced emulation runtime.

One such alternative involves performing cluster driven floorplanning, wherein components are first clustered together without considering the delay constraints essential for correct functionality. The clustered system is then floorplanned and evaluated for delay violations. Performing partitioning prior to floorplanning could potentially help in physically grouping together components that are most suited for clustering, thereby leading to improved energy savings. However, if the current clustering configuration violates the delay requirements, the above process would have to be repeated with the next best clustering configuration. Also, in some commercial design flows, floorplanning is done quite early in the design cycle and may not be flexible to changes thereafter.

In this study we considered ten distinct operating points at which each component was characterized. Thus, a k-component system would require 10 × k emulation runs to derive the error vs. system performance loss profiles. Reducing the number of operating points proportionately decreases the total run time, at the cost of increased energy consumption due to a more coarse grained search space. The emulation based local search phase could also be replaced with an appropriate analytical performance model to attain similar run time savings. However, this scheme would be applicable only for systems that have relaxed overall system performance constraints, as the complex inter-component interactions cannot be completely captured by an analytical framework.
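The run count discussed above (one emulation run per component per operating point) can be made concrete with a short sketch of the characterization sweep. This is an illustrative outline only: `characterize`, `placeholder_run`, the component names and the integer operating-point indices are hypothetical, and the placeholder returns made-up values rather than measured data.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# One profile entry: (operating point, measured system performance loss).
Profile = List[Tuple[int, float]]


def characterize(
    components: Sequence[str],
    operating_points: Sequence[int],
    run_emulation: Callable[[str, int], float],
) -> Tuple[Dict[str, Profile], int]:
    """Sweep every component over every operating point.

    Returns the per-component profiles and the total number of emulation
    runs, which equals len(components) * len(operating_points).
    """
    profiles: Dict[str, Profile] = {}
    runs = 0
    for component in components:
        profile: Profile = []
        for point in operating_points:
            # In the real flow this step would program the error rate,
            # execute the application and read the performance counters.
            loss = run_emulation(component, point)
            profile.append((point, loss))
            runs += 1
        profiles[component] = profile
    return profiles, runs


def placeholder_run(component: str, point: int) -> float:
    """Stand-in for one emulation run; the value is NOT measured data."""
    return 0.001 * point
```

For a hypothetical 8-component SoC, ten operating points give 80 runs and five operating points give 40, which illustrates the proportional reduction in run time obtained by coarsening the search space.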

UNIT-II LOW POWER VLSI DESIGN APPROACHES

UNIT-II LOW POWER VLSI DESIGN APPROACHES UNIT-II LOW POWER VLSI DESIGN APPROACHES Low power Design through Voltage Scaling: The switching power dissipation in CMOS digital integrated circuits is a strong function of the power supply voltage.

More information

LOW-POWER SOFTWARE-DEFINED RADIO DESIGN USING FPGAS

LOW-POWER SOFTWARE-DEFINED RADIO DESIGN USING FPGAS LOW-POWER SOFTWARE-DEFINED RADIO DESIGN USING FPGAS Charlie Jenkins, (Altera Corporation San Jose, California, USA; chjenkin@altera.com) Paul Ekas, (Altera Corporation San Jose, California, USA; pekas@altera.com)

More information

Novel Low-Overhead Operand Isolation Techniques for Low-Power Datapath Synthesis

Novel Low-Overhead Operand Isolation Techniques for Low-Power Datapath Synthesis Novel Low-Overhead Operand Isolation Techniques for Low-Power Datapath Synthesis N. Banerjee, A. Raychowdhury, S. Bhunia, H. Mahmoodi, and K. Roy School of Electrical and Computer Engineering, Purdue University,

More information

A Survey of the Low Power Design Techniques at the Circuit Level

A Survey of the Low Power Design Techniques at the Circuit Level A Survey of the Low Power Design Techniques at the Circuit Level Hari Krishna B Assistant Professor, Department of Electronics and Communication Engineering, Vagdevi Engineering College, Warangal, India

More information

Ramon Canal NCD Master MIRI. NCD Master MIRI 1

Ramon Canal NCD Master MIRI. NCD Master MIRI 1 Wattch, Hotspot, Hotleakage, McPAT http://www.eecs.harvard.edu/~dbrooks/wattch-form.html http://lava.cs.virginia.edu/hotspot http://lava.cs.virginia.edu/hotleakage http://www.hpl.hp.com/research/mcpat/

More information

PROCESS-VOLTAGE-TEMPERATURE (PVT) VARIATIONS AND STATIC TIMING ANALYSIS

PROCESS-VOLTAGE-TEMPERATURE (PVT) VARIATIONS AND STATIC TIMING ANALYSIS PROCESS-VOLTAGE-TEMPERATURE (PVT) VARIATIONS AND STATIC TIMING ANALYSIS The major design challenges of ASIC design consist of microscopic issues and macroscopic issues [1]. The microscopic issues are ultra-high

More information

POWER GATING. Power-gating parameters

POWER GATING. Power-gating parameters POWER GATING Power Gating is effective for reducing leakage power [3]. Power gating is the technique wherein circuit blocks that are not in use are temporarily turned off to reduce the overall leakage

More information

A Level-Encoded Transition Signaling Protocol for High-Throughput Asynchronous Global Communication

A Level-Encoded Transition Signaling Protocol for High-Throughput Asynchronous Global Communication A Level-Encoded Transition Signaling Protocol for High-Throughput Asynchronous Global Communication Peggy B. McGee, Melinda Y. Agyekum, Moustafa M. Mohamed and Steven M. Nowick {pmcgee, melinda, mmohamed,

More information

White Paper Stratix III Programmable Power

White Paper Stratix III Programmable Power Introduction White Paper Stratix III Programmable Power Traditionally, digital logic has not consumed significant static power, but this has changed with very small process nodes. Leakage current in digital

More information

The challenges of low power design Karen Yorav

The challenges of low power design Karen Yorav The challenges of low power design Karen Yorav The challenges of low power design What this tutorial is NOT about: Electrical engineering CMOS technology but also not Hand waving nonsense about trends

More information

Control Synthesis and Delay Sensor Deployment for Efficient ASV designs

Control Synthesis and Delay Sensor Deployment for Efficient ASV designs Control Synthesis and Delay Sensor Deployment for Efficient ASV designs C H A O FA N L I < C H AO F @ TA M U. E D U >, T E X A S A & M U N I V E RS I T Y S A C H I N S. S A PAT N E K A R, U N I V E RS

More information

A Novel Low-Power Scan Design Technique Using Supply Gating

A Novel Low-Power Scan Design Technique Using Supply Gating A Novel Low-Power Scan Design Technique Using Supply Gating S. Bhunia, H. Mahmoodi, S. Mukhopadhyay, D. Ghosh, and K. Roy School of Electrical and Computer Engineering, Purdue University, West Lafayette,

More information

DESIGNING powerful and versatile computing systems is

DESIGNING powerful and versatile computing systems is 560 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 15, NO. 5, MAY 2007 Variation-Aware Adaptive Voltage Scaling System Mohamed Elgebaly, Member, IEEE, and Manoj Sachdev, Senior

More information

Low Power Design of Successive Approximation Registers

Low Power Design of Successive Approximation Registers Low Power Design of Successive Approximation Registers Rabeeh Majidi ECE Department, Worcester Polytechnic Institute, Worcester MA USA rabeehm@ece.wpi.edu Abstract: This paper presents low power design

More information

TECHNOLOGY scaling, aided by innovative circuit techniques,

TECHNOLOGY scaling, aided by innovative circuit techniques, 122 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 14, NO. 2, FEBRUARY 2006 Energy Optimization of Pipelined Digital Systems Using Circuit Sizing and Supply Scaling Hoang Q. Dao,

More information

Low Power System-On-Chip-Design Chapter 12: Physical Libraries

Low Power System-On-Chip-Design Chapter 12: Physical Libraries 1 Low Power System-On-Chip-Design Chapter 12: Physical Libraries Friedemann Wesner 2 Outline Standard Cell Libraries Modeling of Standard Cell Libraries Isolation Cells Level Shifters Memories Power Gating

More information

VARIATION-TOLERANT MOTION ESTIMATION ARCHITECTURE. Girish V. Varatkar and Naresh R. Shanbhag

VARIATION-TOLERANT MOTION ESTIMATION ARCHITECTURE. Girish V. Varatkar and Naresh R. Shanbhag VARIATION-TOLERANT MOTION ESTIMATION ARCHITECTURE Girish V. Varatkar and Naresh R. Shanbhag Coordinated Science Laboratory/ECE Department University of Illinois at Urbana-Champaign 138 W Main St., Urbana

More information

Low-Power Digital CMOS Design: A Survey

Low-Power Digital CMOS Design: A Survey Low-Power Digital CMOS Design: A Survey Krister Landernäs June 4, 2005 Department of Computer Science and Electronics, Mälardalen University Abstract The aim of this document is to provide the reader with

More information

Energy Efficiency of Power-Gating in Low-Power Clocked Storage Elements

Energy Efficiency of Power-Gating in Low-Power Clocked Storage Elements Energy Efficiency of Power-Gating in Low-Power Clocked Storage Elements Christophe Giacomotto 1, Mandeep Singh 1, Milena Vratonjic 1, Vojin G. Oklobdzija 1 1 Advanced Computer systems Engineering Laboratory,

More information

THERE is a growing need for high-performance and. Static Leakage Reduction Through Simultaneous V t /T ox and State Assignment

THERE is a growing need for high-performance and. Static Leakage Reduction Through Simultaneous V t /T ox and State Assignment 1014 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 24, NO. 7, JULY 2005 Static Leakage Reduction Through Simultaneous V t /T ox and State Assignment Dongwoo Lee, Student

More information

Reduce Power Consumption for Digital Cmos Circuits Using Dvts Algoritham

Reduce Power Consumption for Digital Cmos Circuits Using Dvts Algoritham IOSR Journal of Electrical and Electronics Engineering (IOSR-JEEE) e-issn: 2278-1676,p-ISSN: 2320-3331, Volume 10, Issue 5 Ver. II (Sep Oct. 2015), PP 109-115 www.iosrjournals.org Reduce Power Consumption

More information

Overview. 1 Trends in Microprocessor Architecture. Computer architecture. Computer architecture

Overview. 1 Trends in Microprocessor Architecture. Computer architecture. Computer architecture Overview 1 Trends in Microprocessor Architecture R05 Robert Mullins Computer architecture Scaling performance and CMOS Where have performance gains come from? Modern superscalar processors The limits of

More information

UNIT-III POWER ESTIMATION AND ANALYSIS

UNIT-III POWER ESTIMATION AND ANALYSIS UNIT-III POWER ESTIMATION AND ANALYSIS In VLSI design implementation simulation software operating at various levels of design abstraction. In general simulation at a lower-level design abstraction offers

More information

INF3430 Clock and Synchronization

INF3430 Clock and Synchronization INF3430 Clock and Synchronization P.P.Chu Using VHDL Chapter 16.1-6 INF 3430 - H12 : Chapter 16.1-6 1 Outline 1. Why synchronous? 2. Clock distribution network and skew 3. Multiple-clock system 4. Meta-stability

More information

Transistor Network Restructuring Against NBTI Degradation. P. F. Butzen a, V. Dal Bem a, A. I. Reis b, R. P. Ribas b.

Transistor Network Restructuring Against NBTI Degradation. P. F. Butzen a, V. Dal Bem a, A. I. Reis b, R. P. Ribas b. Transistor Network Restructuring Against NBTI Degradation. P. F. Butzen a, V. Dal Bem a, A. I. Reis b, R. P. Ribas b. a PGMICRO, Federal University of Rio Grande do Sul, Porto Alegre, Brazil b Institute

More information

PROCESS and environment parameter variations in scaled

PROCESS and environment parameter variations in scaled 1078 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 53, NO. 10, OCTOBER 2006 Reversed Temperature-Dependent Propagation Delay Characteristics in Nanometer CMOS Circuits Ranjith Kumar

More information

Jeffrey Davis Georgia Institute of Technology School of ECE Atlanta, GA Tel No

Jeffrey Davis Georgia Institute of Technology School of ECE Atlanta, GA Tel No Wave-Pipelined 2-Slot Time Division Multiplexed () Routing Ajay Joshi Georgia Institute of Technology School of ECE Atlanta, GA 3332-25 Tel No. -44-894-9362 joshi@ece.gatech.edu Jeffrey Davis Georgia Institute

More information

Chapter 1 Introduction

Chapter 1 Introduction Chapter 1 Introduction 1.1 Introduction There are many possible facts because of which the power efficiency is becoming important consideration. The most portable systems used in recent era, which are

More information

An Area Efficient Decomposed Approximate Multiplier for DCT Applications

An Area Efficient Decomposed Approximate Multiplier for DCT Applications An Area Efficient Decomposed Approximate Multiplier for DCT Applications K.Mohammed Rafi 1, M.P.Venkatesh 2 P.G. Student, Department of ECE, Shree Institute of Technical Education, Tirupati, India 1 Assistant

More information

Power Spring /7/05 L11 Power 1

Power Spring /7/05 L11 Power 1 Power 6.884 Spring 2005 3/7/05 L11 Power 1 Lab 2 Results Pareto-Optimal Points 6.884 Spring 2005 3/7/05 L11 Power 2 Standard Projects Two basic design projects Processor variants (based on lab1&2 testrigs)

More information

Published by: PIONEER RESEARCH & DEVELOPMENT GROUP (www.prdg.org) 1

Published by: PIONEER RESEARCH & DEVELOPMENT GROUP (www.prdg.org) 1 Design Of Low Power Approximate Mirror Adder Sasikala.M 1, Dr.G.K.D.Prasanna Venkatesan 2 ME VLSI student 1, Vice Principal, Professor and Head/ECE 2 PGP college of Engineering and Technology Nammakkal,

More information

DATA ENCODING TECHNIQUES FOR LOW POWER CONSUMPTION IN NETWORK-ON-CHIP

DATA ENCODING TECHNIQUES FOR LOW POWER CONSUMPTION IN NETWORK-ON-CHIP DATA ENCODING TECHNIQUES FOR LOW POWER CONSUMPTION IN NETWORK-ON-CHIP S. Narendra, G. Munirathnam Abstract In this project, a low-power data encoding scheme is proposed. In general, system-on-chip (soc)

More information

Instruction-Driven Clock Scheduling with Glitch Mitigation

Instruction-Driven Clock Scheduling with Glitch Mitigation Instruction-Driven Clock Scheduling with Glitch Mitigation ABSTRACT Gu-Yeon Wei, David Brooks, Ali Durlov Khan and Xiaoyao Liang School of Engineering and Applied Sciences, Harvard University Oxford St.,

More information

On Chip Active Decoupling Capacitors for Supply Noise Reduction for Power Gating and Dynamic Dual Vdd Circuits in Digital VLSI

On Chip Active Decoupling Capacitors for Supply Noise Reduction for Power Gating and Dynamic Dual Vdd Circuits in Digital VLSI ELEN 689 606 Techniques for Layout Synthesis and Simulation in EDA Project Report On Chip Active Decoupling Capacitors for Supply Noise Reduction for Power Gating and Dynamic Dual Vdd Circuits in Digital

More information

Low-Power VLSI. Seong-Ook Jung VLSI SYSTEM LAB, YONSEI University School of Electrical & Electronic Engineering

Low-Power VLSI. Seong-Ook Jung VLSI SYSTEM LAB, YONSEI University School of Electrical & Electronic Engineering Low-Power VLSI Seong-Ook Jung 2013. 5. 27. sjung@yonsei.ac.kr VLSI SYSTEM LAB, YONSEI University School of Electrical & Electronic Engineering Contents 1. Introduction 2. Power classification & Power performance

More information

EE241 - Spring 2004 Advanced Digital Integrated Circuits. Announcements. Borivoje Nikolic. Lecture 15 Low-Power Design: Supply Voltage Scaling

EE241 - Spring 2004 Advanced Digital Integrated Circuits. Announcements. Borivoje Nikolic. Lecture 15 Low-Power Design: Supply Voltage Scaling EE241 - Spring 2004 Advanced Digital Integrated Circuits Borivoje Nikolic Lecture 15 Low-Power Design: Supply Voltage Scaling Announcements Homework #2 due today Midterm project reports due next Thursday

More information

LEAKAGE POWER REDUCTION IN CMOS CIRCUITS USING LEAKAGE CONTROL TRANSISTOR TECHNIQUE IN NANOSCALE TECHNOLOGY

LEAKAGE POWER REDUCTION IN CMOS CIRCUITS USING LEAKAGE CONTROL TRANSISTOR TECHNIQUE IN NANOSCALE TECHNOLOGY LEAKAGE POWER REDUCTION IN CMOS CIRCUITS USING LEAKAGE CONTROL TRANSISTOR TECHNIQUE IN NANOSCALE TECHNOLOGY B. DILIP 1, P. SURYA PRASAD 2 & R. S. G. BHAVANI 3 1&2 Dept. of ECE, MVGR college of Engineering,

More information

ESTIMATION OF LEAKAGE POWER IN CMOS DIGITAL CIRCUIT STACKS

ESTIMATION OF LEAKAGE POWER IN CMOS DIGITAL CIRCUIT STACKS ESTIMATION OF LEAKAGE POWER IN CMOS DIGITAL CIRCUIT STACKS #1 MADDELA SURENDER-M.Tech Student #2 LOKULA BABITHA-Assistant Professor #3 U.GNANESHWARA CHARY-Assistant Professor Dept of ECE, B. V.Raju Institute

More information

Fast Placement Optimization of Power Supply Pads

Fast Placement Optimization of Power Supply Pads Fast Placement Optimization of Power Supply Pads Yu Zhong Martin D. F. Wong Dept. of Electrical and Computer Engineering Dept. of Electrical and Computer Engineering Univ. of Illinois at Urbana-Champaign

More information

Lecture #2 Solving the Interconnect Problems in VLSI

Lecture #2 Solving the Interconnect Problems in VLSI Lecture #2 Solving the Interconnect Problems in VLSI C.P. Ravikumar IIT Madras - C.P. Ravikumar 1 Interconnect Problems Interconnect delay has become more important than gate delays after 130nm technology

More information

CMOS circuits and technology limits

CMOS circuits and technology limits Section I CMOS circuits and technology limits 1 Energy efficiency limits of digital circuits based on CMOS transistors Elad Alon 1.1 Overview Over the past several decades, CMOS (complementary metal oxide

More information

Managing Cross-talk Noise

Managing Cross-talk Noise Managing Cross-talk Noise Rajendran Panda Motorola Inc., Austin, TX Advanced Tools Organization Central in-house CAD tool development and support organization catering to the needs of all design teams

More information

Low-Power CMOS VLSI Design

Low-Power CMOS VLSI Design Low-Power CMOS VLSI Design ( 范倫達 ), Ph. D. Department of Computer Science, National Chiao Tung University, Taiwan, R.O.C. Fall, 2017 ldvan@cs.nctu.edu.tw http://www.cs.nctu.tw/~ldvan/ Outline Introduction

More information

Ruixing Yang

Ruixing Yang Design of the Power Switching Network Ruixing Yang 15.01.2009 Outline Power Gating implementation styles Sleep transistor power network synthesis Wakeup in-rush current control Wakeup and sleep latency

More information

Design A Redundant Binary Multiplier Using Dual Logic Level Technique
