Improving Energy-Efficiency of Multicores using First-Order Modeling


Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 1404
Improving Energy-Efficiency of Multicores using First-Order Modeling
VASILEIOS SPILIOPOULOS
ACTA UNIVERSITATIS UPSALIENSIS, UPPSALA 2016
ISSN ISBN urn:nbn:se:uu:diva

Dissertation presented at Uppsala University to be publicly examined in ITC/2446, Lägerhyddsvägen 2, Uppsala, Thursday, 29 September 2016 at 13:00 for the degree of Doctor of Philosophy. The examination will be conducted in English. Faculty examiner: Professor Lieven Eeckhout (Ghent University, Belgium).

Abstract

Spiliopoulos, V. Improving Energy-Efficiency of Multicores using First-Order Modeling. Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology pp. Uppsala: Acta Universitatis Upsaliensis. ISBN

In recent decades, power consumption has evolved into one of the most critical resources in a computer system. In the form of electricity bills in data centers, battery life in mobile devices, or thermal constraints in desktops and laptops, power consumption imposes several limitations on today's processors, and improving power and energy efficiency is one of the most urgent research topics in Computer Architecture. Dynamic Voltage and Frequency Scaling (DVFS) and Cache Resizing are among the most popular energy-saving techniques. Previous work, however, has focused on developing heuristics and trial-and-error methods that yield acceptable savings, but fail to provide insight into how these techniques affect the power and performance of a computer system. In contrast, this Thesis proposes the use of first-order modeling to improve the energy efficiency of computer systems. A first-order model needs to be (i) accurate enough to efficiently drive DVFS and Cache Resizing decisions, and (ii) simple enough to eliminate the overhead of collecting the required inputs to the model. We show that such models can be constructed and successfully applied in modern systems.

For DVFS, we propose to scale frequency down to exploit applications' memory slack, i.e., periods during which the processor waits for data to be fetched from the main memory. In such cases, the processor frequency can be scaled down to save energy without inordinate performance penalty. Our DVFS models can detect slack and predict the impact of DVFS on both power and performance with great accuracy. Cache Resizing, on the other hand, relies on the fact that many applications do not benefit from the vast amount of cache that modern processors are equipped with. In such cases, the cache can be resized to save static energy at limited performance cost. Since both techniques are related to the memory behavior of applications, we propose a unified model to manage the two techniques in tandem and maximize energy efficiency through synergistic DVFS and Cache Resizing.

Finally, our experience with DVFS in real systems motivated us to contribute to the integration of DVFS into the gem5 simulator. Unlike other simulators that ignore the role of the OS in DVFS, we extend the gem5 simulator by developing the hardware and software components that allow the existing Linux DVFS infrastructure to be seamlessly integrated in the simulator.

Keywords: Computer Architecture, DVFS, Cache Resizing, Interval modeling, Power modeling

Vasileios Spiliopoulos, Department of Information Technology, Computer Architecture and Computer Communication, Box 337, Uppsala University, SE Uppsala, Sweden.

Vasileios Spiliopoulos 2016
ISSN ISBN urn:nbn:se:uu:diva

To my parents and my loving wife.


List of papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals.

I. Georgios Keramidas, Vasileios Spiliopoulos, Stefanos Kaxiras, Interval-Based Models for Run-Time DVFS Orchestration in SuperScalar Processors, In Proc. International Conference on Computing Frontiers (CF), 2010. I am the primary author of this paper. Georgios Keramidas contributed to writing the text of the paper.

II. Vasileios Spiliopoulos, Stefanos Kaxiras, Georgios Keramidas, Green Governors: A framework for Continuously Adaptive DVFS, In Proc. International Green Computing Conference and Workshops (IGCC), 2011. I am the primary author of this paper.

III. Vasileios Spiliopoulos, Andreas Sembrant, Stefanos Kaxiras, Power-Sleuth: A Tool for Investigating your Program's Power Behavior, In Proc. International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), 2012. I am the primary author of this paper. Andreas Sembrant provided the phase-detection tool and contributed to discussions.

IV. Vasileios Spiliopoulos, Akash Bagdia, Andreas Hansson, Peter Aldworth, Stefanos Kaxiras, Introducing DVFS-Management in a Full-System Simulator, In Proc. International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), 2013. I am the primary author of this paper.

V. Vasileios Spiliopoulos, Andreas Sembrant, Georgios Keramidas, Erik Hagersten, Stefanos Kaxiras, A Unified DVFS-Cache Resizing Framework, Technical Report, Department of Information Technology, Uppsala University, 2016. I am the primary author of this paper. Andreas Sembrant and Georgios Keramidas were involved in discussions.

Reprints were made with permission from the publishers.

Other publications not included:

Konstantinos Koukos, David Black-Schaffer, Vasileios Spiliopoulos, Stefanos Kaxiras, Towards more efficient execution: a decoupled access-execute approach, In Proc. International Conference on Supercomputing (ICS), 2013. I developed the power model used in the paper.

Alexandra Jimborean, Konstantinos Koukos, Vasileios Spiliopoulos, David Black-Schaffer, Stefanos Kaxiras, Fix the code. Don't tweak the hardware: A new compiler approach to Voltage-Frequency scaling, In Proc. International Symposium on Code Generation and Optimization (CGO), 2014. I developed the power model used in the paper.

Konstantinos Koukos, Per Ekemark, Georgios Zacharopoulos, Vasileios Spiliopoulos, Stefanos Kaxiras, Alexandra Jimborean, Multiversioned decoupled access-execute: the key to energy-efficient compilation of general-purpose programs, In Proc. International Conference on Compiler Construction (CC), 2016. I developed the power model used in the paper.

Kai Lampka, Björn Forsberg, Vasileios Spiliopoulos, Keep it cool and in time: With runtime monitoring to thermal-aware execution speeds for deadline constrained systems, In Journal of Parallel and Distributed Computing (JPDC), 2016. I contributed technical details about gem5 and McPAT and provided scripts to estimate power consumption.

Contents

1 Introduction
    Power Consumption in Computer Systems
    Power & Energy Efficiency Metrics
    Power and Performance Modeling for Energy Efficiency
2 DVFS Performance Modeling
    Interval-based DVFS Performance Model
    Stall-based model
    Miss-based model
    Implementing the Models in Real Processors
    Model Accuracy
    Model Extensions
3 DVFS Power Modeling
    Measuring Power Consumption
    From the voltage regulator
    From the motherboard ATX connector
    A Frequency-Independent Power Model
4 Understanding the Power Behavior of Applications
    Phase Detection
    Utilizing Program Phases to Aid Performance Event Monitoring
    Estimating Power and Performance in Different Frequencies
5 Improving Energy Efficiency with Linux Frequency Governors
    Green Governors
    Multicore Management
6 Co-ordinated DVFS and Cache Resizing Management
    Unified DVFS-Cache Resizing Model
    Estimating MLP in Different Cache Configurations
    Estimating Performance
    Frequency-LLC Size Adaptation
7 Introducing DVFS Support in gem5 Simulator
    Hardware Extensions
    Software Integration
    Validation and Use-cases
8 Summary
Svensk Sammanfattning (Summary in Swedish)
    Bakgrund (Background)
    Sammanfattning av Forskningen (Summary of the Research)
Acknowledgements
References

1. Introduction

For many years, improving performance was the main concern of research in Computer Architecture. Maximizing performance at any cost led to increased complexity at different levels of the design process. At the circuit level, the shrinking of the manufacturing process combined with aggressive pipelining led to faster processors through increased clock frequency. At the architectural level, an abundance of sophisticated techniques has contributed to improving performance: out-of-order execution, several cache levels, aggressive prefetching, and branch prediction are only a few of the advances in the field of Computer Architecture in recent years. In the past decade, however, optimizing performance at any cost has no longer been possible, as energy consumption evolved into one of the most important design constraints.

Energy consumption arose as a crucial limitation in modern computer systems for two main reasons. First, computers hit the power wall. With the increase of frequency, higher power consumption led to increased thermal dissipation, approaching the physical limits of the devices; further increasing energy consumption and thermal dissipation would simply lead to temperatures that chips cannot tolerate. Second, energy itself has become a first-priority resource, in the form of electricity cost for large data centers and battery life for mobile devices. Consequently, in the past 15 years, lowering energy consumption has become an important concern even in high-performance computing.

However, performance is still important for computer systems, hence architects strive for what is known as energy efficiency: improving performance should only come at a reasonable energy cost, or, in other words, the energy overhead should not be more than the performance benefit achieved by a given optimization. Similarly, there are many different energy-saving techniques; the challenge, however, is to reduce energy consumption with limited performance degradation. In this Thesis, we develop modeling techniques that can be used to improve the energy efficiency of computer systems.

1.1 Power Consumption in Computer Systems

In CMOS circuits, power consumption is broken down into dynamic and static power consumption. Dynamic power is given by the following equation [23]:

$$P_{dynamic} = a f C V^{2} \qquad (1.1)$$

Dynamic power is consumed due to the switching activity of the transistors. The activity factor a denotes the fraction of transistors that switch state on every cycle, and depends on the input of the circuit. C is the load capacitance, and f and V are the frequency and voltage respectively. Different power-saving techniques aim to reduce dynamic power consumption by targeting different components of the above equation. For instance, designing more compact and efficient systems leads to smaller load capacitance C, which in turn reduces power consumption. Clock-gating aims at reducing the activity factor a by cutting off the clock at idle components, to prevent transistors from switching state when they do not have to. Dynamic Voltage and Frequency Scaling (DVFS) scales down voltage and frequency to reduce power consumption at the expense of reduced performance. Voltage and frequency should always be treated in combination, since there is a close connection between the two: for a target frequency, there is a minimum voltage that the circuit must be supplied with to ensure proper functionality.

Static power is consumed by transistors due to various leakage currents. At a high level, static power is given by the following equation [23]:

$$P_{static} = V I_{leak} \qquad (1.2)$$

Equation 1.2 suggests that static power consumption can be reduced by reducing either the supply voltage or the leakage current. Although DVFS can be used to reduce voltage, the voltage scaling range is significantly limited in modern processors [10], hence DVFS cannot aggressively attack the problem of static dissipation. An effective technique to minimize static power is to reduce the leakage current by shutting down parts of the processor that are not used. Recent Intel and AMD processors apply this technique by power-gating idle cores.
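As a rough illustration of Equations 1.1 and 1.2, the following Python sketch evaluates both terms and shows why scaling voltage together with frequency is so effective for dynamic power. The function names and all numeric values (activity factor, capacitance, voltages, frequencies) are hypothetical and chosen only for the arithmetic; real processors offer a much narrower voltage range, as noted above.

```python
def dynamic_power(activity, capacitance, voltage, frequency):
    """Equation 1.1: P_dynamic = a * f * C * V^2 (result in watts)."""
    return activity * frequency * capacitance * voltage ** 2


def static_power(voltage, leakage_current):
    """Equation 1.2: P_static = V * I_leak (result in watts)."""
    return voltage * leakage_current


# Hypothetical operating points, not measurements of any real chip.
base = dynamic_power(activity=0.2, capacitance=1e-9, voltage=1.2, frequency=2.0e9)
scaled = dynamic_power(activity=0.2, capacitance=1e-9, voltage=0.6, frequency=1.0e9)

# Halving both V and f divides dynamic power by 2 * 2^2 = 8.
dynamic_ratio = scaled / base

# Static power only benefits linearly from the voltage reduction.
static_ratio = static_power(0.6, 0.5) / static_power(1.2, 0.5)
```

In this idealized case the dynamic term drops to one eighth while the static term only halves, which is consistent with the observation that DVFS alone cannot aggressively reduce static dissipation.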
Moreover, shutting down parts of the caches is another technique that has been extensively studied [45, 46, 47], because caches are responsible for a large portion of the chip's total static power consumption.

1.2 Power & Energy Efficiency Metrics

Nowadays, energy consumption is as important as performance, hence computer architects and system designers often choose to trade performance for energy and power savings. Depending on design requirements, different optimization metrics can be used:

Power: In some cases, system performance is sacrificed to reduce power consumption, even though this does not guarantee that the total energy consumption for a given task will be reduced. Because power consumption directly correlates with thermal dissipation, such strategies are usually applied when there are thermal constraints in the system.

Energy: When low energy consumption is the ultimate design metric, minimizing energy consumption at any performance cost can be acceptable. For instance, microcontrollers used in embedded systems use simple designs that greatly prolong battery life at the expense of performance.

Energy Delay Product (EDP): In many cases, both performance and energy consumption are critical resources in a system. For example, a mobile device (e.g., a smartphone or tablet computer) should have both good performance and long battery life, and a data center should be fast, but not at unreasonable electricity cost. In such cases, the product of execution time and energy consumption has been proposed [23] as a metric that depicts how well a technique trades one for the other. EDP is a lower-is-better metric, giving equal weights to both performance and energy consumption. When one of the two is valued more, higher-order variations of this metric are used (e.g., ED^2P, which weights performance more heavily).

1.3 Power and Performance Modeling for Energy Efficiency

This Thesis presents modeling techniques for optimizing energy efficiency in modern Out-of-Order processors. It focuses on two very popular power-saving techniques that have been studied in the past, but mainly through empirical methods: Dynamic Voltage and Frequency Scaling (DVFS) [18, 17, 26, 44, 42] and Cache Resizing [45, 46, 47].

DVFS relies on the fact that applications cannot always take full advantage of a fast processor. This is due to memory slack: in memory-intensive applications, the CPU spends a significant amount of time stalled, waiting for data to arrive from the main memory. In such cases, the processor's voltage and clock frequency can be scaled down to save energy at limited performance cost. Existing techniques rely on empirical models to determine how the performance of an application is affected by operating the processor at different DVFS levels. In Chapter 2, we show that a simple analytical model can accurately estimate the execution time of applications under frequency scaling and can be used to find the optimal DVFS point.
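To make the EDP and ED^2P metrics of Section 1.2 concrete, here is a minimal Python sketch. The helper name `edp` and the two operating points are hypothetical; the energy and time values are invented for illustration and are not results from this Thesis.

```python
def edp(energy_joules, delay_seconds, delay_exponent=1):
    """Energy-delay metric E * D^n: n=1 gives EDP, n=2 gives ED^2P (lower is better)."""
    return energy_joules * delay_seconds ** delay_exponent


# Two hypothetical operating points for the same task.
fast = {"energy": 12, "delay": 2}   # high frequency: quicker but more energy
slow = {"energy": 7, "delay": 3}    # low frequency: slower but less energy

edp_fast = edp(fast["energy"], fast["delay"])        # 12 * 2
edp_slow = edp(slow["energy"], slow["delay"])        # 7 * 3
ed2p_fast = edp(fast["energy"], fast["delay"], 2)    # 12 * 2^2
ed2p_slow = edp(slow["energy"], slow["delay"], 2)    # 7 * 3^2

best_by_edp = "slow" if edp_slow < edp_fast else "fast"
best_by_ed2p = "slow" if ed2p_slow < ed2p_fast else "fast"
```

With these made-up numbers the two metrics disagree: EDP favors the slower point, while ED^2P, which penalizes delay more, favors the faster one. This is why the choice of metric has to follow the design requirements.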
To reason about and optimize energy efficiency, it is important to estimate the power consumption of a processor at runtime. Building power models for real processors and using them to apply different power-saving techniques is a well-studied topic [19, 7, 14, 33, 20]. However, previous studies only demonstrated how to generate models for a single frequency. In Chapter 3, we extend previous modeling methodologies to estimate power consumption across different frequencies. Then, in Chapter 4, we show that the performance and power models presented in Chapters 2 and 3 can be combined to create a powerful tool that allows us to understand the power behavior of different applications with respect to DVFS. This tool, called Power-Sleuth, only needs to profile an application by running it once, at a single frequency.

With our DVFS power and performance models, the tool can estimate how application behavior varies at any frequency of interest. To build our power model and evaluate its effectiveness, we demonstrate different methodologies for measuring power consumption on real processors.

Apart from providing insights into the power behavior of different applications, our models can efficiently drive runtime optimizations. In Chapter 5, we use our models to implement Linux frequency governors running on real platforms. These governors are highly flexible, enforcing a variety of different energy-efficiency policies: they can accurately estimate the impact of frequency scaling on each of the applications running on a multicore and apply the frequency that optimizes the power, energy and performance requirements set by the user. Although the model inputs are not always obtainable through the existing performance-counter events, we show that using certain approximations leads to near-optimal DVFS decisions.

Dynamic power consumption dominates total power consumption; static power, however, is not negligible, especially for vast Last-Level Caches (LLCs) that correspond to a significant part of the chip area (~50%). Dynamic Cache Resizing is a popular technique that turns off parts of the cache to save static energy and improve energy efficiency. However, resizing the cache has an impact on the memory behavior of the application, which directly correlates with the application behavior at different DVFS levels. In Chapter 6, we demonstrate a unified model to estimate the impact of DVFS and LLC Resizing at the same time. We then use the model to manage core frequency and LLC size, and show that energy efficiency can be optimized by applying these techniques in a co-ordinated way.
Finally, architectural simulators are powerful tools that allow researchers to fine-tune parameters and investigate research ideas that are not always applicable in existing hardware. Therefore, making simulators as realistic as possible is a significant part of research in Computer Architecture. Most simulators provide only basic DVFS support, taking shortcuts and disregarding aspects that can be important in real hardware. In Chapter 7, we demonstrate how we introduce full DVFS support in one of the most popular full-system simulators, the gem5 simulator. Our extensions allow gem5 users to model different clock and voltage domain topologies. We also provide full Linux support by developing drivers that are compatible with the existing Linux DVFS infrastructure. Hence, default Linux frequency governors, like interactive and ondemand, can be used out-of-the-box in the simulator.

2. DVFS Performance Modeling

Dynamic Voltage and Frequency Scaling (DVFS) is one of the most popular power/energy saving techniques. DVFS relies on the fact that, by reducing the clock frequency, the circuit can tolerate higher latencies and therefore voltage can also be reduced. This has a cubic impact on dynamic power consumption (Equation 1.1), while it also impacts static power consumption (Equation 1.2).

Most DVFS approaches take advantage of system slack or idleness. Kaxiras and Martonosi [23] demonstrate the different levels at which slack can appear. At the system level, DVFS mechanisms take advantage of the processor being idle and scale frequency down to minimize idle periods [41, 13]. At the hardware level, DVFS can be applied at a very fine granularity, targeting the slack that appears in the hardware operations [9]. Finally, at the program level (or program-phase level), slack can appear due to long-latency memory operations that force the processor to stall while such operations are pending. Regardless of the level that a DVFS technique targets, the heart of every approach is to detect and exploit the slack to save energy without inordinately penalizing system performance.

In Paper I, we focus on program-level DVFS, and we propose a simple and accurate analytical model to quantify how the execution time of an application is affected by frequency scaling. To develop our model, we use a previously proposed analytical model [22, 12] that estimates performance based on different miss events experienced by the processor. Although DVFS has been extensively studied in previous work, most approaches rely on empirical models and trial-and-error methods [17, 18, 26, 42, 28, 44]. Our analytical models, on the other hand, investigate DVFS from a different perspective and open up new opportunities for energy efficiency optimizations.

Paper I focuses on developing, evaluating and using our runtime models to improve energy efficiency on a simulator.
However, the simplicity of our models and the fact that DVFS is available in most modern commercial processors motivated us to use our models in real systems. Papers II and III discuss how our models can be ported to commodity processors, and demonstrate how we can use them to understand the behavior of different applications under DVFS and optimize energy efficiency.

2.1 Interval-based DVFS Performance Model

The interval-based analytical model, proposed by Karkhanis and Smith [22] and further enhanced by Eyerman et al. [12], breaks execution of a program into intervals. During the steady-state intervals, the processor executes instructions at a rate that is only limited by the width of the processor and the workload's Instruction Level Parallelism (ILP). Steady-state intervals are punctuated by miss-intervals, introducing stall cycles to the machine. Miss-intervals are introduced by various miss events, such as instruction and data cache misses and branch mispredictions. To model performance variation due to DVFS, we need to understand how different intervals are affected by frequency scaling.

Figure 2.1. The baseline analytical interval-based model (x-axis: total cycles; y-axis: instructions executed; steady-state intervals separated by an on-chip miss, a branch miss and an LLC miss). Steady-state intervals are shaped by the machine width and the program's ILP. Miss-intervals are due to miss events such as cache misses and branch mispredictions and introduce stall cycles to the machine.

Figure 2.1 shows how the interval model represents the different miss events. The x-axis corresponds to time in processor cycles, while the y-axis shows the number of instructions issued per cycle. Assuming that the processor operates in a single voltage and frequency domain, the latency of on-chip events (measured in processor cycles), such as branch mispredictions and on-chip misses, is not affected by frequency scaling. This is because all processor components are fed with the same clock, therefore changing the clock speed does not change the relative speed between different components. Consequently, for a workload exhibiting only on-chip miss events, changing the clock speed does not affect the number of cycles required to execute that application. Of course, the clock period is affected, therefore execution time scales inversely with frequency. Such applications are characterized as compute-intensive. When off-chip misses occur, however, DVFS causes a change in the relative speed between the processor and the off-chip memory, hence the total number of cycles is no longer constant. Therefore, to understand how processor performance is affected by DVFS, we need to model how the off-chip miss-intervals are affected by frequency scaling.

Unlike on-chip miss events, the latency of an off-chip miss is affected by the core frequency, due to the asynchronous communication between the processor and the main memory. For example, given a processor operating at 1GHz and a main memory with 100ns access time, the latency of the main memory is

$$100\,\text{ns} \times 10^{9}\,\frac{\text{cycles}}{\text{sec}} = 100 \text{ processor cycles}$$

If we scale the processor frequency down to 500MHz, main memory access time remains 100ns, therefore memory latency now becomes 50 processor cycles. In other words, memory latency (measured in processor cycles) scales proportionally to processor frequency.

Figure 2.2. The miss-interval of an LLC miss (x-axis: total cycles; y-axis: instructions executed). Due to out-of-order execution, the processor can issue instructions under an LLC miss up to the point that all the remaining instructions depend on that miss. The different areas of the miss-interval are characterized as elastic or inelastic to frequency scaling: ROB-fill is the inelastic area and does not scale at all with frequency; full stall is the elastic area, and only this quantity scales proportionally to frequency; it is followed by IQ drain and ramp-up. The stall cycles as measured by the stall-based model do not scale proportionally with frequency.

Figure 2.3. Overlapping LLC misses (x-axis: total cycles; y-axis: instructions executed; labels: LLC miss1, LLC miss2, ROB-fill, memory latency, stalls ST1 and ST2, intervals x and y). When the first miss reaches the head of the Reorder Buffer, the processor stalls until the miss is serviced. Then, new instructions can enter the instruction window until the processor stalls again, due to the second miss reaching the head of the Reorder Buffer. When the second miss is also serviced, the processor can reach the steady-state issue rate again.

Figure 2.2 shows an off-chip miss in more detail. Once such a miss occurs, the processor keeps issuing instructions until the miss reaches the head of the Reorder Buffer (ROB). This area is called ROB-fill. At this point, no more instructions can enter the issue window, hence the issue rate starts to drop. When all the instructions left in the instruction window depend on the pending miss, the processor stalls and waits for the miss to be serviced. Only after the data has arrived from the main memory will the processor be able to execute new instructions and ramp up to the steady-state issue rate again.
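The conversion between memory access time and processor cycles used in the 1GHz / 500MHz example above can be written as a one-line helper. This is only an illustrative Python sketch; the function name and the choice of units (nanoseconds and MHz) are assumptions made here to keep the arithmetic exact.

```python
def memory_latency_cycles(access_time_ns, core_freq_mhz):
    """Memory latency in core cycles: time [ns] * frequency [MHz] / 1000.

    The access time in nanoseconds is fixed by the memory; only the
    core frequency changes under DVFS.
    """
    return access_time_ns * core_freq_mhz / 1000


latency_at_1ghz = memory_latency_cycles(100, 1000)    # 100 ns at 1 GHz
latency_at_500mhz = memory_latency_cycles(100, 500)   # 100 ns at 500 MHz
```

Halving the core frequency halves the number of cycles the core observes for the same 100ns access, which is the property the interval-based DVFS models build on.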
As explained above, memory latency scales proportionally to frequency, but the different areas of the miss-interval are affected in different ways. Regarding ROB-fill, it will take the same number of cycles to fill up the ROB regardless of the frequency, hence we say that ROB-fill is inelastic to frequency scaling. Full-stall, on the other hand, is the number of cycles that the processor spends being completely idle. Since the total memory latency scales proportionally to frequency, full-stall also changes with frequency, i.e., it is elastic to frequency scaling. Finally, IQ-drain and ramp-up can also be elastic to frequency scaling if frequency is scaled down so aggressively that full-stall is completely eliminated.

Figure 2.3 shows the case where multiple misses overlap with each other. In particular, during the ROB-fill or IQ-drain areas of LLC miss1, a second miss, LLC miss2, occurs. The processor first stalls because of the first miss, and after this miss is serviced, it starts issuing instructions again before it stalls due to LLC miss2 reaching the head of the ROB. When the second miss is also serviced, the instruction issue rate rises again to meet the steady-state rate.

Based on the observations discussed above, in Paper I we propose two simple interval-based analytical models to estimate how performance changes between different DVFS settings. The first model, called the stall-based model, makes certain simplifications and can be applied in almost every modern processor. The miss-based model, on the other hand, is more accurate, but the input required is not readily available in all processors.

Stall-based model

The stall-based model assumes that ROB-fill is negligible, therefore the stall of an LLC miss is proportional to frequency scaling. As shown in Figure 2.2,

$$\mathit{memory\_latency} = \mathit{ROB\_fill} + \mathit{stall} \approx \mathit{stall} \qquad (2.1)$$

For the multiple-misses case of Figure 2.3, we can also approximate that

$$ST_1 + ST_2 = y + \mathit{memory\_latency} - \mathit{ROB\_fill} - x \approx \mathit{memory\_latency} \qquad (2.2)$$

assuming that $y - \mathit{ROB\_fill} - x \approx 0$.

Hence, in both cases of single and overlapping misses, stalls are approximately equal to memory latency, which scales proportionally to frequency. Consequently, the total number of stalls also scales (approximately) proportionally to frequency. Assuming that executing a fixed number of instructions under frequency f takes c cycles, we can estimate the number of cycles required to execute the same instructions under frequency f/k. As explained above, non-stall cycles (c - ST) remain intact with frequency scaling, while stall cycles scale in the same way that frequency does. Therefore, execution cycles at frequency f/k are approximated as

$$c_{f/k} = c - ST + \frac{ST}{k} \qquad (2.3)$$
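Equation 2.3 translates directly into code. The Python sketch below is a minimal illustration, not the implementation used in the papers; the function names and the counter values in the example (total cycles, stall cycles, frequencies) are hypothetical.

```python
def stall_based_cycles(total_cycles, stall_cycles, k):
    """Equation 2.3: cycles at frequency f/k, given c and ST measured at f.

    Non-stall cycles (c - ST) are kept as they are; stall cycles shrink by k.
    """
    return (total_cycles - stall_cycles) + stall_cycles / k


def execution_time(cycles, frequency_hz):
    """Execution time in seconds: cycles divided by the clock frequency."""
    return cycles / frequency_hz


# Hypothetical interval profiled at f = 2 GHz: 1M cycles, 400k of them stalls.
f = 2.0e9
k = 2  # predict for f/k = 1 GHz
predicted_cycles = stall_based_cycles(1_000_000, 400_000, k)

time_at_f = execution_time(1_000_000, f)
time_at_f_over_k = execution_time(predicted_cycles, f / k)
slowdown = time_at_f_over_k / time_at_f
```

For this made-up interval, halving the frequency is predicted to cost a 1.6x slowdown rather than 2x, because the 40% of cycles spent stalled on memory take the same wall-clock time at either frequency.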

After estimating the execution cycles at the target frequency, the execution time can be easily calculated by dividing the execution cycles by the frequency.

Miss-based model

Figure 2.4. A complex case explaining which misses are critical for DVFS, i.e., which are the misses whose miss-intervals scale with frequency (x-axis: total cycles; y-axis: instructions executed; labels: LLC miss1 to LLC miss4, ROB-fill, memory latency, intervals x and y). In a group of overlapping misses, only the first miss is important for DVFS performance estimation. Once this miss is serviced, the next miss that occurs indicates the start of a new group of overlapping misses.

The miss-based model acknowledges that it is the whole miss-interval that scales proportionally to frequency. Notice that this does not imply that ROB-fill scales with frequency. As explained above, ROB-fill is inelastic to frequency scaling, but the full-stall area changes in a way that the whole miss-interval scales proportionally to frequency. Moreover, the inelasticity of ROB-fill has an important implication for the miss-interval of the overlapping misses.

Figure 2.4 shows a complex case for overlapping misses. LLC miss1 occurs and, x cycles later, LLC miss2 overlaps with the first miss. Since LLC miss2 occurred x cycles after the first miss, it will also be serviced x cycles after the first miss is serviced. Moreover, ROB-fill does not scale with frequency, which means that LLC miss2 will always occur and be serviced x cycles after LLC miss1. Therefore, when misses overlap, only the miss-interval of the first miss scales with frequency, whereas the miss-interval(s) of the additional miss(es) remain intact.

One might assume that the misses that should be counted for DVFS modeling are the ones that do not overlap with previous misses. This, however, is not true. In Figure 2.4, LLC miss3 occurs after LLC miss1 has been serviced, but it overlaps with LLC miss2, which is still pending. Although LLC miss3 overlaps with another miss, it initiates a new group of overlapping misses, therefore it should be accounted for in DVFS modeling. This is because, although x will not change with DVFS, the remainder of the LLC miss3 miss-interval (shown as memory latency in Figure 2.4) will scale. LLC miss4 overlaps with LLC miss3 after y cycles, hence it will always be serviced y cycles after LLC miss3 is serviced and its extra miss-interval will not change with DVFS.

From the example above we can determine that a miss is important for DVFS modeling (i.e., its miss-interval scales with frequency) if it initiates a new group of overlapping misses. Once such a miss occurs, the misses overlapping with that miss are not counted for DVFS, until that first miss is serviced. Then, the next miss that occurs indicates a new group of overlapping misses, even if it overlaps with pending misses from the previous group. In the literature, the name leading miss [35] has been proposed for the first miss in a group of overlapping misses. Hence, a miss that does not overlap with a leading miss is a leading miss itself.

After counting the number of leading misses, we can calculate the number of cycles that scale with frequency simply by multiplying the number of leading misses by the memory latency (mem_lat × leading_misses). Then, when scaling frequency from f to f/k, execution cycles can be estimated as:

$$c_{f/k} = c - \mathit{mem\_lat} \times \mathit{leading\_misses} + \frac{\mathit{mem\_lat} \times \mathit{leading\_misses}}{k} \qquad (2.4)$$

2.2 Implementing the Models in Real Processors

The models shown in this chapter were conceived and evaluated using the SimpleScalar [6] simulator. Using a simulator enabled us to investigate concepts that would otherwise be hard to explore in detail. In particular, understanding how misses overlap with each other and how the stall cycles are affected by frequency scaling, depending on whether they were generated by isolated or overlapping misses, would not have been possible without the detailed view of a cycle-accurate simulator. However, our motivation for creating our models was to use them in real hardware, therefore we put great effort into implementing them in commercial processors. This task is particularly challenging due to the limited selection of performance-counter events that are available in commodity processors.
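Returning to the miss-based model (Equation 2.4), the leading-miss rule is straightforward to express when a trace of miss timestamps is available, as it is in a simulator. The Python sketch below is illustrative only: it assumes a constant memory latency and a list of cycles at which LLC misses are issued, and the function names and the example trace (loosely mirroring the four misses of Figure 2.4) are hypothetical.

```python
def count_leading_misses(miss_issue_cycles, mem_lat):
    """Count misses that start a new group of overlapping misses.

    A miss is leading if it is issued after the current leading miss has
    been serviced, even if other (non-leading) misses are still pending.
    Assumes every miss takes exactly mem_lat cycles to service.
    """
    leading = 0
    leader_serviced_at = None  # cycle at which the current leading miss returns
    for issue in sorted(miss_issue_cycles):
        if leader_serviced_at is None or issue >= leader_serviced_at:
            leading += 1
            leader_serviced_at = issue + mem_lat
    return leading


def miss_based_cycles(total_cycles, mem_lat, leading_misses, k):
    """Equation 2.4: cycles at frequency f/k, given measurements at f."""
    scaling = mem_lat * leading_misses
    return total_cycles - scaling + scaling / k


# Hypothetical trace: miss1 at 0, miss2 at x=30, miss3 at 110 (miss1 is
# serviced at 100, miss2 is still pending), miss4 at 150 (y=40 after miss3).
leaders = count_leading_misses([0, 30, 110, 150], mem_lat=100)
predicted = miss_based_cycles(total_cycles=1000, mem_lat=100,
                              leading_misses=leaders, k=2)
```

In this trace only miss1 and miss3 are leading, so 200 of the 1000 cycles scale with frequency. Commodity performance counters generally do not expose this kind of per-miss timing, which is the difficulty the rest of this section addresses.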
Until recently, implementing the miss-based model was infeasible, due to the lack of events that monitor the overlapping of last-level cache misses. In recent AMD processors, however, researchers have been able to implement a leading-load estimator [39]. The stall-based model, on the other hand, can be approximated using a combination of events that have been supported in the majority of processors in the past years: the number of stall cycles and the number of LLC misses. Papers II and III discuss the heuristics used to implement the stall-based model on an Intel Nehalem and an AMD Phenom II processor. Although both processors offer a performance-counter event to measure stall cycles, there is no event that counts stalls that are explicitly due to LLC misses. A first-order approximation is to use the total number of stalls and assume that those are mostly due to LLC misses, as these are the longest-latency miss-events that a processor can experience. As an extra optimization step, we use the total number of LLC misses along with the average memory latency to estimate the worst-case stalls due to off-chip misses. This prevents us from erroneously classifying other stalls, such as on-chip misses or long-latency operations (e.g., DIV), as LLC-miss stalls in compute-intensive applications. This heuristic, despite its simplicity, achieves good accuracy at predicting performance across different frequencies, leading to near-optimal runtime DVFS decisions (Paper II).

2.3 Model Accuracy

The models presented in this chapter were evaluated both in a simulator (Paper I) and on real hardware (Paper III). Running the SPEC2000 benchmark suite in the SimpleScalar simulator, the stall-based model yields an average error of 2.1% when executing at $f_{max}$ and predicting for $f_{max}/4$ and vice versa. The maximum error, however, can be up to 20%, because this model disregards the existence of the ROB-fill area. The miss-based model, on the other hand, achieves an impressive 0.2% average error for the same frequency range, while the maximum error is less than 5%. In real hardware, implementing the miss-based model was infeasible at the time due to the lack of appropriate performance-counter events. Regarding the stall-based model, we could only implement an approximation of the model due to the lack of an event that would count stalls explicitly due to LLC misses in Intel and AMD processors. However, the approximation discussed in Section 2.2 yields good accuracy for practical purposes. Running the SPEC2006 benchmark suite on an Intel Nehalem processor, we could estimate execution time across the maximum (2.66 GHz) and minimum (1.6 GHz) frequencies with an average error of less than 5%.

2.4 Model Extensions

Our DVFS models have served as an inspiration for a significant amount of related work and have been extended in various interesting directions.
First of all, two more research groups, working independently but concurrently with us, have proposed models that are similar to our miss-based model [11, 35]. Rountree et al. [35] proposed the term leading loads for the loads that initiate a group of overlapping misses, i.e., the misses that our miss-based model identifies as critical for DVFS performance estimation. Miftakhutdinov et al. [29] extended our model to account for memory systems with non-fixed memory latency. Nath et al. [30] adapted our model to estimate performance variation of GPGPU workloads under DVFS. Finally, Akram et al. [3] extended previous work in the field to model DVFS performance for managed multi-threaded applications.
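As a closing illustration for this chapter, the stall-based heuristic of Section 2.2 can be sketched as follows. The sketch assumes that the worst-case bound is applied as a simple cap (the minimum of the measured stalls and LLC misses times average memory latency) and that the capped stalls scale with frequency in the same way as the memory cycles in Equation 2.4; both are assumptions made here for illustration, and all names and numbers are hypothetical.

```python
def estimate_memory_stalls(total_stall_cycles, llc_misses, avg_mem_lat):
    """Cap the measured stalls by the worst-case off-chip stall estimate.

    Assumption: stalls beyond llc_misses * avg_mem_lat must come from
    other sources (on-chip misses, long-latency operations such as DIV).
    """
    worst_case_llc_stalls = llc_misses * avg_mem_lat
    return min(total_stall_cycles, worst_case_llc_stalls)


def predict_cycles_stall_based(c_f, mem_stalls, k):
    """Assumed scaling rule: only memory stalls shrink when going to f/k."""
    return c_f - mem_stalls + mem_stalls / k


# Hypothetical compute-bound interval: many stalls, very few LLC misses.
compute_bound = estimate_memory_stalls(total_stall_cycles=300_000,
                                       llc_misses=100, avg_mem_lat=200)

# Hypothetical memory-bound interval: the cap does not bind.
memory_bound = estimate_memory_stalls(total_stall_cycles=500_000,
                                      llc_misses=10_000, avg_mem_lat=200)

cycles_at_half_freq = predict_cycles_stall_based(c_f=1_000_000,
                                                 mem_stalls=memory_bound, k=2)
```

In the compute-bound case only 20,000 of the 300,000 stall cycles are attributed to memory, so the predicted slowdown at lower frequency stays close to proportional, which is the behavior the cap is meant to preserve.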

3. DVFS Power Modeling

Estimating the energy and power consumption of a running application is crucial both for (i) understanding the behavior of an application, and (ii) optimizing its energy efficiency through different runtime techniques. Although an abundance of different power models have been proposed, they can be divided into two main categories [40]:

Bottom-up power models [5, 27, 38] use theoretical models to estimate the power consumption of different parts of a system, based on characteristics such as node technology, circuit layout and design parameters. These models tend to be highly configurable, as different parameters are simply fed into the theoretical models, but their accuracy is questionable, with the estimation error often exceeding 20% [34, 43]. The goal of these models is to at least provide reliable relative power estimation, to determine whether certain modifications have a positive or negative impact on energy and power consumption.

Top-down models [21, 19, 7, 14], on the other hand, employ machine learning theory, treating the processor as a black box, and use power measurements obtained from a real system while running a set of test applications to create a regression power-model. Although these models are only useful for the hardware that they were trained on, they are highly accurate and can be used in real systems to drive power and energy optimization techniques. The challenge in this class of models is to identify the processor events that best correlate with processor power consumption, as well as to select a good benchmark training-set to build the regression model.

Top-down models are typically built by running a set of applications on a real processor and measuring the power consumption for each of them.
Moreover, different performance-counter events are monitored to represent the activity of the processor during the execution of the applications, such as the number of instructions executed, the number of accesses and misses in the different caches of the system, and the type of executed instructions. Then, a regression model is built, assuming that power consumption is a function of the selected performance-counter events, and the model parameters are acquired by fitting the model to the observed power and counter measurements. The accuracy of the model depends on the selection of events to be monitored, as well as the diversity of the applications used to train the model.

In Paper III we extend previous work by building a frequency-independent regression power-model that only needs to be trained at a single frequency and can then be used to estimate power consumption at any frequency. Moreover, we show that by combining the power model with the DVFS performance model of Chapter 2, power consumption of an application can be estimated for any frequency, regardless of the frequency that the application was profiled at. To train and evaluate our power model, we first showcase different methods for obtaining power measurements of applications running on real processors.

Figure 3.1. Measuring Intel Nehalem power consumption from the voltage regulator. Using the motherboard schematics, we were able to detect the pins providing the voltage and current that is fed to the processor. Then, by attaching cables and monitoring them with an oscilloscope, we were able to measure and log power consumption of the processor.

3.1 Measuring Power Consumption

Measuring power consumption is an important part of developing a linear-regression power model. Moreover, it is necessary for evaluating the benefits achieved when using our performance and power models at runtime to optimize energy efficiency (Chapter 5). In our work, we have used two different approaches to measure power consumption on real hardware.

From the voltage regulator

Measuring power consumption from the voltage regulator yields the most accurate results, since this is the component closest to the processor at which we can measure power. Other approaches (wall power, power from the motherboard ATX connector) introduce noise due to the power consumption of components other than the processor. The disadvantage, however, is that the voltage regulator is not always easily accessible on the motherboard, and detailed schematics need to be available to determine where to install the measuring probes. Moreover, adding helper electronic devices (e.g., shunt resistors, inductive current sensors) is infeasible, and measuring power relies on the voltage regulator's self-monitoring capabilities. Paper II presents in more detail the methodology for measuring power consumption on Intel Nehalem and AMD Phenom II from the voltage regulator pins.

From the motherboard ATX connector

An alternative, more generic approach to measuring power consumption is to monitor the power that is fed from the power supply to the motherboard through the ATX connectors. To do so, we designed the power-measuring device shown in Figure 3.2a. The device consists of a PCB that is installed between the power supply and the motherboard of the system that we want to measure. Each of the power-supply voltage rails (ATX 3.3 V, 5 V and 12 V, as well as the separate 12 V rail supplying the processor) is driven through a sense resistor $R_s$ on the PCB. Current flowing through a sense resistor produces a proportional voltage drop across it. This voltage is measured using fine-grained sensors [8], the output of which is sampled using an A/D device. Ultimately, the voltage read by the A/D device is proportional to the current flowing through each voltage rail. Using test currents for each of the sensors, we were able to calibrate our device and determine the exact relationship between the current flowing through the sensors and the voltage read by the A/D device. Then, our device can be plugged into a system as shown in Figure 3.2b to measure its power consumption. This device was used in Paper III to develop and evaluate a linear-regression power model for the Intel Nehalem processor.

3.2 A Frequency-Independent Power Model

Previous work [21, 19, 7, 14] has focused on how to select a representative set of (i) performance-counter events, and (ii) training benchmarks to build regression power-models.
However, these works have not investigated how such models can be made voltage and frequency independent. Consequently, different model parameters have to be derived for different frequencies by training the power model at each of them. This is because, in most cases, it is assumed that energy consumption is a linear function of different event counts. Equivalently, power consumption is a linear function of different event rates.

$$\begin{aligned}
\mathit{Energy} &= \mathit{coeff}_1 \times \mathit{event}_1 + \mathit{coeff}_2 \times \mathit{event}_2 + \dots \\
\mathit{Power} &= \mathit{coeff}_1 \times \mathit{event\_rate}_1 + \mathit{coeff}_2 \times \mathit{event\_rate}_2 + \dots
\end{aligned} \qquad (3.1)$$

where the coefficients are obtained by fitting the equation to the observed values of the events and the power samples measured. This power-model structure, however, does not identify the role of voltage and frequency in the power consumption, hence a different power model has to be built for every V-f configuration.

Figure 3.2. Measuring power consumption from the ATX connector. (a) Design of our power-measuring device, which includes a PCB with current sensors attached for every ATX connector cable. The device is installed between the power supply and the target-machine motherboard. An A/D device is used to read the outputs of the current sensors, and a logging machine collects and reports the samples. (b) Connecting the device to a real system.

In Paper III, we propose a more flexible power model that can be built by selecting a more detailed model formulation. As seen in Chapter 1, power consumption is given by the following equation:

$$\mathit{Power} = f \times C_{eff} \times V^2 + P_{static}$$

In this equation, it is only $C_{eff}$ (activity_factor × load_capacitance) that is non-deterministic, as f and V are controlled by the user or the operating system, and $P_{static}$ does not depend on the application running on the processor. Therefore, to estimate the power consumption of an application, we only need to estimate the effective capacitance. The source of dynamic power consumption is the switching activity of the various node-capacitances that make up a processor. On every cycle, some of these node-capacitances switch state, leading to dynamic power consumption. Hence, it is intuitive to assume that $C_{eff}$, which is the average capacitance that switches state on every cycle, correlates with the number of different events that occur on every processor cycle. Therefore, we propose the following power-model formulation:

$$\begin{aligned}
C_{eff} &= \sum_{i=1}^{n} \left( \mathit{coeff}_i \times \frac{\mathit{event}_i}{\mathit{cycles}} \right) + C_{clock} \\
\mathit{Power} &= f \times C_{eff} \times V^2 + P_{static}(V)
\end{aligned} \qquad (3.2)$$

Static power consumption $P_{static}$ is a function of voltage and temperature, and can be measured off-line when the processor is idle (hence dynamic power is 0). To obtain the model coefficients, we run a set of benchmarks at a single voltage and frequency and measure power consumption and a set of events. To determine $C_{eff}$, we subtract $P_{static}$ from the total power consumption and then divide by $f \times V^2$. Then, the events measured for each benchmark are divided by the number of cycles to obtain the average number of events occurring on every processor cycle. Finally, the computed event rates and $C_{eff}$ values are used to determine the model coefficients through linear regression.
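The training procedure described above can be sketched in a few lines. This is a minimal illustration and not the implementation used in Paper III: the least-squares solver, the function names and the synthetic training values are all assumptions made here, and the data is generated from made-up coefficients purely so that the fit can be checked.

```python
def effective_capacitance(power, p_static, f, v):
    """C_eff = (Power - P_static) / (f * V^2), from Equation 3.2."""
    return (power - p_static) / (f * v * v)


def solve_linear_system(a, b):
    """Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ceff_model(event_rates, ceff_values):
    """Least-squares fit; returns [coeff_1, ..., coeff_n, C_clock]."""
    rows = [list(r) + [1.0] for r in event_rates]  # constant column: C_clock
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * y for r, y in zip(rows, ceff_values)) for i in range(n)]
    return solve_linear_system(ata, atb)


def predict_power(model, rates, f, v, p_static):
    """Equation 3.2 evaluated at an arbitrary frequency and voltage."""
    ceff = sum(c * r for c, r in zip(model[:-1], rates)) + model[-1]
    return f * ceff * v * v + p_static


# Synthetic training set (made-up values): events per cycle for 4 benchmarks.
TRUE_MODEL = [2e-9, 5e-9, 1e-9]            # coeff_1, coeff_2, C_clock
F_TRAIN, V_TRAIN, P_STATIC_TRAIN = 2.66e9, 1.2, 10.0
train_rates = [(1.0, 0.10), (0.5, 0.30), (1.5, 0.05), (0.8, 0.20)]
train_power = [predict_power(TRUE_MODEL, r, F_TRAIN, V_TRAIN, P_STATIC_TRAIN)
               for r in train_rates]

ceff = [effective_capacitance(p, P_STATIC_TRAIN, F_TRAIN, V_TRAIN)
        for p in train_power]
model = fit_ceff_model(train_rates, ceff)

# The fitted model is reused at a different V-f point without retraining.
power_low = predict_power(model, (1.0, 0.10), f=1.6e9, v=1.0, p_static=6.0)
```

Because the regression target is $C_{eff}$ rather than power, the fitted coefficients carry no frequency or voltage dependence, which is what allows the last line to evaluate the model at a V-f point that was never measured.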
In Paper III, we use events that have been previously proposed in the literature (instructions executed, floating-point instructions, L2 accesses/misses, branch mispredictions, resource stalls), but different events can be used with the same model formulation. The advantage of this power model is that it decouples voltage and frequency from the power-model coefficients. The model is only trained once, at a single voltage and frequency, and the model parameters can be used to estimate power at any voltage and frequency of interest. This is because the model correlates event rates with the processor activity ($C_{eff}$), which is decoupled from voltage and frequency. In Paper III we build such a model for the Intel Nehalem processor. Our experimental findings show that our model can estimate power consumption across different frequencies with an average error of less than 4% for the SPEC2006 benchmark suite. Recently, Walker et al. [40] proposed a similar power-model formulation by decoupling voltage, frequency and static power and automating the process of curve fitting to create power models for an ARM SoC.
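To illustrate how the power model can be combined with the DVFS performance model of Chapter 2, the following sketch estimates time, power and energy at a frequency other than the profiled one. It assumes that event counts do not change with frequency, so that only the cycle count (and therefore the per-cycle event rates) is rescaled; this assumption, the dictionary layout and all numbers are hypothetical and chosen only for illustration.

```python
def estimate_at_scaled_frequency(profile, coeffs, c_clock, k, v_new, p_static_new):
    """Estimate (time, power, energy) at f/k from a single profile taken at f.

    profile: dict with 'f' (Hz), 'cycles', 'mem_cycles' (cycles that scale
             with frequency, e.g. mem_lat * leading_misses) and 'events'
             (event name -> count).
    coeffs:  event name -> regression coefficient of the C_eff model.
    """
    f_new = profile["f"] / k

    # Performance model (same form as Equation 2.4).
    mem = profile["mem_cycles"]
    cycles_new = profile["cycles"] - mem + mem / k

    # Power model (Equation 3.2), with event rates recomputed per new cycle.
    ceff = sum(coeffs[name] * count / cycles_new
               for name, count in profile["events"].items()) + c_clock
    power = f_new * ceff * v_new ** 2 + p_static_new

    time_s = cycles_new / f_new
    return time_s, power, power * time_s


# Hypothetical profile: 1 s of execution at 2 GHz, half of it memory-bound.
profile = {
    "f": 2e9,
    "cycles": 2e9,
    "mem_cycles": 1e9,
    "events": {"instructions": 2e9},
}
time_s, power_w, energy_j = estimate_at_scaled_frequency(
    profile, coeffs={"instructions": 2e-9}, c_clock=1e-9,
    k=2, v_new=1.0, p_static_new=5.0)
```

With these made-up inputs, halving the frequency stretches execution from 1 s to 1.5 s rather than 2 s, because the memory-bound half of the cycles shrinks when measured in slower clock cycles.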

4. Understanding the Power Behavior of Applications

Although power is a critical design constraint, there is a lack of profiling tools that provide information about applications' power behavior. Simply measuring average power consumption is not sufficient, since the behavior can differ significantly within an application if the application exhibits different phases throughout its execution. Moreover, changing the voltage and frequency of the processor can have a significant impact on power and performance, which may vary depending on the phase of execution.

Figure 4.1 shows the execution time, energy and power consumption of two different phases, X and Y, of the gcc/166 application from the SPEC2006 benchmark suite, when executed at frequencies $f_{max}$ (2.66 GHz) and $f_{min}$ (1.6 GHz). Regarding execution time, whether X or Y is the longest-running phase depends on the frequency. At maximum frequency, X takes longer to execute than Y. When frequency is scaled down, however, the execution time of Y increases substantially, as opposed to X, which only experiences a small overhead. This is because Y is a compute-bound phase, while X involves a significant amount of off-chip communication and is therefore memory-bound. Regarding power consumption, Y is more power-hungry than X at maximum frequency, which is reasonable since compute-bound phases make better utilization of the processor and therefore consume more power. When it comes to energy, however, due to X's longer execution time, both phases consume approximately the same amount of energy at $f_{max}$. Therefore, it is clear that execution time, power and energy are equally important to characterize the behavior of the different application phases.

Figure 4.1. Execution time, power and energy consumption at maximum ($f_{max}$) and minimum ($f_{min}$) frequency for two phases (X, Y) of the gcc/166 application from the SPEC2006 benchmark suite. Panels: Time (s), Power (W), Energy (J).

Paper III discusses how the DVFS models presented in the previous chapters can be utilized to understand and estimate the behavior of an application in different DVFS configurations. Moreover, it employs phase detection to correlate behavior with different application phases, and it proposes a phase-based sampling methodology to monitor performance-counter events when their number exceeds the hardware capabilities. The tool presented in Paper III, called Power-Sleuth, only needs to profile an application once, at a single frequency. Then, it can estimate the power and performance behavior of each individual phase in different V-f configurations.

4.1 Phase Detection

Applications can have very distinct behavior throughout their execution, or, in other words, they can exhibit different phases. To detect such phases online, we use the ScarPhase [36] library. ScarPhase is characterized by low overhead (less than 2%), and utilizes the concept of Basic Block Vectors (BBV) [37] to detect code regions that make up different execution phases. In particular, execution is divided into intervals, during which branch instructions are sampled using Intel Precise Event Based Sampling (PEBS). The branch addresses are hashed into a vector, the entries of which show how many times the corresponding branches were sampled during the interval. Hence, every interval is characterized by a vector. Similar intervals/vectors are clustered together and form a program phase, and they are characterized by similar behavior in terms of different metrics (IPC, cache misses, power, etc.).

4.2 Utilizing Program Phases to Aid Performance Event Monitoring

To better understand the processor behavior, we often need to monitor various performance-counter events, the number of which frequently exceeds the number of events that can be concurrently monitored by the hardware.
Of course, one can run an application several times and monitor different events each time. This, however, is not applicable to run-time optimizations, and it introduces significant overhead in profiling tools. In Power-Sleuth, phase detection and performance and power modeling require a total of 9 performance-counter events. This number can easily increase if more events are involved in the linear-regression power model to achieve higher accuracy. Phase information, however, can be used to efficiently sample a subset of events at a time, and interpolate the event values for the missing samples.
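One way to picture phase-guided sampling is the sketch below. It assumes that each execution interval has already been labeled with a phase and that only a subset of events was measured in it; missing values are filled with the mean of that event over other intervals of the same phase. The averaging rule, the data layout and the event names are assumptions for illustration and are not necessarily the interpolation scheme used in Power-Sleuth.

```python
from collections import defaultdict


def interpolate_by_phase(intervals):
    """Fill in unsampled events using per-phase averages.

    intervals: list of (phase_id, {event_name: value}) where each dict
               holds only the events that were monitored in that interval.
    Returns a list of (phase_id, {event_name: value or None}) covering
    every event; None marks events never sampled in that phase.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for phase, sample in intervals:
        for event, value in sample.items():
            sums[(phase, event)] += value
            counts[(phase, event)] += 1

    all_events = sorted({event for _, sample in intervals for event in sample})
    filled = []
    for phase, sample in intervals:
        row = {}
        for event in all_events:
            if event in sample:
                row[event] = sample[event]           # measured directly
            elif counts[(phase, event)] > 0:
                row[event] = sums[(phase, event)] / counts[(phase, event)]
            else:
                row[event] = None                    # no data for this phase
        filled.append((phase, row))
    return filled


# Hypothetical trace: two phases, two event groups monitored in rotation.
trace = [
    ("A", {"llc_misses": 100}),
    ("A", {"branches": 40}),
    ("B", {"llc_misses": 900}),
    ("A", {"llc_misses": 120}),
    ("B", {"branches": 10}),
]
filled = interpolate_by_phase(trace)
```

The sketch relies on the property stated in Section 4.1 that intervals of the same phase behave similarly, so a value measured in one interval of a phase is a reasonable stand-in for an unmeasured interval of that phase.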

Power-Sleuth: A Tool for Investigating your Program s Power Behavior

Power-Sleuth: A Tool for Investigating your Program s Power Behavior Power-Sleuth: A Tool for Investigating your Program s Power Behavior Vasileios Spiliopoulos, Andreas Sembrant, Stefanos Kaxiras Uppsala University, Department of Information Technology P.O. Box 337, SE-751

More information

Ramon Canal NCD Master MIRI. NCD Master MIRI 1

Ramon Canal NCD Master MIRI. NCD Master MIRI 1 Wattch, Hotspot, Hotleakage, McPAT http://www.eecs.harvard.edu/~dbrooks/wattch-form.html http://lava.cs.virginia.edu/hotspot http://lava.cs.virginia.edu/hotleakage http://www.hpl.hp.com/research/mcpat/

More information

An Overview of Static Power Dissipation

An Overview of Static Power Dissipation An Overview of Static Power Dissipation Jayanth Srinivasan 1 Introduction Power consumption is an increasingly important issue in general purpose processors, particularly in the mobile computing segment.

More information

Highly Efficient Ultra-Compact Isolated DC-DC Converter with Fully Integrated Active Clamping H-Bridge and Synchronous Rectifier

Highly Efficient Ultra-Compact Isolated DC-DC Converter with Fully Integrated Active Clamping H-Bridge and Synchronous Rectifier Highly Efficient Ultra-Compact Isolated DC-DC Converter with Fully Integrated Active Clamping H-Bridge and Synchronous Rectifier JAN DOUTRELOIGNE Center for Microsystems Technology (CMST) Ghent University

More information

Power Management in Multicore Processors through Clustered DVFS

Power Management in Multicore Processors through Clustered DVFS Power Management in Multicore Processors through Clustered DVFS A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY Tejaswini Kolpe IN PARTIAL FULFILLMENT OF THE

More information

Evaluation of CPU Frequency Transition Latency

Evaluation of CPU Frequency Transition Latency Noname manuscript No. (will be inserted by the editor) Evaluation of CPU Frequency Transition Latency Abdelhafid Mazouz Alexandre Laurent Benoît Pradelle William Jalby Abstract Dynamic Voltage and Frequency

More information

UNIT-II LOW POWER VLSI DESIGN APPROACHES

UNIT-II LOW POWER VLSI DESIGN APPROACHES UNIT-II LOW POWER VLSI DESIGN APPROACHES Low power Design through Voltage Scaling: The switching power dissipation in CMOS digital integrated circuits is a strong function of the power supply voltage.

More information

DYNAMIC VOLTAGE FREQUENCY SCALING (DVFS) FOR MICROPROCESSORS POWER AND ENERGY REDUCTION

DYNAMIC VOLTAGE FREQUENCY SCALING (DVFS) FOR MICROPROCESSORS POWER AND ENERGY REDUCTION DYNAMIC VOLTAGE FREQUENCY SCALING (DVFS) FOR MICROPROCESSORS POWER AND ENERGY REDUCTION Diary R. Suleiman Muhammed A. Ibrahim Ibrahim I. Hamarash e-mail: diariy@engineer.com e-mail: ibrahimm@itu.edu.tr

More information

PROCESS-VOLTAGE-TEMPERATURE (PVT) VARIATIONS AND STATIC TIMING ANALYSIS

PROCESS-VOLTAGE-TEMPERATURE (PVT) VARIATIONS AND STATIC TIMING ANALYSIS PROCESS-VOLTAGE-TEMPERATURE (PVT) VARIATIONS AND STATIC TIMING ANALYSIS The major design challenges of ASIC design consist of microscopic issues and macroscopic issues [1]. The microscopic issues are ultra-high

More information

Domino Static Gates Final Design Report

Domino Static Gates Final Design Report Domino Static Gates Final Design Report Krishna Santhanam bstract Static circuit gates are the standard circuit devices used to build the major parts of digital circuits. Dynamic gates, such as domino

More information

Reduction of Peak Input Currents during Charge Pump Boosting in Monolithically Integrated High-Voltage Generators

Reduction of Peak Input Currents during Charge Pump Boosting in Monolithically Integrated High-Voltage Generators Reduction of Peak Input Currents during Charge Pump Boosting in Monolithically Integrated High-Voltage Generators Jan Doutreloigne Abstract This paper describes two methods for the reduction of the peak

More information

Final Report: DBmbench

Final Report: DBmbench 18-741 Final Report: DBmbench Yan Ke (yke@cs.cmu.edu) Justin Weisz (jweisz@cs.cmu.edu) Dec. 8, 2006 1 Introduction Conventional database benchmarks, such as the TPC-C and TPC-H, are extremely computationally

More information

Design of an Integrated OLED Driver for a Modular Large-Area Lighting System

Design of an Integrated OLED Driver for a Modular Large-Area Lighting System Design of an Integrated OLED Driver for a Modular Large-Area Lighting System JAN DOUTRELOIGNE, ANN MONTÉ, JINDRICH WINDELS Center for Microsystems Technology (CMST) Ghent University IMEC Technologiepark

More information

Performance Evaluation of Multi-Threaded System vs. Chip-Multi-Processor System

Performance Evaluation of Multi-Threaded System vs. Chip-Multi-Processor System Performance Evaluation of Multi-Threaded System vs. Chip-Multi-Processor System Ho Young Kim, Robert Maxwell, Ankil Patel, Byeong Kil Lee Abstract The purpose of this study is to analyze and compare the

More information

Exploring Heterogeneity within a Core for Improved Power Efficiency

Exploring Heterogeneity within a Core for Improved Power Efficiency Computer Engineering Exploring Heterogeneity within a Core for Improved Power Efficiency Sudarshan Srinivasan Nithesh Kurella Israel Koren Sandip Kundu May 2, 215 CE Tech Report # 6 Available at http://www.eng.biu.ac.il/segalla/computer-engineering-tech-reports/

More information

Performance Evaluation of Recently Proposed Cache Replacement Policies

Performance Evaluation of Recently Proposed Cache Replacement Policies University of Jordan Computer Engineering Department Performance Evaluation of Recently Proposed Cache Replacement Policies CPE 731: Advanced Computer Architecture Dr. Gheith Abandah Asma Abdelkarim January

More information

ΕΠΛ 605: Προχωρημένη Αρχιτεκτονική

ΕΠΛ 605: Προχωρημένη Αρχιτεκτονική ΕΠΛ 605: Προχωρημένη Αρχιτεκτονική Υπολογιστών Presentation of UniServer Horizon 2020 European project findings: X-Gene server chips, voltage-noise characterization, high-bandwidth voltage measurements,

More information

A Static Power Model for Architects

A Static Power Model for Architects A Static Power Model for Architects J. Adam Butts and Guri Sohi University of Wisconsin-Madison {butts,sohi}@cs.wisc.edu 33rd International Symposium on Microarchitecture Monterey, California December,

More information

Performance Metrics. Computer Architecture. Outline. Objectives. Basic Performance Metrics. Basic Performance Metrics

Performance Metrics. Computer Architecture. Outline. Objectives. Basic Performance Metrics. Basic Performance Metrics Computer Architecture Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr nizamettinaydin@gmail.com Performance Metrics http://www.yildiz.edu.tr/~naydin 1 2 Objectives How can we meaningfully measure and compare

More information

CHAPTER 3 NEW SLEEPY- PASS GATE

CHAPTER 3 NEW SLEEPY- PASS GATE 56 CHAPTER 3 NEW SLEEPY- PASS GATE 3.1 INTRODUCTION A circuit level design technique is presented in this chapter to reduce the overall leakage power in conventional CMOS cells. The new leakage po leepy-

More information

FIFO WITH OFFSETS HIGH SCHEDULABILITY WITH LOW OVERHEADS. RTAS 18 April 13, Björn Brandenburg

FIFO WITH OFFSETS HIGH SCHEDULABILITY WITH LOW OVERHEADS. RTAS 18 April 13, Björn Brandenburg FIFO WITH OFFSETS HIGH SCHEDULABILITY WITH LOW OVERHEADS RTAS 18 April 13, 2018 Mitra Nasri Rob Davis Björn Brandenburg FIFO SCHEDULING First-In-First-Out (FIFO) scheduling extremely simple very low overheads

More information

CHAPTER 7 HARDWARE IMPLEMENTATION

CHAPTER 7 HARDWARE IMPLEMENTATION 168 CHAPTER 7 HARDWARE IMPLEMENTATION 7.1 OVERVIEW In the previous chapters discussed about the design and simulation of Discrete controller for ZVS Buck, Interleaved Boost, Buck-Boost, Double Frequency

More information

Static Energy Reduction Techniques in Microprocessor Caches

Static Energy Reduction Techniques in Microprocessor Caches Static Energy Reduction Techniques in Microprocessor Caches Heather Hanson, Stephen W. Keckler, Doug Burger Computer Architecture and Technology Laboratory Department of Computer Sciences Tech Report TR2001-18

More information

Chapter 16 - Instruction-Level Parallelism and Superscalar Processors

Chapter 16 - Instruction-Level Parallelism and Superscalar Processors Chapter 16 - Instruction-Level Parallelism and Superscalar Processors Luis Tarrataca luis.tarrataca@gmail.com CEFET-RJ L. Tarrataca Chapter 16 - Superscalar Processors 1 / 78 Table of Contents I 1 Overview

More information

An Energy Conservation DVFS Algorithm for the Android Operating System

An Energy Conservation DVFS Algorithm for the Android Operating System Volume 1, Number 1, December 2010 Journal of Convergence An Energy Conservation DVFS Algorithm for the Android Operating System Wen-Yew Liang* and Po-Ting Lai Department of Computer Science and Information

More information

Interconnect-Power Dissipation in a Microprocessor

Interconnect-Power Dissipation in a Microprocessor 4/2/2004 Interconnect-Power Dissipation in a Microprocessor N. Magen, A. Kolodny, U. Weiser, N. Shamir Intel corporation Technion - Israel Institute of Technology 4/2/2004 2 Interconnect-Power Definition

More information

THERE is a growing need for high-performance and. Static Leakage Reduction Through Simultaneous V t /T ox and State Assignment

THERE is a growing need for high-performance and. Static Leakage Reduction Through Simultaneous V t /T ox and State Assignment 1014 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 24, NO. 7, JULY 2005 Static Leakage Reduction Through Simultaneous V t /T ox and State Assignment Dongwoo Lee, Student

More information

EDA Challenges for Low Power Design. Anand Iyer, Cadence Design Systems

EDA Challenges for Low Power Design. Anand Iyer, Cadence Design Systems EDA Challenges for Low Power Design Anand Iyer, Cadence Design Systems Agenda Introduction ti LP techniques in detail Challenges to low power techniques Guidelines for choosing various techniques Why is

More information

Enhancing Power, Performance, and Energy Efficiency in Chip Multiprocessors Exploiting Inverse Thermal Dependence

Enhancing Power, Performance, and Energy Efficiency in Chip Multiprocessors Exploiting Inverse Thermal Dependence 778 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 26, NO. 4, APRIL 2018 Enhancing Power, Performance, and Energy Efficiency in Chip Multiprocessors Exploiting Inverse Thermal Dependence

More information

Parallel Computing 2020: Preparing for the Post-Moore Era. Marc Snir

Parallel Computing 2020: Preparing for the Post-Moore Era. Marc Snir Parallel Computing 2020: Preparing for the Post-Moore Era Marc Snir THE (CMOS) WORLD IS ENDING NEXT DECADE So says the International Technology Roadmap for Semiconductors (ITRS) 2 End of CMOS? IN THE LONG

More information

CMOS circuits and technology limits

CMOS circuits and technology limits Section I CMOS circuits and technology limits 1 Energy efficiency limits of digital circuits based on CMOS transistors Elad Alon 1.1 Overview Over the past several decades, CMOS (complementary metal oxide

More information

Experimental Evaluation of the MSP430 Microcontroller Power Requirements

Experimental Evaluation of the MSP430 Microcontroller Power Requirements EUROCON 7 The International Conference on Computer as a Tool Warsaw, September 9- Experimental Evaluation of the MSP Microcontroller Power Requirements Karel Dudacek *, Vlastimil Vavricka * * University

More information

Research Statement. Sorin Cotofana

Research Statement. Sorin Cotofana Research Statement Sorin Cotofana Over the years I ve been involved in computer engineering topics varying from computer aided design to computer architecture, logic design, and implementation. In the

More information

Instruction Scheduling for Low Power Dissipation in High Performance Microprocessors

Instruction Scheduling for Low Power Dissipation in High Performance Microprocessors Instruction Scheduling for Low Power Dissipation in High Performance Microprocessors Abstract Mark C. Toburen Thomas M. Conte Department of Electrical and Computer Engineering North Carolina State University

More information

Low Power Design Methods: Design Flows and Kits

Low Power Design Methods: Design Flows and Kits JOINT ADVANCED STUDENT SCHOOL 2011, Moscow Low Power Design Methods: Design Flows and Kits Reported by Shushanik Karapetyan Synopsys Armenia Educational Department State Engineering University of Armenia

More information

A Level-Encoded Transition Signaling Protocol for High-Throughput Asynchronous Global Communication

A Level-Encoded Transition Signaling Protocol for High-Throughput Asynchronous Global Communication A Level-Encoded Transition Signaling Protocol for High-Throughput Asynchronous Global Communication Peggy B. McGee, Melinda Y. Agyekum, Moustafa M. Mohamed and Steven M. Nowick {pmcgee, melinda, mmohamed,

More information

CS4617 Computer Architecture

Lecture 2, Dr J Vaughan, September 10, 2014. Amdahl's Law: Speedup = (Execution time for entire task without using enhancement) / (Execution time for entire task using enhancement)

A Survey of the Low Power Design Techniques at the Circuit Level

Hari Krishna B, Assistant Professor, Department of Electronics and Communication Engineering, Vagdevi Engineering College, Warangal, India

Powering Automotive Cockpit Electronics

White Paper. Introduction: The growth of automotive cockpit electronics has exploded over the past decade. Previously, self-contained systems such as steering, braking,

Low-Power Digital CMOS Design: A Survey

Krister Landernäs, June 4, 2005, Department of Computer Science and Electronics, Mälardalen University. Abstract: The aim of this document is to provide the reader with

Published by: PIONEER RESEARCH & DEVELOPMENT GROUP (www.prdg.org) 1

Design Of Low Power Approximate Mirror Adder. Sasikala.M 1, Dr.G.K.D.Prasanna Venkatesan 2. ME VLSI student 1, Vice Principal, Professor and Head/ECE 2, PGP college of Engineering and Technology, Nammakkal,

Integrated Power Delivery for High Performance Server Based Microprocessors

J. Ted DiBene II, Ph.D., Intel, Dupont-WA. International Workshop on Power Supply on Chip, Cork, Ireland, Sept. 24-26

CHAPTER 1 INTRODUCTION

1.1 MOTIVATION FOR LOW POWER CIRCUIT DESIGN. Low power circuit design has emerged as a principal theme in today's electronics industry. In the past, major concerns among researchers

DFT for Testing High-Performance Pipelined Circuits with Slow-Speed Testers

Muhammad Nummer and Manoj Sachdev, University of Waterloo, Ontario, Canada. mnummer@vlsi.uwaterloo.ca, msachdev@ece.uwaterloo.ca

EE 382C EMBEDDED SOFTWARE SYSTEMS. Literature Survey Report. Characterization of Embedded Workloads. Ajay Joshi. March 30, 2004

ABSTRACT: Security applications are a class of emerging workloads that will play

Logic Analyzer Probing Techniques for High-Speed Digital Systems

DesignCon 2003, High-Performance System Design Conference. Brock J. LaMeres, Agilent Technologies. Abstract: Digital systems are turning out

AN EFFICIENT APPROACH TO MINIMIZE POWER AND AREA IN CARRY SELECT ADDER USING BINARY TO EXCESS ONE CONVERTER

K. RAMAMOORTHY 1, T. CHELLADURAI 2, V. MANIKANDAN 3. 1 Department of Electronics and Communication

System Level Analysis of Fast, Per-Core DVFS using On-Chip Switching Regulators

Wonyoung Kim, Meeta S. Gupta, Gu-Yeon Wei and David Brooks, School of Engineering and Applied Sciences, Harvard University, 33 Oxford

CS Computer Architecture Spring Lecture 04: Understanding Performance

CS 35101 Computer Architecture, Spring 2008, Lecture 04: Understanding Performance. Taken from Mary Jane Irwin (www.cse.psu.edu/~mji) and Kevin Schaffer [Adapted from Computer Organization and Design, Patterson

A Novel Low-Power Scan Design Technique Using Supply Gating

S. Bhunia, H. Mahmoodi, S. Mukhopadhyay, D. Ghosh, and K. Roy, School of Electrical and Computer Engineering, Purdue University, West Lafayette,

Getting the Most From Your Portable DC/DC Converter: How To Maximize Output Current For Buck And Boost Circuits

Upal Sengupta, Texas Instruments. ABSTRACT: Portable product design requires that power supply

Analysis of Dynamic Power Management on Multi-Core Processors

W. Lloyd Bircher and Lizy K. John, Laboratory for Computer Architecture, Department of Electrical and Computer Engineering, The University of

Research in Support of the Die / Package Interface

Introduction: As the microelectronics industry continues to scale down CMOS in accordance with Moore's Law and the ITRS roadmap, the minimum feature size

Revisiting Dynamic Thermal Management Exploiting Inverse Thermal Dependence

Katayoun Neshatpour, George Mason University, kneshatp@gmu.edu. Amin Khajeh, Broadcom Corporation, amink@broadcom.com. Houman Homayoun

On the Rules of Low-Power Design

On the Rules of Low-Power Design (and Why You Should Break Them). Prof. Todd Austin, University of Michigan, austin@umich.edu. A long time ago, in a not so far away place. The Rules of Low-Power Design: P =

Ultra Low Power VLSI Design: A Review

International Journal of Emerging Engineering Research and Technology, Volume 4, Issue 3, March 2016, PP 11-18, ISSN 2349-4395 (Print) & ISSN 2349-4409 (Online). G.Bharathi

In this lecture, we will first examine practical digital signals. Then we will discuss the timing constraints in digital systems.

The important concepts are related to setup and hold times of registers

Chapter 1 Introduction

1.1 Introduction. There are many possible facts because of which the power efficiency is becoming important consideration. The most portable systems used in recent era, which are

Memory-Level Parallelism Aware Fetch Policies for Simultaneous Multithreading Processors

STIJN EYERMAN and LIEVEN EECKHOUT, Ghent University. A thread executing on a simultaneous multithreading (SMT) processor

SURVEY AND EVALUATION OF LOW-POWER FULL-ADDER CELLS

Ahmed Sayed and Hussain Al-Asaad, Department of Electrical & Computer Engineering, University of California, Davis, CA, U.S.A. ABSTRACT: In this paper, we survey various

EE241 - Spring 2004 Advanced Digital Integrated Circuits. Announcements. Borivoje Nikolic. Lecture 15 Low-Power Design: Supply Voltage Scaling

Announcements: Homework #2 due today. Midterm project reports due next Thursday

Fast Placement Optimization of Power Supply Pads

Yu Zhong, Martin D. F. Wong. Dept. of Electrical and Computer Engineering, Univ. of Illinois at Urbana-Champaign

Multiple Clock and Voltage Domains for Chip Multi Processors

Efraim Rotem, Intel Corporation Israel. Avi Mendelson, Microsoft R&D Israel. Ran Ginosar, Technion Israel institute of Technology. Uri Weiser-

Chapter 2 Distributed Consensus Estimation of Wireless Sensor Networks

Recently, consensus based distributed estimation has attracted considerable attention from various fields to estimate deterministic

Power Consumption and Management for LatticeECP3 Devices

February 2012, Technical Note TN1181. Introduction: A key requirement for designers using FPGA devices is the ability to calculate the power dissipation of a particular device used on a board. LatticeECP3

Design of a Folded Cascode Operational Amplifier in a 1.2 Micron Silicon-Carbide CMOS Process

University of Arkansas, Fayetteville, ScholarWorks@UARK. Electrical Engineering Undergraduate Honors Theses, Electrical Engineering, 5-2017

Novel Low-Overhead Operand Isolation Techniques for Low-Power Datapath Synthesis

N. Banerjee, A. Raychowdhury, S. Bhunia, H. Mahmoodi, and K. Roy, School of Electrical and Computer Engineering, Purdue University,

Power Modeling and Characterization of Computing Devices: A Survey. Contents

Foundations and Trends® in Electronic Design Automation, Vol. 6, No. 2 (2012) 121-216. © 2012 S. Reda and A. N. Nowroz. DOI: 10.1561/1000000022

Design of Pipeline Analog to Digital Converter

Vivek Tripathi, Chandrajit Debnath, Rakesh Malik, STMicroelectronics. The pipeline analog-to-digital converter (ADC) architecture is the most popular topology

MS Project :Trading Accuracy for Power with an Under-designed Multiplier Architecture Parag Kulkarni Adviser : Prof. Puneet Gupta Electrical Eng.

Electrical Eng., UCLA - http://nanocad.ee.ucla.edu/ Outline: Introduction

CIRCUIT AND SYSTEM LEVEL DESIGN OPTIMIZATION FOR POWER DELIVERY AND MANAGEMENT. A Dissertation TONG XU

A Dissertation by TONG XU. Submitted to the Office of Graduate and Professional Studies of Texas A&M University in partial

Cherry Picking: Exploiting Process Variations in the Dark Silicon Era

Siddharth Garg, University of Waterloo. Co-authors: Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu

APPLICATION NOTE 3166 Source Resistance: The Efficiency Killer in DC-DC Converter Circuits


Design Techniques for Fully Integrated Switched- Capacitor Voltage Regulators

Hanh-Phuc Le, Electrical Engineering and Computer Sciences, University of California at Berkeley. Technical Report No. UCB/EECS-2015-21

POWER GATING. Power-gating parameters

Power Gating is effective for reducing leakage power [3]. Power gating is the technique wherein circuit blocks that are not in use are temporarily turned off to reduce the overall leakage

High Speed Digital Systems Require Advanced Probing Techniques for Logic Analyzer Debug

JEDEX 2003, Memory Futures (Track 2). Brock J. LaMeres, Agilent Technologies. Abstract: Digital systems are turning out

RECENT technology trends have lead to an increase in

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 39, NO. 9, SEPTEMBER 2004, p. 1581. Noise Analysis Methodology for Partially Depleted SOI Circuits. Mini Nanua and David Blaauw. Abstract: In partially depleted silicon-on-insulator

Testing Power Sources for Stability

Keywords: Venable, frequency response analyzer, oscillator, power source, stability testing, feedback loop, error amplifier compensation, impedance, output voltage, transfer function, gain crossover, bode

Active Decap Design Considerations for Optimal Supply Noise Reduction

Xiongfei Meng and Resve Saleh, Dept. of ECE, University of British Columbia, 356 Main Mall, Vancouver, BC, V6T Z4, Canada. E-mail: {xmeng,

Embedded Systems. 9. Power and Energy. Lothar Thiele. Computer Engineering and Networks Laboratory

General Remarks. Power and Energy Consumption. Statements that are true since a decade or longer: Power

ADVANCED EMBEDDED MONITORING SYSTEM FOR ELECTROMAGNETIC RADIATION

Chapter 5. S.No, Name of the Sub-Title, Page

Low Power, Area Efficient FinFET Circuit Design

Michael C. Wang, Princeton University. Abstract: FinFET, which is a double-gate field effect transistor (DGFET), is more versatile than traditional single-gate

Best Instruction Per Cycle Formula

6 Performance tuning, 7 Perceived performance, 8 Performance Equation, 9 See also is the average instructions per cycle (IPC) for this benchmark. Even. Click Card to

Architecting Systems of the Future

Featuring Eric Werner, interviewed by Suzanne Miller. Suzanne Miller: Welcome

Leading by design: Q&A with Dr. Raghuram Tupuri, AMD Chris Hall, DigiTimes.com, Taipei [Monday 12 December 2005]

AMD's drive to 64-bit processors surprised everyone with its speed, even as detractors commented

Energy Reduction of Ultra-Low Voltage VLSI Circuits by Digit-Serial Architectures

Muhammad Umar Karim Khan, Smart Sensor Architecture Lab, KAIST, Daejeon, South Korea, umar@kaist.ac.kr. Chong Min Kyung, Smart

Modeling the Effect of Wire Resistance in Deep Submicron Coupled Interconnects for Accurate Crosstalk Based Net Sorting

C. Guardiani, C. Forzan, B. Franzini, D. Pandini, Advanced Research, Central R&D, DAIS,

Keywords : MTCMOS, CPFF, energy recycling, gated power, gated ground, sleep switch, sub threshold leakage. GJRE-F Classification : FOR Code:

Global Journal of Researches in Engineering: Electrical and Electronics Engineering, Volume 12, Issue 3, Version 1.0, March 2012. Type: Double Blind Peer Reviewed International Research Journal. Publisher: Global

Improving TDR/TDT Measurements Using Normalization Application Note

Application Note 1304-5. TDR/TDT and Normalization: Normalization, an error-correction process, helps ensure that time domain reflectometer (TDR) and

VLSI System Testing. Outline

ECE 538, Krish Chakrabarty. System-on-Chip (SOC) Testing. Outline: Motivation for modular testing of SOCs Wrapper design IEEE 1500 Standard Optimization Test

CHAPTER 3 ANALYSIS OF LOW POWER, AREA EFFICIENT AND HIGH SPEED ADDER TOPOLOGIES

3.1 INTRODUCTION. The design of high-speed and low-power VLSI architectures needs efficient arithmetic processing units,

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 17, NO. 3, MARCH

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 17, NO. 3, MARCH 2009, p. 427. Power Management of Voltage/Frequency Island-Based Systems Using Hardware-Based Methods. Puru Choudhary,

Modeling Physical PCB Effects

Abstract: Getting logical designs to meet specifications is the first step in creating a manufacturable design. Getting the physical design to work is the next step. The physical effects of PCB materials,

Leakage Power Minimization in Deep-Submicron CMOS circuits

Politecnico di Torino, Dip. di Automatica e Informatica, 1019 Torino, Italy, enrico.macii@polito.it. Outline: Introduction. Design for low leakage: Basics.

A Low-Power SRAM Design Using Quiet-Bitline Architecture

Shin-Pao Cheng, Shi-Yu Huang, Electrical Engineering Department, National Tsing-Hua University, Taiwan. Abstract: This paper presents a low-power SRAM

Lec 24: Parallel Processors. Announcements

Kavita Bala, CS 3410, Fall 2008, Computer Science, Cornell University. Announcements: PA 3 out, Hack n Seek. The goal is to have fun with it. Recitations today will talk about it. Pizza

Reduce Power Consumption for Digital Cmos Circuits Using Dvts Algoritham

IOSR Journal of Electrical and Electronics Engineering (IOSR-JEEE), e-ISSN: 2278-1676, p-ISSN: 2320-3331, Volume 10, Issue 5 Ver. II (Sep-Oct. 2015), PP 109-115, www.iosrjournals.org

Design of Sub-10-Picoseconds On-Chip Time Measurement Circuit

M.A.Abas, G.Russell, D.J.Kinniment, Dept. of Electrical and Electronic Eng., University of Newcastle Upon Tyne, UK. Abstract: The rapid pace of

UNIT-III POWER ESTIMATION AND ANALYSIS

In VLSI design implementation simulation software operating at various levels of design abstraction. In general simulation at a lower-level design abstraction offers

7/11/2012. Single Cycle (Review) CSE 2021: Computer Organization. Multi-Cycle Implementation. Single Cycle with Jump. Pipelining Analogy

CSE 2021: Computer Organization. Lecture-10, CPU Design: Pipelining-1. Overview, Datapath and control. Shakil M. Khan, CSE-2021, July-12-2012