Performance Evaluation of Multi-Threaded System vs. Chip-Multi-Processor System


Ho Young Kim, Robert Maxwell, Ankil Patel, Byeong Kil Lee

Abstract: The purpose of this study is to analyze and compare the performance of multi-threaded systems and chip-multi-processor systems. Various types of multi-core and multi-threaded processors are now used in nearly all domains; even mobile devices have two or more cores to improve performance. While both technologies are widely used, it is not clear which one is better in terms of performance and power consumption, or which hardware configuration is optimal for a specific target domain. This research originally arose from a question: which system should we choose to execute a 4-thread workload, (i) a 1-core, 4-thread computer or (ii) a 4-core, 1-thread computer?

Index Terms: Multi-thread system, Chip-multi-processor system, Performance evaluation, Power consumption.

I. INTRODUCTION

In today's computer industry, multi-threaded and chip-multi-processor computers are common, and cell phones with multiple cores are increasingly common in mobile applications. Multi-threading and multi-core technology increase performance, but at the cost of more power than single-threaded, single-core computers. High power consumption was not a major issue early in the computer era; today, however, reducing power consumption is a critical issue in designing computer systems. Multi-threaded and multi-core systems also require more silicon area than single-threaded or single-core systems.

Multi-threading and multi-core are not the same concept [5][10][11][13]. A thread is the smallest unit of processing that can be scheduled by an operating system; multiple threads can exist within the same process and share its resources while executing independently. In a multi-core system, by contrast, each core has its own hardware resources and uses them for its own processing.
It is not simple to say whether multi-threading performs better than multi-core or vice versa, and this research began from that question. In this paper, we measure several performance metrics for multi-threaded configurations of up to 32 threads and multi-core configurations of up to 32 cores in order to compare them and find the best-performing configuration. The number of working threads is varied in powers of two (2, 4, 8, 16 and 32), and within each group threads are traded for cores, giving a total of 20 simulation cases for each benchmark.

Manuscript received Aug 2,
Ho Young Kim, Electrical and Computer Engineering, University of Texas at San Antonio, San Antonio, Texas, USA (Derek.kim25@gamil.com). Robert Maxwell (ramaxwell@gamil.com). Ankil Patel (ankilpatel@hotmail.com). Byeong Kil Lee (byeong.lee@utsa.edu).

Based on our investigation and analysis, the best configuration for performance and power consumption depends on the application and the given number of execution threads, and we recommend the best combination of cores and threads for each thread group. Neither multi-threading nor multi-core is clearly better than the other. As fabrication technology evolves, designers can build ever smaller transistors, for which leakage power becomes increasingly important. In general, computer performance is no longer determined by clock frequency or by the number of threads or cores alone. Of the many factors that affect computer performance, this research focuses on the impact of different combinations of multi-threading and multiple cores, and power consumption is also taken into account for each combination.

II. METHODOLOGY

A. Multi2Sim simulator for performance measurement

Multi2Sim [1] integrates several significant features of popular simulators, such as separate functional and timing simulation, SMT and multiprocessor support, and cache coherence. Multi2Sim is an application-only tool intended to simulate x86 binary executables.
Table 1 shows the simulator configuration.

Table 1. Multi2Sim configuration for this research
Simulator: Multi2Sim
Architecture: x86
Cache configuration: L1: private, 2-way, 64KB (32/32); L2: private, 8-way, 512KB; L3: shared, 16-way, 8MB

The simulations are run in five thread groups: 2-Th, 4-Th, 8-Th, 16-Th and 32-Th. Within each group, the simulation cases are formed by decreasing the number of threads per core while increasing the number of cores, so that cores times threads always equals the group size. For example, in the 4-Th group the first case is 4 cores with 1 thread each, the second is 2 cores with 2 threads each, and the last is 1 core with 4 threads. In this way, the five groups contain 20 simulation cases in total. Four physical machines were used for this research, each with a Xeon E5506 processor with private L1 and L2 caches and a shared L3 cache; Table 2 gives the details of these machines. All configurations were run in parallel, and the resulting data were gathered and analyzed.
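The 20 simulation cases follow mechanically from this rule. The sketch below, which assumes that every group size is a power of two, enumerates each group's (cores, threads per core) pairs; the function and label names are illustrative only and are not part of Multi2Sim.

```python
from __future__ import annotations

# Working-thread group sizes used in the experiments
THREAD_GROUPS = (2, 4, 8, 16, 32)


def thread_group_configs(total_threads: int) -> list[tuple[int, int]]:
    """List (cores, threads_per_core) pairs with cores * threads == total_threads.

    Cores start at total_threads and halve each step, so threads per core
    double: for 4 working threads this gives (4, 1), (2, 2), (1, 4).
    Assumes total_threads is a power of two.
    """
    configs = []
    cores = total_threads
    while cores >= 1:
        configs.append((cores, total_threads // cores))
        cores //= 2
    return configs


def label(cores: int, threads: int) -> str:
    """Short label in the style used in the text, e.g. '4c1t'."""
    return f"{cores}c{threads}t"


if __name__ == "__main__":
    total = 0
    for group in THREAD_GROUPS:
        cases = thread_group_configs(group)
        total += len(cases)
        print(f"{group}-Th:", ", ".join(label(c, t) for c, t in cases))
    print("total simulation cases:", total)  # 20
```

Each group of size N contributes log2(N) + 1 cases, so the five groups give 2 + 3 + 4 + 5 + 6 = 20.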

Table 2. Physical machine configuration
Processor: Xeon E5506
# of CPU: 8/12 cores
Cache configuration: L1: private, 2-way, 64KB (32/32); L2: private, 8-way, 256KB; L3: shared, 16-way, 8MB
Technology: 45nm

B. McPAT simulator for power consumption

McPAT [4] is the first integrated power, area, and timing modeling framework for multithreaded and multicore/manycore processors. It is designed to work with a variety of processor performance simulators (and thermal simulators, etc.) over a large range of technology generations. McPAT allows a user to specify low-level configuration details, and it provides default values when the user specifies only high-level architectural parameters. For the power simulations in this research, the Xeon Tulsa system configuration included with the simulator is used as the CPU configuration, and the statistics from the Multi2Sim CPU simulations are dumped to provide the activity information. The number of power simulations is the same as the number of CPU performance simulations, but each power simulation takes much less time than the corresponding CPU simulation. To obtain more accurate results, the McPAT cache configuration is matched to that of the CPU simulator, Multi2Sim. Table 3 shows the power simulator configuration used in this research.

Table 3. McPAT power simulator configuration
Simulator: McPAT 0.8
Base CPU: Xeon Tulsa
Cache configuration: L1: private, 2-way, 64KB (32/32); L2: private, 8-way, 512KB; L3: shared, 16-way, 8MB
Technology: 65nm

III. WORKLOADS

The PARSEC benchmark suite [2][3][15] is used for the experiments. We use a version recompiled for the Multi2Sim simulator, which can be downloaded from the Multi2Sim workload site. Note that this version requires different execution commands from the original suite; the details for running each workload are located inside each workload's folder. Table 4 shows detailed information about the PARSEC 2.1 benchmark suite [12].

Table 4. PARSEC 2.1 benchmark suite

All 13 benchmarks were originally planned to be used, but four of them (Facesim, Freqmine, Raytrace and Streamcluster) had problems after recompilation and could not be run, so 9 of the 13 benchmarks are used in this experiment. The medium input sets are used because simulating with the native inputs takes too long: with the medium input set, a full simulation takes about 3 to 4 days depending on the benchmark, whereas with the native input a single full simulation may take more than 7 days. The Multi2Sim developers also recommend the medium input size for simulation.

The instruction mix of the PARSEC workloads is shown in Figure 1; a total of 9 workloads are analyzed. Every workload has over 20% memory instructions, and integer instructions make up over 36% of every workload, making them the main instruction type in the PARSEC 2.1 benchmark suite. Five workloads have over 10% floating-point instructions, while 2 workloads have no floating-point instructions at all. Together, these three instruction types account for over 84% of the instruction mix in every benchmark.

Figure 1. Instruction mix in PARSEC benchmark

IV. PERFORMANCE COMPARISON OF MULTI-THREADED AND MULTI-CORE SYSTEMS WITH VARIOUS CONFIGURATIONS

A. CPU performance: Instructions per cycle variation in working thread groups

IPC (instructions per cycle) is one of the most common and widely used performance metrics, and it is measured and analyzed in all simulation cases because it indicates the speed of the processor [6][14]. Figure 2 compares varying the number of threads on one core with varying the number of cores with one thread each. In the left chart, the number of threads increases from 1 to 32 in powers of 2, and IPC decreases from 1.83 to 1.11 as the thread count grows. Performance is degraded as threads are added because of contention among threads. When the number of cores is increased instead, performance increases linearly, starting from 2.40. Each time the number of cores is doubled, performance increases by as much as 1.58 in all cases.
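IPC is the ratio of committed instructions to simulated cycles. As a minimal sketch, assuming the two counts have already been read from the simulator's statistics (the parsing is not shown, and the function names are hypothetical), the following computes IPC and quantifies the single-core drop reported above, from 1.83 to 1.11 as threads are added.

```python
def ipc(committed_instructions: int, cycles: int) -> float:
    """Instructions per cycle for one simulation run."""
    if cycles <= 0:
        raise ValueError("cycle count must be positive")
    return committed_instructions / cycles


def relative_change(before: float, after: float) -> float:
    """Fractional change from `before` to `after` (negative means a drop)."""
    return (after - before) / before


if __name__ == "__main__":
    # IPC reported for 1 core with 1 thread vs. 1 core with 32 threads
    print(f"{relative_change(1.83, 1.11):.1%}")
```

Running it prints -39.3%, i.e., the single-core IPC falls by roughly 39% between the two endpoints.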

Figure 2. IPC comparison between threads up and cores up

These trends are common to all benchmarks, as shown in Figure 3 and Figure 4. Figure 3 charts IPC as the number of threads varies on a single core, and Figure 4 charts IPC as the number of cores varies with one thread per core.

Figure 3. IPC chart of thread variation in fixed core
Figure 4. IPC chart of core variation in fixed thread

The charts in which the number of cores is fixed at two or four and the number of threads varies show an interesting point: using 2 or 4 threads performs better than any other thread count in all simulation cases. In a multi-threaded system, hardware resources are shared among threads, and this simulation shows that 2 or 4 threads is the best choice, as shown in Figure 5. In the core variation simulations, by contrast, CPU performance always increases as more cores are used.

Figure 5. IPC chart of threads variation in two core and four core fixed condition

IPC performance is also investigated within each working thread group: 2-thread, 4-thread, 8-thread, 16-thread and 32-thread. Figure 6 shows the performance data for the 2-thread and 4-thread groups. In these groups, using more cores instead of more threads clearly gives better performance, and the 8-thread group confirms this (Figure 7). The 16- and 32-thread groups give the same result: when threads are replaced by cores, performance increases, as shown in Figure 8. In summary, increasing the number of cores instead of the number of threads improves CPU performance, and using many threads per core (more than 4) hurts CPU performance by causing excessive contention among threads.

Figure 6. IPC chart of threads variation in 2 and 4 working threads groups
Figure 7. IPC chart of threads variation in 8 working threads group

Figure 8. IPC performance in 16 and 32 working threads

B. Area Estimation

Area is also an important factor in CPU design, and it is estimated with the power simulator, McPAT. In general, area depends on hardware resources such as the number of functional units, threads and cores. Adding a core usually requires more area than adding a thread within one processing unit, and the simulation results reflect this. Figure 9 shows that the estimated area grows both as the number of threads increases and as the number of cores increases.

Figure 9. Estimated area in threads up and cores up

In the chart, when the number of threads increases from 2 to 32, the area grows from 352 mm² to 617 mm², less than twice the 2-thread area. In the core variation case, however, each time the number of cores is increased by two, the area grows by 122 mm², so the total area at 32 cores is more than 4 times that at 2 cores. In summary, increasing the number of cores is a poor choice for area: adding 2 cores requires an additional 122 mm², because each core needs its own hardware resources such as the processing unit and the L1 and L2 caches. Increasing the number of threads is the better choice for area, requiring only 15 mm² to add 2 threads.

C. Power measurement

The power simulation results include several power metrics; runtime dynamic power, subthreshold leakage and total leakage power are considered in this section to analyze the power behavior of the multi-threaded and multi-core systems.

Dynamic power: Runtime dynamic power is the power consumed by hardware activity and switching driven by the input. Peak dynamic power, the worst case of power consumption under maximum hardware activity and switching, is not considered in this paper. Figure 10 compares runtime dynamic power when the number of threads changes and when the number of cores changes.

Figure 10. Runtime dynamic power consumption in thread and core variation

The left chart shows thread variation from 2 to 32 threads on one core: runtime dynamic power increases exponentially as the number of threads doubles. The right chart shows the same type of data for core variation with one thread per core: dynamic power increases linearly as the number of cores doubles. For up to 16 working threads, the core variation configurations, 16c1t (16 cores, 1 thread), 8c1t, 4c1t and 2c1t, consume more dynamic power than the corresponding thread variation configurations, 1c16t, 1c8t, 1c4t and 1c2t. At 32 working threads, however, this changes abruptly: 1 core with 32 threads uses about twice as much power as 32 cores with 1 thread.

The results for each working thread group are explained in turn. Figure 11 shows the 2 and 4 working thread groups. Using more cores instead of threads was expected to consume more power in these groups, and the simulation data follow this expectation: 2c1t consumed more power than 1c2t, and in the 4 working thread group, 2c2t and 4c1t consumed more power than 1c4t. There is no Fluidanimate power data for the 4c1t case because a simulator error occurred in that configuration only, and it could not be fixed even after asking on the Multi2Sim forum. In the 8 working thread group, the dynamic power trend is similar to that of the 4-thread group: when threads are replaced by cores with 8 working threads fixed, power consumption increases with the number of cores.

Figure 11. Runtime dynamic power consumption in 2 and 4 working threads

Bodytrack and x264 show a clearly linear increase in power consumption as threads are replaced by cores. The trend does not change at all when the number of working threads increases from 4 to 8, but from 16 working threads onward the power data change noticeably. Figure 12 shows the runtime dynamic power data for 16 working threads. In this chart, 1c16t uses more power than 2c8t and 4c4t, while from 8c2t onward the consumed power is higher than that of 1c16t, and 16c1t uses more power than all the others. This suggests that using more than 8 threads per core causes enough contention among threads to degrade system performance, and the effect becomes more serious in the 32 working thread group. In the 32 working thread group, the data trends change much more than in the other cases (Figure 13). Here the most power-consuming case is 1 core with 32 threads, at 33.6 W, while 4 cores with 8 threads consumed the least power, 9.6 W.

Figure 13. Runtime dynamic power in 32 working threads group

Leakage power consumption: As process technology keeps shrinking, leakage power has become a dominant part of power consumption [7], and it will become an even more important component to overcome as technology continues to scale. The simulation data report gate leakage, subthreshold leakage and total leakage; gate leakage is less than about a tenth of subthreshold leakage, so this section mainly considers subthreshold and total leakage. Leakage power is reviewed in the same order as dynamic power: first the comparison between thread and core variation, then the result for each working thread group. Figure 14 shows subthreshold leakage power for thread variation and core variation.
Figure 12. Runtime dynamic power in 16 working threads group

That 1 core with 32 threads consumes the most power is expected from the core variation trends, but it was not expected that 4 cores with 8 threads would use the least power. The highest consumption, 33.6 W for 1c32t, is over 3 times that of 4c8t, the lowest. In general in this simulation, power increases linearly with the number of cores at a fixed thread count but exponentially with the number of threads at a fixed core count, so 1 core with 32 threads consumed more power than 1 thread with 32 cores or any other configuration.

Figure 14. Subthreshold leakage in threads variation and core variation

In the left chart (thread variation), the number of threads increases 16 times from 1 core with 2 threads to 1 core with 32 threads, yet subthreshold leakage increases by only about 5 W. In the right chart (core variation with one thread), however, subthreshold leakage increases exponentially with the number of cores, reaching over 10 times the 2-core, 1-thread value, even though the estimated area grows only about 4 times over the same range. Leakage power is directly tied to the number of cores, as shown in all working thread groups in Figure 15.

Figure 15. Total leakage power in working threads groups

In each group, threads are replaced by cores while the number of working threads is fixed. Leakage power increases exponentially as threads are replaced by cores, and the rate of increase is larger in groups with more working threads. Therefore, to minimize leakage power, it is better to use more threads than cores.

To see the impact of leakage power, it is compared with dynamic power in Figure 16. In the left chart, the gap between runtime dynamic and leakage power shrinks as more threads are used, because runtime dynamic power increases exponentially while leakage increases only linearly and at a small rate. In the right chart, however, the gap grows as the number of cores increases, and at 32 cores with 1 thread, leakage power is over 10 times the runtime dynamic power. In summary, runtime dynamic power increases exponentially with the number of threads, leakage power increases exponentially with the number of cores, and with many cores leakage accounts for over 90% of total power consumption in this 65 nm technology.

Figure 16. Power consumption comparison between runtime dynamic and total leakage

D. Power and delay product (combined performance of power and speed)

E-D Product: So far, speed (IPC) and power (runtime dynamic and leakage) have been simulated and analyzed separately, but it is still not easy to say whether using more threads or more cores gives better overall performance. To evaluate speed and power together, the Energy-Delay (E-D) product [8] is adopted. Calculating the E-D product requires the energy per instruction and the cycles per instruction; the energy per instruction can be obtained as (Energy/cycle)/IPC.

Figure 17. E-D product in 2, 4 and 8 working threads

Figure 17 shows the Energy-Delay product for the 2, 4 and 8 working thread groups and identifies the best configuration in each group. In the 2 working thread group, the best case is 1 core with 2 threads; in the 4 working thread group, it is 2 cores with 2 threads; and from the 4 working thread group onward, the best point moves toward using more cores than threads. The worst choices are 2c1t, 1c4t and 1c8t respectively; apart from the 2 working thread group, the worst case always uses more threads than cores.

Figure 18. E-D product in 16 working threads

Figure 18 shows the 16 working thread group. The best case in this group is 8 cores with 2 threads, although 4 cores with 4 threads scores similarly. The worst case is 1c16t, which uses the most threads in the group, and the second worst is 2c8t, as expected from the IPC and power results. In the 32 working thread group, the best case is 8 cores with 4 threads, as shown in Figure 19, and the worst is 1 core with 32 threads. As the number of working threads increases, using more threads cannot match the performance of using more cores. These E-D product results show that 2 or 4 threads per core can improve performance, but more than 4 threads per core hurts performance through excessive thread contention; with 2 or 4 threads per core, the number of cores should be increased for better performance. Because of the high leakage power of large core counts, 16c1t and 32c1t cannot be the best case in their groups.
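The E-D metric can be related to the IPC and power data as follows. This is a sketch under two assumptions not stated in the text: delay is measured per instruction in cycles (CPI = 1/IPC), and all configurations share one clock frequency f, so expressing delay in seconds only scales every value by the same constant and leaves the rankings unchanged. Here P is total power (runtime dynamic plus total leakage) and E_c is energy per cycle; the last line shows how the same construction extends to the E-D² variant [9], which weights delay once more.

```latex
\begin{aligned}
E_{c} &= \frac{P}{f}, \qquad \mathrm{CPI} = \frac{1}{\mathrm{IPC}},\\
\mathrm{EPI} &= E_{c}\cdot\mathrm{CPI} = \frac{E_{c}}{\mathrm{IPC}},\\
\mathrm{ED} &= \mathrm{EPI}\cdot\mathrm{CPI} = \frac{E_{c}}{\mathrm{IPC}^{2}},\\
\mathrm{ED}^{2} &= \mathrm{EPI}\cdot\mathrm{CPI}^{2} = \frac{E_{c}}{\mathrm{IPC}^{3}}.
\end{aligned}
```

Lower values are better for both metrics; because IPC appears with a higher exponent in E-D², a configuration that gains speed at the cost of extra energy is favored more by E-D² than by E-D.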

Figure 19. E-D product in 32 working threads

E-D² Product: Besides the E-D product, another popular combined metric is E-D² [9]. For comparison, the E-D² metric is also calculated from the IPC and power data in this research. Figure 20 shows E-D² for the same simulations in the 2, 4 and 8 working thread groups. In the 2 working thread group, the best case is 2 cores with 1 thread, not 1 core with 2 threads as under the E-D product, because multiplying by delay once more gives speed a greater weight. In the 4 working thread group, the worst case is 1c4t, the same as under the E-D product, but the best case changes from 2c2t to 4c1t.

Figure 20. E-D² product in 2, 4 and 8 working threads

In the 8 working thread group, the gaps between configurations widen, and the best combination is 8 cores with 1 thread. The weakness of adding cores instead of threads is that more cores consume more power as hardware and area increase; its merit is higher speed. Because E-D² weights speed (delay) more heavily, the best choice moves toward more cores in all groups. Figure 21 shows the 16 working thread group: the best case was 8c2t under the E-D product, but under E-D² it moves to twice as many cores and half as many threads, 16c1t, while the worst case remains 1c16t. In the 32 working thread group, the E-D² trend is almost the same as in the other groups: 16c2t is the best case and 1c32t the worst, as shown in Figure 22.

Figure 21. E-D² product in 16 working threads group
Figure 22. E-D² product in 32 working threads group

The E-D and E-D² results show that, within each working thread group, more cores should be used for better combined performance. Using more cores is usually better for speed and runtime dynamic power, but its drawback is higher leakage power. According to the McPAT simulations, leakage accounts for 42.5% and runtime dynamic power for 57.5% of total power with 1 core and 32 threads, whereas with 32 cores and 1 thread, leakage accounts for 93.7% and runtime dynamic power for only 6.3%. If the leakage power problem were solved, the best choice would always be the highest number of cores in each group. This is shown in the power consumption comparison in Table 5.

Table 5. Power consumption comparison in 32 working threading group (Total = Runtime Dynamic + Total Leakage; Total leakage = Gate leakage + Subthreshold leakage)
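Table 5 defines Total = Runtime Dynamic + Total Leakage and Total leakage = Gate leakage + Subthreshold leakage. The sketch below encodes those definitions, the leakage share quoted above, and the E-D and E-D² metrics written per instruction as (Energy/cycle)/IPC^(n+1). The class and function names, and the use of a single fixed clock frequency to convert power into energy per cycle, are assumptions for illustration; McPAT's own output format is not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PowerBreakdown:
    """Power figures for one configuration, in watts (hypothetical container)."""
    runtime_dynamic_w: float
    gate_leakage_w: float
    subthreshold_leakage_w: float

    @property
    def total_leakage_w(self) -> float:
        # Table 5: Total leakage = Gate leakage + Subthreshold leakage
        return self.gate_leakage_w + self.subthreshold_leakage_w

    @property
    def total_w(self) -> float:
        # Table 5: Total = Runtime Dynamic + Total Leakage
        return self.runtime_dynamic_w + self.total_leakage_w

    @property
    def leakage_share(self) -> float:
        # Fraction of total power that is leakage (0.937 would mean 93.7%)
        return self.total_leakage_w / self.total_w


def energy_per_cycle(power_w: float, freq_hz: float) -> float:
    """Energy per cycle in joules, assuming a fixed clock frequency."""
    return power_w / freq_hz


def energy_delay(e_cycle: float, ipc: float, delay_exponent: int = 1) -> float:
    """E*D^n per instruction: EPI * CPI**n = e_cycle / IPC**(n + 1).

    delay_exponent=1 gives the E-D product; delay_exponent=2 gives E-D^2.
    """
    if ipc <= 0:
        raise ValueError("IPC must be positive")
    return e_cycle / ipc ** (delay_exponent + 1)


def best_config(metric_by_config: dict[str, float]) -> str:
    """Return the configuration label (e.g. '8c2t') with the lowest metric."""
    return min(metric_by_config, key=lambda name: metric_by_config[name])
```

To use it, one would build a PowerBreakdown per configuration in a working thread group, convert total_w to energy per cycle, and compare energy_delay(..., 1) or energy_delay(..., 2) across labels with best_config. With hypothetical inputs, a configuration at 1.0 energy units per cycle and IPC 1.0 beats one at 3.0 and IPC 1.5 under E-D (1.0 vs. about 1.33) but loses under E-D² (1.0 vs. about 0.89), which is the kind of shift toward faster, more power-hungry configurations described above.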

V. CONCLUSION

A large amount of data was gathered through simulation, and finding the relationships between multi-threaded and multi-core computing performance is not easy; many charts with different configurations were drawn and analyzed. For CPU performance, IPC was measured and analyzed as threads and cores were varied within each working thread group. IPC degrades as the number of threads increases with a fixed number of cores, and this holds in all thread variation cases. However, IPC always increases when the number of cores increases with a fixed number of threads. Within each working thread group, IPC increases when a thread is replaced by a core. Although 2 or 3 benchmarks show saturation across groups, their performance does not decrease. It is clear that increasing the number of cores provides better performance than increasing the number of threads.

For area, increasing the number of threads is a better choice than increasing the number of cores for keeping the CPU small. For example, a 32-thread CPU is almost 2 times larger than a 2-thread CPU, whereas a 32-core CPU is over 4 times larger than a 2-core CPU. This is attributed to area growing linearly with the number of threads but exponentially with the number of cores. Dynamic power consumption grows exponentially as the number of threads increases and linearly as the number of cores increases. Leakage power consumption is higher than dynamic power consumption for variations in both threads and cores.
Leakage power becomes more dominant as the number of cores increases, consuming up to 93.7% of total power in the worst case. To make a final decision between multi-threaded and multi-core computers, the Energy-Delay product was adopted, and it leads to a new conclusion: in every working thread group except the 2-thread group, it is better to use multiple cores with 2 or 4 threads each. For example, under the E-D product the best choices in the 4, 8, 16 and 32 working thread groups are 2c2t, 4c2t, 8c2t and 8c4t respectively. This result is expected to hold as the number of working threads increases further, because leakage power is the main part of power consumption and increases exponentially with the number of cores. The E-D² product was also calculated from the simulation data, since it is another widely used combined metric. Because delay is weighted more heavily, the best results shift toward more cores than under the E-D product: from the 2 to the 16 working thread groups, the best choices all use the highest number of cores, and for the 32 working thread group the best case is 16 cores with 2 threads each. Based on simulation and analysis, for better speed and power consumption we recommend a multi-core system with 2-way multi-threading per core. There is no single winner, multi-core or multi-threaded, for speed, area and power consumption together: to achieve the best performance, multi-core and multi-threading have to be used together. However, if the leakage power problem can be solved, the winner will be a multi-core system.

REFERENCES
[1] R. Ubal, J. Sahuquillo, S. Petit, P. Lopez. Multi2Sim: A Simulation Framework to Evaluate Multicore-Multithreaded Processors. IEEE
[2] C. Bienia, S. Kumar, J. P. Singh, K. Li. The PARSEC Benchmark Suite: Characterization and Architectural Implications. In Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, October
[3] C. Bienia, K. Li. PARSEC 2.0: A New Benchmark Suite for Chip-Multiprocessors. In Proceedings of the 5th Annual Workshop on Modeling, Benchmarking and Simulation, June
[4] S. Li, J. H. Ahn, R. D. Strong, J. B. Brockman, D. M. Tullsen, N. P. Jouppi. McPAT: An Integrated Power, Area, and Timing Modeling Framework for Multicore and Manycore Architectures. MICRO
[5] Intel Hyper-Threading Technology, Jan 2003. [Online]. Available:
[6] A. Suleman. What makes parallel programming hard? May [Online] ng.html
[7] V. V. Zyuban, P. M. Kogge. Inherently lower-power high-performance superscalar architectures. IEEE Transactions on Computers, vol. 50, no. 3, 2001.
[8] Y. Li, D. Brooks, Z. Hu, K. Skadron, P. Bose. Understanding the energy efficiency of simultaneous multithreading. In Proceedings of the 2004 International Symposium on Low Power Electronics and Design (ISLPED '04), pp. 44-49, 2004.
[9] J. Cong, A. Jagannathan, G. Reinman, Y. Tamir. Understanding the energy efficiency of SMT and CMP with multiclustering. In Proceedings of the 2005 International Symposium on Low Power Electronics and Design (ISLPED '05), pp. 48-53, 2005.
[10] Multithreading (computer architecture). Wikipedia, n.p., n.d.
[11] Multi-core processor. Wikipedia, n.p., n.d.
[12] G. Contreras, M. Martonosi. Characterizing and improving the performance of Intel Threading Building Blocks. IEEE International Symposium on Workload Characterization (IISWC), pp. 57-66,
[13] A. R. Alameldeen, D. A. Wood. Variability in architectural simulations of multi-threaded workloads. In Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA), pp. 7-18,
[14] C.-J. Wu, M. Martonosi. Characterization and dynamic mitigation of intra-application cache interference. IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 2-11, 2011.
[15] M. Bhadauria, V. M. Weaver, S. A. McKee. Understanding PARSEC performance on contemporary CMPs. IEEE International Symposium on Workload Characterization (IISWC), 2009.


More information

Proactive Thermal Management using Memory-based Computing in Multicore Architectures

Proactive Thermal Management using Memory-based Computing in Multicore Architectures Proactive Thermal Management using Memory-based Computing in Multicore Architectures Subodha Charles, Hadi Hajimiri, Prabhat Mishra Department of Computer and Information Science and Engineering, University

More information

IEEE TRANSACTIONS ON MULTI-SCALE COMPUTING SYSTEMS, VOL. 1, NO. 1, JANUARY

IEEE TRANSACTIONS ON MULTI-SCALE COMPUTING SYSTEMS, VOL. 1, NO. 1, JANUARY This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 1.119/TMSCS.218.287438,

More information

Energy-Performance Trade-offs on Energy-Constrained Devices with Multi-Component DVFS

Energy-Performance Trade-offs on Energy-Constrained Devices with Multi-Component DVFS Energy-Performance Trade-offs on Energy-Constrained Devices with Multi-Component DVFS Rizwana Begum, David Werner and Mark Hempstead Drexel University {rb639,daw77,mhempstead}@drexel.edu Guru Prasad, Jerry

More information

Leveraging Simultaneous Multithreading for Adaptive Thermal Control

Leveraging Simultaneous Multithreading for Adaptive Thermal Control Leveraging Simultaneous Multithreading for Adaptive Thermal Control James Donald and Margaret Martonosi Department of Electrical Engineering Princeton University {jdonald, mrm}@princeton.edu Abstract The

More information

Microarchitectural Simulation and Control of di/dt-induced. Power Supply Voltage Variation

Microarchitectural Simulation and Control of di/dt-induced. Power Supply Voltage Variation Microarchitectural Simulation and Control of di/dt-induced Power Supply Voltage Variation Ed Grochowski Intel Labs Intel Corporation 22 Mission College Blvd Santa Clara, CA 9552 Mailstop SC2-33 edward.grochowski@intel.com

More information

An Energy-Efficient Heterogeneous CMP based on Hybrid TFET-CMOS Cores

An Energy-Efficient Heterogeneous CMP based on Hybrid TFET-CMOS Cores An Energy-Efficient Heterogeneous CMP based on Hybrid TFET-CMOS Cores Abstract The steep sub-threshold characteristics of inter-band tunneling FETs (TFETs) make an attractive choice for low voltage operations.

More information

H-EARtH: Heterogeneous Platform Energy Management

H-EARtH: Heterogeneous Platform Energy Management IEEE SUBMISSION 1 H-EARtH: Heterogeneous Platform Energy Management Efraim Rotem 1,2, Ran Ginosar 2, Uri C. Weiser 2, and Avi Mendelson 2 Abstract The Heterogeneous EARtH algorithm aim at finding the optimal

More information

DAT175: Topics in Electronic System Design

DAT175: Topics in Electronic System Design DAT175: Topics in Electronic System Design Analog Readout Circuitry for Hearing Aid in STM90nm 21 February 2010 Remzi Yagiz Mungan v1.10 1. Introduction In this project, the aim is to design an adjustable

More information

Interconnect-Power Dissipation in a Microprocessor

Interconnect-Power Dissipation in a Microprocessor 4/2/2004 Interconnect-Power Dissipation in a Microprocessor N. Magen, A. Kolodny, U. Weiser, N. Shamir Intel corporation Technion - Israel Institute of Technology 4/2/2004 2 Interconnect-Power Definition

More information

Static Energy Reduction Techniques in Microprocessor Caches

Static Energy Reduction Techniques in Microprocessor Caches Static Energy Reduction Techniques in Microprocessor Caches Heather Hanson, Stephen W. Keckler, Doug Burger Computer Architecture and Technology Laboratory Department of Computer Sciences Tech Report TR2001-18

More information

Processors Processing Processors. The meta-lecture

Processors Processing Processors. The meta-lecture Simulators 5SIA0 Processors Processing Processors The meta-lecture Why Simulators? Your Friend Harm Why Simulators? Harm Loves Tractors Harm Why Simulators? The outside world Unfortunately for Harm you

More information

Under Submission. Energy-Performance Trade-offs on Energy-Constrained Devices with Multi-Component DVFS

Under Submission. Energy-Performance Trade-offs on Energy-Constrained Devices with Multi-Component DVFS Energy-Performance Trade-offs on Energy-Constrained Devices with Multi-Component DVFS Rizwana Begum, David Werner and Mark Hempstead Drexel University {rb639,daw77,mhempstead}@drexel.edu Guru Prasad, Jerry

More information

Low Power Design of Successive Approximation Registers

Low Power Design of Successive Approximation Registers Low Power Design of Successive Approximation Registers Rabeeh Majidi ECE Department, Worcester Polytechnic Institute, Worcester MA USA rabeehm@ece.wpi.edu Abstract: This paper presents low power design

More information

COTSon: Infrastructure for system-level simulation

COTSon: Infrastructure for system-level simulation COTSon: Infrastructure for system-level simulation Ayose Falcón, Paolo Faraboschi, Daniel Ortega HP Labs Exascale Computing Lab http://sites.google.com/site/hplabscotson MICRO-41 tutorial November 9, 28

More information

Power Management in Multicore Processors through Clustered DVFS

Power Management in Multicore Processors through Clustered DVFS Power Management in Multicore Processors through Clustered DVFS A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY Tejaswini Kolpe IN PARTIAL FULFILLMENT OF THE

More information

Dynamic MIPS Rate Stabilization in Out-of-Order Processors

Dynamic MIPS Rate Stabilization in Out-of-Order Processors Dynamic Rate Stabilization in Out-of-Order Processors Jinho Suh and Michel Dubois Ming Hsieh Dept of EE University of Southern California Outline Motivation Performance Variability of an Out-of-Order Processor

More information

Energy Efficiency Benefits of Reducing the Voltage Guardband on the Kepler GPU Architecture

Energy Efficiency Benefits of Reducing the Voltage Guardband on the Kepler GPU Architecture Energy Efficiency Benefits of Reducing the Voltage Guardband on the Kepler GPU Architecture Jingwen Leng Yazhou Zu Vijay Janapa Reddi The University of Texas at Austin {jingwen, yazhou.zu}@utexas.edu,

More information

Enhancing Power, Performance, and Energy Efficiency in Chip Multiprocessors Exploiting Inverse Thermal Dependence

Enhancing Power, Performance, and Energy Efficiency in Chip Multiprocessors Exploiting Inverse Thermal Dependence 778 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 26, NO. 4, APRIL 2018 Enhancing Power, Performance, and Energy Efficiency in Chip Multiprocessors Exploiting Inverse Thermal Dependence

More information

Outline Simulators and such. What defines a simulator? What about emulation?

Outline Simulators and such. What defines a simulator? What about emulation? Outline Simulators and such Mats Brorsson & Mladen Nikitovic ICT Dept of Electronic, Computer and Software Systems (ECS) What defines a simulator? Why are simulators needed? Classifications Case studies

More information

Evaluation of CPU Frequency Transition Latency

Evaluation of CPU Frequency Transition Latency Noname manuscript No. (will be inserted by the editor) Evaluation of CPU Frequency Transition Latency Abdelhafid Mazouz Alexandre Laurent Benoît Pradelle William Jalby Abstract Dynamic Voltage and Frequency

More information

Variation-Aware Scheduling for Chip Multiprocessors with Thread Level Redundancy

Variation-Aware Scheduling for Chip Multiprocessors with Thread Level Redundancy Variation-Aware Scheduling for Chip Multiprocessors with Thread Level Redundancy Jianbo Dong, Lei Zhang, Yinhe Han, Guihai Yan and Xiaowei Li Key Laboratory of Computer System and Architecture Institute

More information

Recent Advances in Simulation Techniques and Tools

Recent Advances in Simulation Techniques and Tools Recent Advances in Simulation Techniques and Tools Yuyang Li, li.yuyang(at)wustl.edu (A paper written under the guidance of Prof. Raj Jain) Download Abstract: Simulation refers to using specified kind

More information

WEI HUANG Curriculum Vitae

WEI HUANG Curriculum Vitae 1 WEI HUANG Curriculum Vitae 4025 Duval Road, Apt 2538 Phone: (434) 227-6183 Austin, TX 78759 Email: wh6p@virginia.edu (preferred) https://researcher.ibm.com/researcher/view.php?person=us-huangwe huangwe@us.ibm.com

More information

Proactive Thermal Management Using Memory Based Computing

Proactive Thermal Management Using Memory Based Computing Proactive Thermal Management Using Memory Based Computing Hadi Hajimiri, Mimonah Al Qathrady, Prabhat Mishra CISE, University of Florida, Gainesville, USA {hadi, qathrady, prabhat}@cise.ufl.edu Abstract

More information

A Framework for Assessing the Feasibility of Learning Algorithms in Power-Constrained ASICs

A Framework for Assessing the Feasibility of Learning Algorithms in Power-Constrained ASICs A Framework for Assessing the Feasibility of Learning Algorithms in Power-Constrained ASICs 1 Introduction Alexander Neckar with David Gal, Eric Glass, and Matt Murray (from EE382a) Whether due to injury

More information

PROCESS and environment parameter variations in scaled

PROCESS and environment parameter variations in scaled 1078 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 53, NO. 10, OCTOBER 2006 Reversed Temperature-Dependent Propagation Delay Characteristics in Nanometer CMOS Circuits Ranjith Kumar

More information

LIMITS OF PARALLELISM AND BOOSTING IN DIM SILICON

LIMITS OF PARALLELISM AND BOOSTING IN DIM SILICON ... LIMITS OF PARALLELISM AND BOOSTING IN DIM SILICON... THE AUTHORS INVESTIGATE THE LIMIT OF VOLTAGE SCALING TOGETHER WITH TASK PARALLELIZATION TO MAINTAIN TASK-COMPLETION LATENCY WHILE REDUCING ENERGY

More information

Parallel Computing 2020: Preparing for the Post-Moore Era. Marc Snir

Parallel Computing 2020: Preparing for the Post-Moore Era. Marc Snir Parallel Computing 2020: Preparing for the Post-Moore Era Marc Snir THE (CMOS) WORLD IS ENDING NEXT DECADE So says the International Technology Roadmap for Semiconductors (ITRS) 2 End of CMOS? IN THE LONG

More information

CS4617 Computer Architecture

CS4617 Computer Architecture 1/26 CS4617 Computer Architecture Lecture 2 Dr J Vaughan September 10, 2014 2/26 Amdahl s Law Speedup = Execution time for entire task without using enhancement Execution time for entire task using enhancement

More information

Design of Ultra-Low Power PMOS and NMOS for Nano Scale VLSI Circuits

Design of Ultra-Low Power PMOS and NMOS for Nano Scale VLSI Circuits Circuits and Systems, 2015, 6, 60-69 Published Online March 2015 in SciRes. http://www.scirp.org/journal/cs http://dx.doi.org/10.4236/cs.2015.63007 Design of Ultra-Low Power PMOS and NMOS for Nano Scale

More information

PROBE: Prediction-based Optical Bandwidth Scaling for Energy-efficient NoCs

PROBE: Prediction-based Optical Bandwidth Scaling for Energy-efficient NoCs PROBE: Prediction-based Optical Bandwidth Scaling for Energy-efficient NoCs Li Zhou and Avinash Kodi Technologies for Emerging Computer Architecture Laboratory (TEAL) School of Electrical Engineering and

More information

IMPLEMENTATION OF SOFTWARE-BASED 2X2 MIMO LTE BASE STATION SYSTEM USING GPU

IMPLEMENTATION OF SOFTWARE-BASED 2X2 MIMO LTE BASE STATION SYSTEM USING GPU IMPLEMENTATION OF SOFTWARE-BASED 2X2 MIMO LTE BASE STATION SYSTEM USING GPU Seunghak Lee (HY-SDR Research Center, Hanyang Univ., Seoul, South Korea; invincible@dsplab.hanyang.ac.kr); Chiyoung Ahn (HY-SDR

More information

A Thermally-Aware Methodology for Design-Specific Optimization of Supply and Threshold Voltages in Nanometer Scale ICs

A Thermally-Aware Methodology for Design-Specific Optimization of Supply and Threshold Voltages in Nanometer Scale ICs A Thermally-Aware Methodology for Design-Specific Optimization of Supply and Threshold Voltages in Nanometer Scale ICs ABSTRACT Sheng-Chih Lin, Navin Srivastava and Kaustav Banerjee Department of Electrical

More information

Published by: PIONEER RESEARCH & DEVELOPMENT GROUP (www.prdg.org) 1

Published by: PIONEER RESEARCH & DEVELOPMENT GROUP (www.prdg.org) 1 Design Of Low Power Approximate Mirror Adder Sasikala.M 1, Dr.G.K.D.Prasanna Venkatesan 2 ME VLSI student 1, Vice Principal, Professor and Head/ECE 2 PGP college of Engineering and Technology Nammakkal,

More information

Reduced Area Carry Select Adder with Low Power Consumptions

Reduced Area Carry Select Adder with Low Power Consumptions International Journal of Emerging Engineering Research and Technology Volume 3, Issue 3, March 2015, PP 90-95 ISSN 2349-4395 (Print) & ISSN 2349-4409 (Online) ABSTRACT Reduced Area Carry Select Adder with

More information

Final Report: DBmbench

Final Report: DBmbench 18-741 Final Report: DBmbench Yan Ke (yke@cs.cmu.edu) Justin Weisz (jweisz@cs.cmu.edu) Dec. 8, 2006 1 Introduction Conventional database benchmarks, such as the TPC-C and TPC-H, are extremely computationally

More information

Lighting the Dark Silicon by Exploiting Heterogeneity on Future Processors

Lighting the Dark Silicon by Exploiting Heterogeneity on Future Processors Lighting the Dark Silicon by Exploiting Heterogeneity on Future Processors Ying Zhang Lu Peng Xin Fu ϯ Yue Hu Division of Electrical & Computer Engineering ϯ Electrical Engineering and Computer Science

More information

Thermal Influence on the Energy Efficiency of Workload Consolidation in Many-Core Architectures

Thermal Influence on the Energy Efficiency of Workload Consolidation in Many-Core Architectures Thermal Influence on the Energy Efficiency of Workload Consolidation in Many-Core Architectures Fredric Hällis, Simon Holmbacka, Wictor Lund, Robert Slotte, Sébastien Lafond, Johan Lilius Department of

More information

Data Word Length Reduction for Low-Power DSP Software

Data Word Length Reduction for Low-Power DSP Software EE382C: LITERATURE SURVEY, APRIL 2, 2004 1 Data Word Length Reduction for Low-Power DSP Software Kyungtae Han Abstract The increasing demand for portable computing accelerates the study of minimizing power

More information

Low Power High Performance 10T Full Adder for Low Voltage CMOS Technology Using Dual Threshold Voltage

Low Power High Performance 10T Full Adder for Low Voltage CMOS Technology Using Dual Threshold Voltage Low Power High Performance 10T Full Adder for Low Voltage CMOS Technology Using Dual Threshold Voltage Surbhi Kushwah 1, Shipra Mishra 2 1 M.Tech. VLSI Design, NITM College Gwalior M.P. India 474001 2

More information

Scheduling for HPC Systems with Process Variation Heterogeneity

Scheduling for HPC Systems with Process Variation Heterogeneity Scheduling for HPC Systems with Process Variation Heterogeneity Ehsan Totoni, Akhil Langer, Josep Torrellas, Laxmikant V. Kale Department of Computer Science, University of Illinois at Urbana-Champaign,

More information

VRCon: Dynamic Reconfiguration of Voltage Regulators in a Multicore Platform

VRCon: Dynamic Reconfiguration of Voltage Regulators in a Multicore Platform VRCon: Dynamic Reconfiguration of Voltage Regulators in a Multicore Platform Woojoo Lee, Yanzhi Wang, and Massoud Pedram Dept. of Electrical Engineering, Univ. of Souther California, Los Angeles, California,

More information

A NEW APPROACH FOR DELAY AND LEAKAGE POWER REDUCTION IN CMOS VLSI CIRCUITS

A NEW APPROACH FOR DELAY AND LEAKAGE POWER REDUCTION IN CMOS VLSI CIRCUITS http:// A NEW APPROACH FOR DELAY AND LEAKAGE POWER REDUCTION IN CMOS VLSI CIRCUITS Ruchiyata Singh 1, A.S.M. Tripathi 2 1,2 Department of Electronics and Communication Engineering, Mangalayatan University

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK DESIGN OF LOW POWER MULTIPLIERS USING APPROXIMATE ADDER MR. PAWAN SONWANE 1, DR.

More information

Reduce Power Consumption for Digital Cmos Circuits Using Dvts Algoritham

Reduce Power Consumption for Digital Cmos Circuits Using Dvts Algoritham IOSR Journal of Electrical and Electronics Engineering (IOSR-JEEE) e-issn: 2278-1676,p-ISSN: 2320-3331, Volume 10, Issue 5 Ver. II (Sep Oct. 2015), PP 109-115 www.iosrjournals.org Reduce Power Consumption

More information

IBM Research Report. GPUVolt: Modeling and Characterizing Voltage Noise in GPU Architectures

IBM Research Report. GPUVolt: Modeling and Characterizing Voltage Noise in GPU Architectures RC55 (WAT1-3) April 1, 1 Electrical Engineering IBM Research Report GPUVolt: Modeling and Characterizing Voltage Noise in GPU Architectures Jingwen Leng, Yazhou Zu, Minsoo Rhu University of Texas at Austin

More information

444 Index. F Fermi potential, 146 FGMOS transistor, 20 23, 57, 83, 84, 98, 205, 208, 213, 215, 216, 241, 242, 251, 280, 311, 318, 332, 354, 407

444 Index. F Fermi potential, 146 FGMOS transistor, 20 23, 57, 83, 84, 98, 205, 208, 213, 215, 216, 241, 242, 251, 280, 311, 318, 332, 354, 407 Index A Accuracy active resistor structures, 46, 323, 328, 329, 341, 344, 360 computational circuits, 171 differential amplifiers, 30, 31 exponential circuits, 285, 291, 292 multifunctional structures,

More information

Total reduction of leakage power through combined effect of Sleep stack and variable body biasing technique

Total reduction of leakage power through combined effect of Sleep stack and variable body biasing technique Total reduction of leakage power through combined effect of Sleep and variable body biasing technique Anjana R 1, Ajay kumar somkuwar 2 Abstract Leakage power consumption has become a major concern for

More information

LOW-POWER SOFTWARE-DEFINED RADIO DESIGN USING FPGAS

LOW-POWER SOFTWARE-DEFINED RADIO DESIGN USING FPGAS LOW-POWER SOFTWARE-DEFINED RADIO DESIGN USING FPGAS Charlie Jenkins, (Altera Corporation San Jose, California, USA; chjenkin@altera.com) Paul Ekas, (Altera Corporation San Jose, California, USA; pekas@altera.com)

More information

Characterizing non-ideal Impacts of Reconfigurable Hardware Workloads on Ring Oscillator-based Thermometers

Characterizing non-ideal Impacts of Reconfigurable Hardware Workloads on Ring Oscillator-based Thermometers Characterizing non-ideal Impacts of Reconfigurable Hardware Workloads on Ring Oscillator-based Thermometers Moinuddin A. Sayed Department of Electrical and Computer Engineering Iowa State University Ames,

More information

On-chip Networks in Multi-core era

On-chip Networks in Multi-core era Friday, October 12th, 2012 On-chip Networks in Multi-core era Davide Zoni PhD Student email: zoni@elet.polimi.it webpage: home.dei.polimi.it/zoni Outline 2 Introduction Technology trends and challenges

More information

Improved Linearity CMOS Multifunctional Structure for VLSI Applications

Improved Linearity CMOS Multifunctional Structure for VLSI Applications ROMANIAN JOURNAL OF INFORMATION SCIENCE AND TECHNOLOGY Volume 10, Number 2, 2007, 157 165 Improved Linearity CMOS Multifunctional Structure for VLSI Applications C. POPA Faculty of Electronics, Telecommunications

More information

Towards a Cross-Layer Framework for Accurate Power Modeling of Microprocessor Designs

Towards a Cross-Layer Framework for Accurate Power Modeling of Microprocessor Designs Towards a Cross-Layer Framework for Accurate Power Modeling of Microprocessor Designs Monir Zaman, Mustafa M. Shihab, Ayse K. Coskun and Yiorgos Makris Department of Electrical and Computer Engineering,

More information

Conventional 4-Way Set-Associative Cache

Conventional 4-Way Set-Associative Cache ISLPED 99 International Symposium on Low Power Electronics and Design Way-Predicting Set-Associative Cache for High Performance and Low Energy Consumption Koji Inoue, Tohru Ishihara, and Kazuaki Murakami

More information

Cherry Picking: Exploiting Process Variations in the Dark Silicon Era

Cherry Picking: Exploiting Process Variations in the Dark Silicon Era Cherry Picking: Exploiting Process Variations in the Dark Silicon Era Siddharth Garg University of Waterloo Co-authors: Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu # Transistors Power/Dark

More information

EE 382C EMBEDDED SOFTWARE SYSTEMS. Literature Survey Report. Characterization of Embedded Workloads. Ajay Joshi. March 30, 2004

EE 382C EMBEDDED SOFTWARE SYSTEMS. Literature Survey Report. Characterization of Embedded Workloads. Ajay Joshi. March 30, 2004 EE 382C EMBEDDED SOFTWARE SYSTEMS Literature Survey Report Characterization of Embedded Workloads Ajay Joshi March 30, 2004 ABSTRACT Security applications are a class of emerging workloads that will play

More information

AREA AND DELAY EFFICIENT DESIGN FOR PARALLEL PREFIX FINITE FIELD MULTIPLIER

AREA AND DELAY EFFICIENT DESIGN FOR PARALLEL PREFIX FINITE FIELD MULTIPLIER AREA AND DELAY EFFICIENT DESIGN FOR PARALLEL PREFIX FINITE FIELD MULTIPLIER 1 CH.JAYA PRAKASH, 2 P.HAREESH, 3 SK. FARISHMA 1&2 Assistant Professor, Dept. of ECE, 3 M.Tech-Student, Sir CR Reddy College

More information

LOW LEAKAGE CNTFET FULL ADDERS

LOW LEAKAGE CNTFET FULL ADDERS LOW LEAKAGE CNTFET FULL ADDERS Rajendra Prasad Somineni srprasad447@gmail.com Y Padma Sai S Naga Leela Abstract As the technology scales down to 32nm or below, the leakage power starts dominating the total

More information

Tiago Reimann Cliff Sze Ricardo Reis. Gate Sizing and Threshold Voltage Assignment for High Performance Microprocessor Designs

Tiago Reimann Cliff Sze Ricardo Reis. Gate Sizing and Threshold Voltage Assignment for High Performance Microprocessor Designs Tiago Reimann Cliff Sze Ricardo Reis Gate Sizing and Threshold Voltage Assignment for High Performance Microprocessor Designs A grain of rice has the price of more than a 100 thousand transistors Source:

More information

A Study of The Advancement of CMOS ALU & Full Adder Circuit Design For Modern Design

A Study of The Advancement of CMOS ALU & Full Adder Circuit Design For Modern Design A Study of The Advancement of & Full Adder Circuit Design F Modern Design Bruce Hardy BR759875 Department of Electrical and Computer Engineering University of Central Flida Orlando, FL 32816-2362 Abstract

More information

PV SYSTEM BASED FPGA: ANALYSIS OF POWER CONSUMPTION IN XILINX XPOWER TOOL

PV SYSTEM BASED FPGA: ANALYSIS OF POWER CONSUMPTION IN XILINX XPOWER TOOL 1 PV SYSTEM BASED FPGA: ANALYSIS OF POWER CONSUMPTION IN XILINX XPOWER TOOL Pradeep Patel Instrumentation and Control Department Prof. Deepali Shah Instrumentation and Control Department L. D. College

More information

Characterization of 6T CMOS SRAM in 65nm and 120nm Technology using Low power Techniques

Characterization of 6T CMOS SRAM in 65nm and 120nm Technology using Low power Techniques Characterization of 6T CMOS SRAM in 65nm and 120nm Technology using Low power Techniques Sumit Kumar Srivastavar 1, Er.Amit Kumar 2 1 Electronics Engineering Department, Institute of Engineering & Technology,

More information

Walking Pads: Managing C4 Placement for Transient Voltage Noise Minimization

Walking Pads: Managing C4 Placement for Transient Voltage Noise Minimization Walking : Managing C4 Placement for Transient Voltage Noise Minimization Ke Wang, Brett H. Meyer, Runjie Zhang, Mircea R. Stan, Kevin Skadron Dept. of Computer Science University of Virginia Charlottesville,

More information

II. Previous Work. III. New 8T Adder Design

II. Previous Work. III. New 8T Adder Design ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: High Performance Circuit Level Design For Multiplier Arun Kumar

More information

Run-time Power Control Scheme Using Software Feedback Loop for Low-Power Real-time Applications

Run-time Power Control Scheme Using Software Feedback Loop for Low-Power Real-time Applications Run-time Power Control Scheme Using Software Feedback Loop for Low-Power Real-time Applications Seongsoo Lee Takayasu Sakurai Center for Collaborative Research and Institute of Industrial Science, University

More information

Bus Serialization for Reducing Power Consumption

Bus Serialization for Reducing Power Consumption Regular Paper Bus Serialization for Reducing Power Consumption Naoya Hatta, 1 Niko Demus Barli, 2 Chitaka Iwama, 3 Luong Dinh Hung, 1 Daisuke Tashiro, 4 Shuichi Sakai 1 and Hidehiko Tanaka 5 On-chip interconnects

More information

Dynamic Reconfiguration of 3D Photonic Networks-on-Chip for Maximizing Performance and Improving Fault Tolerance

Dynamic Reconfiguration of 3D Photonic Networks-on-Chip for Maximizing Performance and Improving Fault Tolerance Dynamic Reconfiguration of 3D Photonic Networks-on-Chip for Maximizing Performance and Improving Fault Tolerance Randy Morris Ϯ, Avinash Kodi Ϯ and Ahmed Louri School of Electrical Engineering and Computer

More information

ZIGZAG KEEPER: A NEW APPROACH FOR LOW POWER CMOS CIRCUIT

ZIGZAG KEEPER: A NEW APPROACH FOR LOW POWER CMOS CIRCUIT ZIGZAG KEEPER: A NEW APPROACH FOR LOW POWER CMOS CIRCUIT Kaushal Kumar Nigam 1, Ashok Tiwari 2 Department of Electronics Sciences, University of Delhi, New Delhi 110005, India 1 Department of Electronic

More information

Technology Challenges

Technology Challenges Technology Challenges ECE/CS 752 Fall 2017 Prof. Mikko H. Lipasti University of Wisconsin-Madison Readings Read on your own: Shekhar Borkar, Designing Reliable Systems from Unreliable Components: The Challenges

More information

A Low Power Single Ended Inductorless Wideband CMOS LNA with G m Enhancement and Noise Cancellation

A Low Power Single Ended Inductorless Wideband CMOS LNA with G m Enhancement and Noise Cancellation 2017 International Conference on Electronic, Control, Automation and Mechanical Engineering (ECAME 2017) ISBN: 978-1-60595-523-0 A Low Power Single Ended Inductorless Wideband CMOS LNA with G m Enhancement

More information

HARDWARE ACCELERATION OF THE GIPPS MODEL

HARDWARE ACCELERATION OF THE GIPPS MODEL HARDWARE ACCELERATION OF THE GIPPS MODEL FOR REAL-TIME TRAFFIC SIMULATION Salim Farah 1 and Magdy Bayoumi 2 The Center for Advanced Computer Studies, University of Louisiana at Lafayette, USA 1 snf3346@cacs.louisiana.edu

More information

Parallel GPU Architecture Simulation Framework Exploiting Work Allocation Unit Parallelism

Parallel GPU Architecture Simulation Framework Exploiting Work Allocation Unit Parallelism Parallel GPU Architecture Simulation Framework Exploiting Work Allocation Unit Parallelism Sangpil Lee and Won Woo Ro School of Electrical and Electronic Engineering Yonsei University Seoul, Republic of

More information

Aging-Aware Instruction Cache Design by Duty Cycle Balancing

Aging-Aware Instruction Cache Design by Duty Cycle Balancing 2012 IEEE Computer Society Annual Symposium on VLSI Aging-Aware Instruction Cache Design by Duty Cycle Balancing TaoJinandShuaiWang State Key Laboratory of Novel Software Technology Department of Computer

More information

SPECTRA: A Framework for Thermal Reliability Management in Silicon-Photonic Networks-on-Chip

SPECTRA: A Framework for Thermal Reliability Management in Silicon-Photonic Networks-on-Chip 2016 29th International Conference on VLSI Design and 2016 15th International Conference on Embedded Systems SPECTRA: A Framework for Thermal Reliability Management in Silicon-Photonic Networks-on-Chip

More information

System Level Analysis of Fast, Per-Core DVFS using On-Chip Switching Regulators

System Level Analysis of Fast, Per-Core DVFS using On-Chip Switching Regulators System Level Analysis of Fast, Per-Core DVFS using On-Chip Switching s Wonyoung Kim, Meeta S. Gupta, Gu-Yeon Wei and David Brooks School of Engineering and Applied Sciences, Harvard University, 33 Oxford

More information

Design Challenges in Multi-GHz Microprocessors
