SCALCORE: DESIGNING A CORE
|
|
- Stephany Richardson
- 5 years ago
- Views:
Transcription
1 SCALCORE: DESIGNING A CORE FOR VOLTAGE SCALABILITY Bhargava Gopireddy, Choungki Song, Josep Torrellas, Nam Sung Kim, Aditya Agrawal, Asit Mishra University of Illinois, University of Wisconsin, Nvidia, Intel
2 Motivation 2 Application characteristics vary dynamically Goal: Single core design that attains High performance at nominal voltage (~0.9V) High energy efficiency at low voltage (~0.5V) A Voltage-Scalable Core
3 Observations 3 SRAM vs Logic delay scaling Small increase in V à large improvement in delay Normalized Delay SRAMDelay LogicDelay V min V dd (V) ~2x
4 ScalCore Idea 4 Design a Voltage-Scalable core based on the two observations SRAM-Freq Logic-Freq f nom ~ 3500 Frequency (MHz) 1000 f op ~ 1200 f min ~ 900 f logic ~ 600 V nom V op V min V logic V dd (V)
5 ScalCore: A Voltage-Scalable Core 5 Decouple V dd of logic and storage structures in the pipeline Improve energy efficiency Raise V dd of storage structures a little Reconfigure the pipeline to take advantage of faster storage structures Further improve performance and reduce leakage energy!! Evaluated the proposed design under various scenarios Reduce execution time by 31%, energy by 48% and ED by 60% over the state of the art Truly Voltage Scalable Core!!
6 ScalCore Design 6 Goal: Maximize the energy-efficiency at low-voltage (EEMode) Constraint: No impact on performance or energy at nominal voltage (HPMode) Approach: In EEMode, Provide different V dd s for storage and logic stages n Storage stage ~2x faster than logic stage Reconfigure pipeline in one of the two ways n Fuse storage stages in the pipeline (e.g. Accessing Register File) n Increase storage structure sizes
7 Fusing Two Pipeline Stages into One 7 Logic Stage 1 Storage Stage 2a Storage Stage 2b Logic Stage 3 HPMode V nom V nom V nom V nom CLK
8 Fusing Two Pipeline Stages into One 8 Logic Stage 1 Storage Stage 2a Storage Stage 2b Logic Stage 3 HPMode CLK V nom V nom V nom V nom 0 Enable Flow-through Logic Stage 1 2a 2b Logic Stage 3 EEMode CLK V logic V op Enable 1 Flow-through V op V logic
9 Increasing Size of Structures 9 Original Decoder HPMode Decoder EEMode Decoder Array 0 V nom Array 0 V nom Array 1 Disabled Array 0 V op Array 1 V op Sense Amp Sense Amp 1 Sense Amp 0 Sense Amp 0 Sense Amp 1 Data Select Data Select Data Select
10 ScalCore Pipeline 10 Fetch Decode Rename Dispatch Wakeup Select Data Read Source Drive Ex Ex Write back Main storage structures L1-I Branch Pred. Decode Allocate Register File Register File EX EX EX LSU L1-D Next PC ROB/ Completion Table
11 Pipeline Structure Changes 11 Fuse Stages Increase Size Register File Array access + Source drive 1.5X PRF Allocation Rename + Dispatch --- Load Store Unit Addr. generation + Memory disambiguation 1.5X Load/Store Queue and Store buffer Reorder Buffer X ROB Branch Prediction
12 Controlpath Changes 12 Programmable counters for variable latencies Execution schedules Wakeup logic Programmable counters for variable sizes List of free registers ROB, Load/Store queue sizes
13 Circuit Issues 13 HPMode V op V nom EEMode V logic Logic Stage 1 w/ Level Conv. Storage Stage 2a Storage Stage 2b Logic Stage 3 F.Ishihara, F.Sheikh, and B.Nikolic, Level Conversion for Dual-Supply Systems, IEEE Transactions on VLSI Systems, February 2004
14 Circuit Issues 14 HPMode V op V nom EEMode V logic Logic Stage 1 w/ Level Conv. Storage Stage 2a Storage Stage 2b Logic Stage 3 d ck q (inv) clk ck V logic V op F.Ishihara, F.Sheikh, and B.Nikolic, Level Conversion for Dual-Supply Systems, IEEE Transactions on VLSI Systems, February 2004
15 Evaluation Methodology OOO cores 64K I-L1/D-L1, 1MB L2 Voltages and frequencies HPMode n V nom = 0.90 V, f=3.5ghz f nom ~ 3500 EEMode n V logic = 0.50 V, f=600mhz n V op = 0.65 V, f=600mhz Frequency (MHz) 1000 f op ~ 1200 f min ~ 900 f logic ~ 600 n V min = 0.6V, f=900mhz V nom V op V min V logic V dd (V)
16 Configurations 16 Baseline: HPRef: Conventional HP (3.5GHz,0.9V) DVFS: Most aggressive (900MHz,0.6V) ScalCore: HPMode: HPRef with penalty (3.3GHz,0.9V) EEMode (600MHz,0.5V,0.65V) n Pipe2Vdd: Separate voltages only n SC: Fuse the two stages of RF, Allocate ; Increase size ROB, LSQ structures
17 Iso-Power Comparison for Parallel Programs Average Execution Time Average Energy-Delay % 60% 15% % Reduced execution time by 31%; ED by 60% compared to DVFS
18 Dynamic Workloads % 28% 19% 22% Execution Time Energy Energy-Delay Product 46% 15% Dyn-Baseline Static-16-SC Dyn-SC Dyn-SC reduces execution time by 31%, energy by 22% and ED by 46% compared to conventional DVFS
19 Also in the Paper 19 ScalCore complexities and overheads Comparison with Intel Claremont Impact on different classes of applications More results Unconstrained power budget Intermediate design points of ScalCore
20 Conclusion 20 Presented ScalCore, a core designed for voltage scalability Designed a voltage-scalable core by Decoupling V dd of logic and storage structures in the pipeline Raising V dd of storage structures a little Reconfiguring the pipeline to take advantage of faster storage structures and improved performance, energy
21 SCALCORE: DESIGNING A CORE FOR VOLTAGE SCALABILITY Bhargava Gopireddy, Choungki Song, Josep Torrellas, Nam Sung Kim, Aditya Agrawal, Asit Mishra University of Illinois, University of Wisconsin, Nvidia, Intel
22 Why not Big/Little? 22 Heterogeneous cores on a chip ideally suited for different tasks Fixed partitioning of cores A fraction of chip unused Migration overhead
23 Configurations 23 Baseline: HPRef: Conventional HP (3.5GHz,0.9V) DVFS: Most aggressive (900MHz,0.6V) ScalCore: HPMode: HPRef with penalty (3.3GHz,0.9V) EEMode (600MHz,0.5V,0.65V) n Pipe2Vdd: Separate voltages only n SCspeed: Fuse the two stages of RF, LSU, Allocate n SC: Fuse the two stages of RF, Allocate Increase size ROB, LSQ structures
Project 5: Optimizer Jason Ansel
Project 5: Optimizer Jason Ansel Overview Project guidelines Benchmarking Library OoO CPUs Project Guidelines Use optimizations from lectures as your arsenal If you decide to implement one, look at Whale
More informationArchitectural Core Salvaging in a Multi-Core Processor for Hard-Error Tolerance
Architectural Core Salvaging in a Multi-Core Processor for Hard-Error Tolerance Michael D. Powell, Arijit Biswas, Shantanu Gupta, and Shubu Mukherjee SPEARS Group, Intel Massachusetts EECS, University
More informationDynamic Scheduling II
so far: dynamic scheduling (out-of-order execution) Scoreboard omasulo s algorithm register renaming: removing artificial dependences (WAR/WAW) now: out-of-order execution + precise state advanced topic:
More informationCSE502: Computer Architecture CSE 502: Computer Architecture
CSE 502: Computer Architecture Out-of-Order Schedulers Data-Capture Scheduler Dispatch: read available operands from ARF/ROB, store in scheduler Commit: Missing operands filled in from bypass Issue: When
More informationMultiple Clock and Voltage Domains for Chip Multi Processors
Multiple Clock and Voltage Domains for Chip Multi Processors Efraim Rotem- Intel Corporation Israel Avi Mendelson- Microsoft R&D Israel Ran Ginosar- Technion Israel institute of Technology Uri Weiser-
More informationOn the Rules of Low-Power Design
On the Rules of Low-Power Design (and Why You Should Break Them) Prof. Todd Austin University of Michigan austin@umich.edu A long time ago, in a not so far away place The Rules of Low-Power Design P =
More informationRISC Central Processing Unit
RISC Central Processing Unit Lan-Da Van ( 范倫達 ), Ph. D. Department of Computer Science National Chiao Tung University Taiwan, R.O.C. Spring, 2014 ldvan@cs.nctu.edu.tw http://www.cs.nctu.edu.tw/~ldvan/
More informationExploring Heterogeneity within a Core for Improved Power Efficiency
Computer Engineering Exploring Heterogeneity within a Core for Improved Power Efficiency Sudarshan Srinivasan Nithesh Kurella Israel Koren Sandip Kundu May 2, 215 CE Tech Report # 6 Available at http://www.eng.biu.ac.il/segalla/computer-engineering-tech-reports/
More information7/11/2012. Single Cycle (Review) CSE 2021: Computer Organization. Multi-Cycle Implementation. Single Cycle with Jump. Pipelining Analogy
CSE 2021: Computer Organization Single Cycle (Review) Lecture-10 CPU Design : Pipelining-1 Overview, Datapath and control Shakil M. Khan CSE-2021 July-12-2012 2 Single Cycle with Jump Multi-Cycle Implementation
More informationClock-Powered CMOS: A Hybrid Adiabatic Logic Style for Energy-Efficient Computing
Clock-Powered CMOS: A Hybrid Adiabatic Logic Style for Energy-Efficient Computing Nestoras Tzartzanis and Bill Athas nestoras@isiedu, athas@isiedu http://wwwisiedu/acmos Information Sciences Institute
More informationDynamic Scheduling I
basic pipeline started with single, in-order issue, single-cycle operations have extended this basic pipeline with multi-cycle operations multiple issue (superscalar) now: dynamic scheduling (out-of-order
More informationLow-Power Digital CMOS Design: A Survey
Low-Power Digital CMOS Design: A Survey Krister Landernäs June 4, 2005 Department of Computer Science and Electronics, Mälardalen University Abstract The aim of this document is to provide the reader with
More informationChapter 4. Pipelining Analogy. The Processor. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop:
Chapter 4 The Processor Part II Pipelining Analogy Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup p = 2n/(0.5n + 1.5) 4 =
More informationCSE502: Computer Architecture CSE 502: Computer Architecture
CSE 502: Computer Architecture Out-of-Order Execution and Register Rename In Search of Parallelism rivial Parallelism is limited What is trivial parallelism? In-order: sequential instructions do not have
More informationDYNAMIC VOLTAGE FREQUENCY SCALING (DVFS) FOR MICROPROCESSORS POWER AND ENERGY REDUCTION
DYNAMIC VOLTAGE FREQUENCY SCALING (DVFS) FOR MICROPROCESSORS POWER AND ENERGY REDUCTION Diary R. Suleiman Muhammed A. Ibrahim Ibrahim I. Hamarash e-mail: diariy@engineer.com e-mail: ibrahimm@itu.edu.tr
More informationEECS 427 Lecture 13: Leakage Power Reduction Readings: 6.4.2, CBF Ch.3. EECS 427 F09 Lecture Reminders
EECS 427 Lecture 13: Leakage Power Reduction Readings: 6.4.2, CBF Ch.3 [Partly adapted from Irwin and Narayanan, and Nikolic] 1 Reminders CAD assignments Please submit CAD5 by tomorrow noon CAD6 is due
More informationEECS 427 Lecture 22: Low and Multiple-Vdd Design
EECS 427 Lecture 22: Low and Multiple-Vdd Design Reading: 11.7.1 EECS 427 W07 Lecture 22 1 Last Time Low power ALUs Glitch power Clock gating Bus recoding The low power design space Dynamic vs static EECS
More informationCSE502: Computer Architecture CSE 502: Computer Architecture
CSE 502: Computer Architecture Out-of-Order Execution and Register Rename In Search of Parallelism rivial Parallelism is limited What is trivial parallelism? In-order: sequential instructions do not have
More informationRamon Canal NCD Master MIRI. NCD Master MIRI 1
Wattch, Hotspot, Hotleakage, McPAT http://www.eecs.harvard.edu/~dbrooks/wattch-form.html http://lava.cs.virginia.edu/hotspot http://lava.cs.virginia.edu/hotleakage http://www.hpl.hp.com/research/mcpat/
More informationMitigating Parameter Variation with Dynamic Fine-Grain Body Biasing *
Mitigating Parameter Variation with Dynamic Fine-Grain Body Biasing * Radu Teodorescu, Jun Nakano, Abhishek Tiwari and Josep Torrellas University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu
More informationEECS 470. Tomasulo s Algorithm. Lecture 4 Winter 2018
omasulo s Algorithm Winter 2018 Slides developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, yson, Vijaykumar, and Wenisch of Carnegie Mellon University,
More informationU. Wisconsin CS/ECE 752 Advanced Computer Architecture I
U. Wisconsin CS/ECE 752 Advanced Computer Architecture I Prof. Karu Sankaralingam Unit 5: Dynamic Scheduling I Slides developed by Amir Roth of University of Pennsylvania with sources that included University
More informationEECS 470. Lecture 9. MIPS R10000 Case Study. Fall 2018 Jon Beaumont
MIPS R10000 Case Study Fall 2018 Jon Beaumont http://www.eecs.umich.edu/courses/eecs470 Multiprocessor SGI Origin Using MIPS R10K Many thanks to Prof. Martin and Roth of University of Pennsylvania for
More informationLow-Power VLSI. Seong-Ook Jung VLSI SYSTEM LAB, YONSEI University School of Electrical & Electronic Engineering
Low-Power VLSI Seong-Ook Jung 2013. 5. 27. sjung@yonsei.ac.kr VLSI SYSTEM LAB, YONSEI University School of Electrical & Electronic Engineering Contents 1. Introduction 2. Power classification & Power performance
More informationOptimization of Overdrive Signoff
Optimization of Overdrive Signoff Tuck-Boon Chan, Andrew B. Kahng, Jiajia Li and Siddhartha Nath VLSI CAD LABORATORY, UC San Diego UC San Diego / VLSI CAD Laboratory -1- Outline Motivation Design Cone
More informationIEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 21, NO. 10, OCTOBER
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 21, NO. 10, OCTOER 2013 1769 Enhancing the Efficiency of Energy-Constrained DVFS Designs Andrew. Kahng, Fellow, IEEE, Seokhyeong Kang,
More informationA GALS Many-Core Heterogeneous DSP Platform with Source-Synchronous On-Chip Interconnection Network
A GALS Many-Core Heterogeneous DSP Platform with Source-Synchronous On-Chip Interconnection Network Anh Tran, Dean Truong and Bevan Baas University of California, Davis NOCS 09 May 13, 009 Outline Motivation
More informationEECS 470 Lecture 8. P6 µarchitecture. Fall 2018 Jon Beaumont Core 2 Microarchitecture
P6 µarchitecture Fall 2018 Jon Beaumont http://www.eecs.umich.edu/courses/eecs470 Core 2 Microarchitecture Many thanks to Prof. Martin and Roth of University of Pennsylvania for most of these slides. Portions
More informationSupporting x86-64 Address Translation for 100s of GPU Lanes. Jason Power, Mark D. Hill, David A. Wood
Supporting x86-64 Address Translation for 100s of GPU s Jason Power, Mark D. Hill, David A. Wood Summary Challenges: CPU&GPUs physically integrated, but logically separate; This reduces theoretical bandwidth,
More informationEECS 470 Lecture 5. Intro to Dynamic Scheduling (Scoreboarding) Fall 2018 Jon Beaumont
Intro to Dynamic Scheduling (Scoreboarding) Fall 2018 Jon Beaumont http://www.eecs.umich.edu/courses/eecs470 Many thanks to Prof. Martin and Roth of University of Pennsylvania for most of these slides.
More informationA Static Power Model for Architects
A Static Power Model for Architects J. Adam Butts and Guri Sohi University of Wisconsin-Madison {butts,sohi}@cs.wisc.edu 33rd International Symposium on Microarchitecture Monterey, California December,
More informationTopics. Low Power Techniques. Based on Penn State CSE477 Lecture Notes 2002 M.J. Irwin and adapted from Digital Integrated Circuits 2002 J.
Topics Low Power Techniques Based on Penn State CSE477 Lecture Notes 2002 M.J. Irwin and adapted from Digital Integrated Circuits 2002 J. Rabaey Review: Energy & Power Equations E = C L V 2 DD P 0 1 +
More informationNTE74S188 Integrated Circuit 256 Bit Open Collector PROM 16 Lead DIP Type Package
NTE74S188 Integrated Circuit 256 Bit Open Collector PROM 16 Lead DIP Type Package Description: The NTE74S188 Schottky PROM memory is organized in the popular 32 words by 8 bits configuration. A memory
More informationCS 110 Computer Architecture Lecture 11: Pipelining
CS 110 Computer Architecture Lecture 11: Pipelining Instructor: Sören Schwertfeger http://shtech.org/courses/ca/ School of Information Science and Technology SIST ShanghaiTech University Slides based on
More informationPower Modeling and Characterization of Computing Devices: A Survey. Contents
Foundations and Trends R in Electronic Design Automation Vol. 6, No. 2 (2012) 121 216 c 2012 S. Reda and A. N. Nowroz DOI: 10.1561/1000000022 Power Modeling and Characterization of Computing Devices: A
More informationMitigating Parameter Variation with Dynamic Fine-Grain Body Biasing
Mitigating Parameter Variation with Dynamic Fine-Grain Body Biasing Radu Teodorescu, Jun Nakano, Abhishek Tiwari and Josep Torrellas University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu
More informationA Novel Latch design for Low Power Applications
A Novel Latch design for Low Power Applications Abhilasha Deptt. of Electronics and Communication Engg., FET-MITS Lakshmangarh, Rajasthan (India) K. G. Sharma Suresh Gyan Vihar University, Jagatpura, Jaipur,
More informationChapter 16 - Instruction-Level Parallelism and Superscalar Processors
Chapter 16 - Instruction-Level Parallelism and Superscalar Processors Luis Tarrataca luis.tarrataca@gmail.com CEFET-RJ L. Tarrataca Chapter 16 - Superscalar Processors 1 / 78 Table of Contents I 1 Overview
More informationAsanovic/Devadas Spring Pipeline Hazards. Krste Asanovic Laboratory for Computer Science M.I.T.
Pipeline Hazards Krste Asanovic Laboratory for Computer Science M.I.T. Pipelined DLX Datapath without interlocks and jumps 31 0x4 RegDst RegWrite inst Inst rs1 rs2 rd1 ws wd rd2 GPRs Imm Ext A B OpSel
More informationMosaic: A GPU Memory Manager with Application-Transparent Support for Multiple Page Sizes
Mosaic: A GPU Memory Manager with Application-Transparent Support for Multiple Page Sizes Rachata Ausavarungnirun Joshua Landgraf Vance Miller Saugata Ghose Jayneel Gandhi Christopher J. Rossbach Onur
More informationDesign and Implement of Low Power Consumption SRAM Based on Single Port Sense Amplifier in 65 nm
Journal of Computer and Communications, 2015, 3, 164-168 Published Online November 2015 in SciRes. http://www.scirp.org/journal/jcc http://dx.doi.org/10.4236/jcc.2015.311026 Design and Implement of Low
More informationPipelined Processor Design
Pipelined Processor Design COE 38 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline Pipelining versus Serial
More informationEnergy-Recovery CMOS Design
Energy-Recovery CMOS Design Jay Moon, Bill Athas * Univ of Southern California * Apple Computer, Inc. jsmoon@usc.edu / athas@apple.com March 05, 2001 UCLA EE215B jsmoon@usc.edu / athas@apple.com 1 Outline
More informationInstruction Level Parallelism III: Dynamic Scheduling
Instruction Level Parallelism III: Dynamic Scheduling Reading: Appendix A (A-67) H&P Chapter 2 Instruction Level Parallelism III: Dynamic Scheduling 1 his Unit: Dynamic Scheduling Application OS Compiler
More informationLow Power Techniques for SoC Design: basic concepts and techniques
Low Power Techniques for SoC Design: basic concepts and techniques Estagiário de Docência M.Sc. Vinícius dos Santos Livramento Prof. Dr. Luiz Cláudio Villar dos Santos Embedded Systems - INE 5439 Federal
More informationComputer Science 246. Advanced Computer Architecture. Spring 2010 Harvard University. Instructor: Prof. David Brooks
Advanced Computer Architecture Spring 2010 Harvard University Instructor: Prof. dbrooks@eecs.harvard.edu Lecture Outline Instruction-Level Parallelism Scoreboarding (A.8) Instruction Level Parallelism
More informationHetCore: TFET-CMOS Hetero-Device Architecture for CPUs and GPUs
HetCore: -CMOS Hetero-Device Architecture for CPUs and GPUs Bhargava Gopireddy, Dimitrios Skarlatos, Wenjuan Zhu, and Josep Torrellas University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu
More informationPROBE: Prediction-based Optical Bandwidth Scaling for Energy-efficient NoCs
PROBE: Prediction-based Optical Bandwidth Scaling for Energy-efficient NoCs Li Zhou and Avinash Kodi Technologies for Emerging Computer Architecture Laboratory (TEAL) School of Electrical Engineering and
More informationLecture Topics. Announcements. Today: Pipelined Processors (P&H ) Next: continued. Milestone #4 (due 2/23) Milestone #5 (due 3/2)
Lecture Topics Today: Pipelined Processors (P&H 4.5-4.10) Next: continued 1 Announcements Milestone #4 (due 2/23) Milestone #5 (due 3/2) 2 1 ISA Implementations Three different strategies: single-cycle
More informationSOFTWARE IMPLEMENTATION OF THE
SOFTWARE IMPLEMENTATION OF THE IEEE 802.11A/P PHYSICAL LAYER SDR`12 WInnComm Europe 27 29 June, 2012 Brussels, Belgium T. Cupaiuolo, D. Lo Iacono, M. Siti and M. Odoni Advanced System Technologies STMicroelectronics,
More informationEE241 - Spring 2004 Advanced Digital Integrated Circuits. Announcements. Borivoje Nikolic. Lecture 15 Low-Power Design: Supply Voltage Scaling
EE241 - Spring 2004 Advanced Digital Integrated Circuits Borivoje Nikolic Lecture 15 Low-Power Design: Supply Voltage Scaling Announcements Homework #2 due today Midterm project reports due next Thursday
More informationProbabilistic and Variation- Tolerant Design: Key to Continued Moore's Law. Tanay Karnik, Shekhar Borkar, Vivek De Circuit Research, Intel Labs
Probabilistic and Variation- Tolerant Design: Key to Continued Moore's Law Tanay Karnik, Shekhar Borkar, Vivek De Circuit Research, Intel Labs 1 Outline Variations Process, supply voltage, and temperature
More informationCombined Circuit and Microarchitecture Techniques for Effective Soft Error Robustness in SMT Processors
Combined Circuit and Microarchitecture Techniques for Effective Soft Error Robustness in SMT Processors Xin Fu, Tao Li and José Fortes Department of ECE, University of Florida xinfu@ufl.edu, taoli@ece.ufl.edu,
More informationOut-of-Order Execution. Register Renaming. Nima Honarmand
Out-of-Order Execution & Register Renaming Nima Honarmand Out-of-Order (OOO) Execution (1) Essence of OOO execution is Dynamic Scheduling Dynamic scheduling: processor hardware determines instruction execution
More informationStatic Energy Reduction Techniques in Microprocessor Caches
Static Energy Reduction Techniques in Microprocessor Caches Heather Hanson, Stephen W. Keckler, Doug Burger Computer Architecture and Technology Laboratory Department of Computer Sciences Tech Report TR2001-18
More informationSRAM SYSTEM DESIGN FOR MEMORY BASED COMPUTING
SRAM SYSTEM DESIGN FOR MEMORY BASED COMPUTING A Thesis Presented to The Academic Faculty by Muneeb Zia In Partial Fulfillment of the Requirements for the Degree Masters in the School of Electrical and
More informationLow Power VLSI Circuit Synthesis: Introduction and Course Outline
Low Power VLSI Circuit Synthesis: Introduction and Course Outline Ajit Pal Professor Department of Computer Science and Engineering Indian Institute of Technology Kharagpur INDIA -721302 Agenda Why Low
More informationRANA: Towards Efficient Neural Acceleration with Refresh-Optimized Embedded DRAM
RANA: Towards Efficient Neural Acceleration with Refresh-Optimized Embedded DRAM Fengbin Tu, Weiwei Wu, Shouyi Yin, Leibo Liu, Shaojun Wei Institute of Microelectronics Tsinghua University The 45th International
More informationCS521 CSE IITG 11/23/2012
Parallel Decoding and issue Parallel execution Preserving the sequential consistency of execution and exception processing 1 slide 2 Decode/issue data Issue bound fetch Dispatch bound fetch RS RS RS RS
More informationPower Optimization of FPGA Interconnect Via Circuit and CAD Techniques
Power Optimization of FPGA Interconnect Via Circuit and CAD Techniques Safeen Huda and Jason Anderson International Symposium on Physical Design Santa Rosa, CA, April 6, 2016 1 Motivation FPGA power increasingly
More informationECOM 4311 Digital System Design using VHDL. Chapter 9 Sequential Circuit Design: Practice
ECOM 4311 Digital System Design using VHDL Chapter 9 Sequential Circuit Design: Practice Outline 1. Poor design practice and remedy 2. More counters 3. Register as fast temporary storage 4. Pipelined circuit
More informationOverview. 1 Trends in Microprocessor Architecture. Computer architecture. Computer architecture
Overview 1 Trends in Microprocessor Architecture R05 Robert Mullins Computer architecture Scaling performance and CMOS Where have performance gains come from? Modern superscalar processors The limits of
More informationInstruction Level Parallelism. Data Dependence Static Scheduling
Instruction Level Parallelism Data Dependence Static Scheduling Basic Block A straight line code sequence with no branches in except to the entry and no branches out except at the exit Loop: L.D ADD.D
More informationAnnouncements. Advanced Digital Integrated Circuits. Midterm feedback mailed back Homework #3 posted over the break due April 8
EE241 - Spring 21 Advanced Digital Integrated Circuits Lecture 18: Dynamic Voltage Scaling Announcements Midterm feedback mailed back Homework #3 posted over the break due April 8 Reading: Chapter 5, 6,
More informationCMP 301B Computer Architecture. Appendix C
CMP 301B Computer Architecture Appendix C Dealing with Exceptions What should be done when an exception arises and many instructions are in the pipeline??!! Force a trap instruction in the next IF stage
More informationMETHODS FOR TRUE ENERGY- PERFORMANCE OPTIMIZATION. Naga Harika Chinta
METHODS FOR TRUE ENERGY- PERFORMANCE OPTIMIZATION Naga Harika Chinta OVERVIEW Introduction Optimization Methods A. Gate size B. Supply voltage C. Threshold voltage Circuit level optimization A. Technology
More informationAuto refresh and self refresh refresh cycles / 64ms. Part No. Clock Frequency Power Organization Interface Package. Normal. 4Banks x 1Mbits x16
4 Banks x 1M x 16Bit Synchronous DRAM DESCRIPTION The Hynix HY57V641620HG is a 67,108,864-bit CMOS Synchronous DRAM, ideally suited for the main memory applications which require large memory density and
More informationSIGNED PIPELINED MULTIPLIER USING HIGH SPEED COMPRESSORS
INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS ISSN 2320-7345 SIGNED PIPELINED MULTIPLIER USING HIGH SPEED COMPRESSORS 1 T.Thomas Leonid, 2 M.Mary Grace Neela, and 3 Jose Anand
More informationReference. Wayne Wolf, FPGA-Based System Design Pearson Education, N Krishna Prakash,, Amrita School of Engineering
FPGA Fabrics Reference Wayne Wolf, FPGA-Based System Design Pearson Education, 2004 CPLD / FPGA CPLD Interconnection of several PLD blocks with Programmable interconnect on a single chip Logic blocks executes
More informationBooster: Reactive Core Acceleration for Mitigating the Effects of Process Variation and Application Imbalance in Low-Voltage Chips
Booster: Reactive Core Acceleration for Mitigating the Effects of Process Variation and Application Imbalance in Low-Voltage Chips Timothy N. Miller, Xiang Pan, Renji Thomas, Naser Sedaghati, Radu Teodorescu
More informationEnergy Efficient Soft Real-Time Computing through Cross-Layer Predictive Control
Energy Efficient Soft Real-Time Computing through Cross-Layer Predictive Control Guangyi Cao and Arun Ravindran Department of Electrical and Computer Engineering University of North Carolina at Charlotte
More informationIssue. Execute. Finish
Specula1on & Precise Interrupts Fall 2017 Prof. Ron Dreslinski h6p://www.eecs.umich.edu/courses/eecs470 In Order Out of Order In Order Issue Execute Finish Fetch Decode Dispatch Complete Retire Instruction/Decode
More informationA VCO-Based ADC Employing a Multi- Phase Noise-Shaping Beat Frequency Quantizer for Direct Sampling of Sub-1mV Input Signals
A VCO-Based ADC Employing a Multi- Phase Noise-Shaping Beat Frequency Quantizer for Direct Sampling of Sub-1mV Input Signals Bongjin Kim, Somnath Kundu, Seokkyun Ko and Chris H. Kim University of Minnesota,
More informationLow Power Design Part I Introduction and VHDL design. Ricardo Santos LSCAD/FACOM/UFMS
Low Power Design Part I Introduction and VHDL design Ricardo Santos ricardo@facom.ufms.br LSCAD/FACOM/UFMS Motivation for Low Power Design Low power design is important from three different reasons Device
More informationTomasulo s Algorithm. Tomasulo s Algorithm
Tomasulo s Algorithm Load and store buffers Contain data and addresses, act like reservation stations Branch Prediction Top-level design: 56 Tomasulo s Algorithm Three Steps: Issue Get next instruction
More informationDecoupling Capacitance
Decoupling Capacitance Nitin Bhardwaj ECE492 Department of Electrical and Computer Engineering Agenda Background On-Chip Algorithms for decap sizing and placement Based on noise estimation Decap modeling
More informationIJMIE Volume 2, Issue 3 ISSN:
IJMIE Volume 2, Issue 3 ISSN: 2249-0558 VLSI DESIGN OF LOW POWER HIGH SPEED DOMINO LOGIC Ms. Rakhi R. Agrawal* Dr. S. A. Ladhake** Abstract: Simple to implement, low cost designs in CMOS Domino logic are
More informationComputer Hardware. Pipeline
Computer Hardware Pipeline Conventional Datapath 2.4 ns is required to perform a single operation (i.e. 416.7 MHz). Register file MUX B 0.6 ns Clock 0.6 ns 0.2 ns Function unit 0.8 ns MUX D 0.2 ns c. Production
More informationFinal Report: DBmbench
18-741 Final Report: DBmbench Yan Ke (yke@cs.cmu.edu) Justin Weisz (jweisz@cs.cmu.edu) Dec. 8, 2006 1 Introduction Conventional database benchmarks, such as the TPC-C and TPC-H, are extremely computationally
More informationReducing Transistor Variability For High Performance Low Power Chips
Reducing Transistor Variability For High Performance Low Power Chips HOT Chips 24 Dr Robert Rogenmoser Senior Vice President Product Development & Engineering 1 HotChips 2012 Copyright 2011 SuVolta, Inc.
More informationServer Operational Cost Optimization for Cloud Computing Service Providers over
Server Operational Cost Optimization for Cloud Computing Service Providers over a Time Horizon Haiyang(Ocean)Qian and Deep Medhi Networking and Telecommunication Research Lab (NeTReL) University of Missouri-Kansas
More informationLeakage Power Minimization in Deep-Submicron CMOS circuits
Outline Leakage Power Minimization in Deep-Submicron circuits Politecnico di Torino Dip. di Automatica e Informatica 1019 Torino, Italy enrico.macii@polito.it Introduction. Design for low leakage: Basics.
More informationImproving GPU Performance via Large Warps and Two-Level Warp Scheduling
Improving GPU Performance via Large Warps and Two-Level Warp Scheduling Veynu Narasiman The University of Texas at Austin Michael Shebanow NVIDIA Chang Joo Lee Intel Rustam Miftakhutdinov The University
More informationOn the Off-chip Memory Latency of Real-Time Systems: Is DDR DRAM Really the Best Option? Mohamed Hassan
On the Off-chip Memory Latency of eal-time Systems: Is DD DAM eally the Best Option? Mohamed Hassan Motivation 2 PEDICTABILITY DAMs 3 LDAM 4 esults 5 Outline Historically, SAMs have been the option for
More informationTomasolu s s Algorithm
omasolu s s Algorithm Fall 2007 Prof. homas Wenisch http://www.eecs.umich.edu/courses/eecs4 70 Floating Point Buffers (FLB) ag ag ag Storage Bus Floating Point 4 3 Buffers FLB 6 5 5 4 Control 2 1 1 Result
More informationRevision History Revision 0.0 (October, 2003) Target spec release Revision 1.0 (November, 2003) Revision 1.0 spec release Revision 1.1 (December, 2003
16Mb H-die SDRAM Specification 50 TSOP-II with Pb-Free (RoHS compliant) Revision 1.4 August 2004 Samsung Electronics reserves the right to change products or specification without notice. Revision History
More informationTrace Based Switching For A Tightly Coupled Heterogeneous Core
Trace Based Switching For A Tightly Coupled Heterogeneous Core Shru% Padmanabha, Andrew Lukefahr, Reetuparna Das, Sco@ Mahlke Micro- 46 December 2013 University of Michigan Electrical Engineering and Computer
More informationMinimum Energy CMOS Design with Dual Subthreshold Supply and Multiple Logic-Level Gates
Minimum Energy CMOS Design with Dual Subthreshold Supply and Multiple Logic-Level Gates Kyungseok Kim and Vishwani D. Agrawal Department of ECE, Auburn University, Auburn, AL 36849, USA kyungkim@auburn.edu,
More informationTechnology Timeline. Transistors ICs (General) SRAMs & DRAMs Microprocessors SPLDs CPLDs ASICs. FPGAs. The Design Warrior s Guide to.
FPGAs 1 CMPE 415 Technology Timeline 1945 1950 1955 1960 1965 1970 1975 1980 1985 1990 1995 2000 Transistors ICs (General) SRAMs & DRAMs Microprocessors SPLDs CPLDs ASICs FPGAs The Design Warrior s Guide
More informationRevision No. History Draft Date Remark. 0.1 Initial Draft Jan Preliminary. 1.0 Final Version Apr. 2007
64Mb Synchronous DRAM based on 1M x 4Bank x16 I/O Document Title 4Bank x 1M x 16bits Synchronous DRAM Revision History Revision No. History Draft Date Remark 0.1 Initial Draft Jan. 2007 Preliminary 1.0
More informationLow-Power CMOS VLSI Design
Low-Power CMOS VLSI Design ( 范倫達 ), Ph. D. Department of Computer Science, National Chiao Tung University, Taiwan, R.O.C. Fall, 2017 ldvan@cs.nctu.edu.tw http://www.cs.nctu.tw/~ldvan/ Outline Introduction
More informationHY5V56D(L/S)FP. Revision History. No. History Draft Date Remark. 0.1 Defined Target Spec. May Rev. 0.1 / Jan
Revision History No. History Draft Date Remark 0.1 Defined Target Spec. May 2003 Rev. 0.1 / Jan. 2005 1 Series 4 Banks x 4M x 16bits Synchronous DRAM DESCRIPTION The HY5V56D(L/S)FP is a 268,435,456bit
More informationA Reconfigurable Source-Synchronous On-Chip Network for GALS Many-Core Platforms
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 29, NO. 6, JUNE 200 897 A Reconfigurable Source-Synchronous On-Chip Network for GALS Many- Platforms Anh T. Tran, Dean
More informationRevision No. History Draft Date Remark. 1.0 First Version Release Dec Corrected PIN ASSIGNMENT A12 to NC Jan. 2005
128Mb Synchronous DRAM based on 2M x 4Bank x16 I/O Document Title 4Bank x 2M x 16bits Synchronous DRAM Revision History Revision No. History Draft Date Remark 1.0 First Version Release Dec. 2004 1.1 1.
More informationA B C D. Ann, Brian, Cathy, & Dave each have one load of clothes to wash, dry, and fold. Time
Pipelining Readings: 4.5-4.8 Example: Doing the laundry A B C D Ann, Brian, Cathy, & Dave each have one load of clothes to wash, dry, and fold Washer takes 30 minutes Dryer takes 40 minutes Folder takes
More informationA Employing Circadian Rhythms to Enhance Power and Reliability
A Employing Circadian Rhythms to Enhance Power and Reliability Saket Gupta, Broadcom Corporation Sachin S. Sapatnekar, University of Minnesota, Twin Cities This paper presents a novel scheme for saving
More informationPipelining A B C D. Readings: Example: Doing the laundry. Ann, Brian, Cathy, & Dave. each have one load of clothes to wash, dry, and fold
Pipelining Readings: 4.5-4.8 Example: Doing the laundry Ann, Brian, Cathy, & Dave A B C D each have one load of clothes to wash, dry, and fold Washer takes 30 minutes Dryer takes 40 minutes Folder takes
More informationAuto refresh and self refresh refresh cycles / 64ms. Part No. Clock Frequency Power Organization Interface Package. Normal. 4Banks x 2Mbits x16
4 Banks x 2M x 16bits Synchronous DRAM DESCRIPTION The Hynix HY57V281620A is a 134,217,728bit CMOS Synchronous DRAM, ideally suited for the Mobile applications which require low power consumption and extended
More informationCMOS Process Variations: A Critical Operation Point Hypothesis
CMOS Process Variations: A Critical Operation Point Hypothesis Janak H. Patel Department of Electrical and Computer Engineering University of Illinois at Urbana-Champaign jhpatel@uiuc.edu Computer Systems
More informationRevision No. History Draft Date Remark. 0.1 Initial Draft Jul Preliminary. 1.0 Release Aug. 2009
128Mb Synchronous DRAM based on 2M x 4Bank x16 I/O Document Title 4Bank x 2M x 16bits Synchronous DRAM Revision History Revision No. History Draft Date Remark 0.1 Initial Draft Jul. 2009 Preliminary 1.0
More information