Computer Architecture: A Quantitative Approach

Computer Architecture: A Quantitative Approach
Fourth Edition

John L. Hennessy, Stanford University
David A. Patterson, University of California at Berkeley

With Contributions by
Andrea C. Arpaci-Dusseau
Remzi H. Arpaci-Dusseau
Krste Asanovic, Massachusetts Institute of Technology
Robert P. Colwell, R&E Colwell & Associates, Inc.
Thomas M. Conte, North Carolina State University
José Duato, Universitat Politecnica de Valencia and Simula
Diana Franklin, California Polytechnic State University, San Luis Obispo
David Goldberg, Xerox Palo Alto Research Center
Wen-mei W. Hwu, University of Illinois at Urbana-Champaign
Norman P. Jouppi, HP Labs
Timothy M. Pinkston, University of Southern California
John W. Sias, University of Illinois at Urbana-Champaign
David A. Wood

Amsterdam, Boston, Heidelberg, London, New York, Oxford, Paris, San Diego, San Francisco, Singapore, Sydney, Tokyo
Elsevier
Morgan Kaufmann Publishers

Contents

Foreword
Preface
Acknowledgments

Chapter 1 Fundamentals of Computer Design
  1.1 Introduction
  1.2 Classes of Computers
  1.3 Defining Computer Architecture
  1.4 Trends in Technology
  1.5 Trends in Power in Integrated Circuits
  1.6 Trends in Cost
  1.7 Dependability
  1.8 Measuring, Reporting, and Summarizing Performance
  1.9 Quantitative Principles of Computer Design
  1.10 Putting It All Together: Performance and Price-Performance
  1.11 Fallacies and Pitfalls
  1.12 Concluding Remarks
  1.13 Historical Perspectives and References
  Case Studies with Exercises by Diana Franklin

Chapter 2 Instruction-Level Parallelism and Its Exploitation
  2.1 Instruction-Level Parallelism: Concepts and Challenges
  2.2 Basic Compiler Techniques for Exposing ILP
  2.3 Reducing Branch Costs with Prediction
  2.4 Overcoming Data Hazards with Dynamic Scheduling
  2.5 Dynamic Scheduling: Examples and the Algorithm
  2.6 Hardware-Based Speculation
  2.7 Exploiting ILP Using Multiple Issue and Static Scheduling
  2.8 Exploiting ILP Using Dynamic Scheduling, Multiple Issue, and Speculation
  2.9 Advanced Techniques for Instruction Delivery and Speculation
  2.10 Putting It All Together: The Intel Pentium 4
  2.11 Fallacies and Pitfalls
  2.12 Concluding Remarks
  2.13 Historical Perspective and References
  Case Studies with Exercises by Robert P. Colwell

Chapter 3 Limits on Instruction-Level Parallelism
  3.1 Introduction
  3.2 Studies of the Limitations of ILP
  3.3 Limitations on ILP for Realizable Processors
  3.4 Crosscutting Issues: Hardware versus Software Speculation
  3.5 Multithreading: Using ILP Support to Exploit Thread-Level Parallelism
  3.6 Putting It All Together: Performance and Efficiency in Advanced Multiple-Issue Processors
  3.7 Fallacies and Pitfalls
  3.8 Concluding Remarks
  3.9 Historical Perspective and References
  Case Study with Exercises by Wen-mei W. Hwu and John W. Sias

Chapter 4 Multiprocessors and Thread-Level Parallelism
  4.1 Introduction
  4.2 Symmetric Shared-Memory Architectures
  4.3 Performance of Symmetric Shared-Memory Multiprocessors
  4.4 Distributed Shared Memory and Directory-Based Coherence
  4.5 Synchronization: The Basics
  4.6 Models of Memory Consistency: An Introduction
  4.7 Crosscutting Issues
  4.8 Putting It All Together: The Sun T1 Multiprocessor
  4.9 Fallacies and Pitfalls
  4.10 Concluding Remarks
  4.11 Historical Perspective and References
  Case Studies with Exercises by David A. Wood

Chapter 5 Memory Hierarchy Design
  5.1 Introduction
  5.2 Eleven Advanced Optimizations of Cache Performance
  5.3 Memory Technology and Optimizations
  5.4 Protection: Virtual Memory and Virtual Machines
  5.5 Crosscutting Issues: The Design of Memory Hierarchies
  5.6 Putting It All Together: AMD Opteron Memory Hierarchy
  5.7 Fallacies and Pitfalls
  5.8 Concluding Remarks
  5.9 Historical Perspective and References
  Case Studies with Exercises by Norman P. Jouppi

Chapter 6 Storage Systems
  6.1 Introduction
  6.2 Advanced Topics in Disk Storage
  6.3 Definition and Examples of Real Faults and Failures
  6.4 I/O Performance, Reliability Measures, and Benchmarks
  6.5 A Little Queuing Theory
  6.6 Crosscutting Issues
  6.7 Designing and Evaluating an I/O System - The Internet Archive Cluster
  6.8 Putting It All Together: NetApp FAS6000 Filer
  6.9 Fallacies and Pitfalls
  6.10 Concluding Remarks
  6.11 Historical Perspective and References
  Case Studies with Exercises by Andrea C. Arpaci-Dusseau and Remzi H. Arpaci-Dusseau

Appendix A Pipelining: Basic and Intermediate Concepts
  A.1 Introduction
  A.2 The Major Hurdle of Pipelining - Pipeline Hazards
  A.3 How Is Pipelining Implemented?
  A.4 What Makes Pipelining Hard to Implement?
  A.5 Extending the MIPS Pipeline to Handle Multicycle Operations
  A.6 Putting It All Together: The MIPS R4000 Pipeline
  A.7 Crosscutting Issues
  A.8 Fallacies and Pitfalls
  A.9 Concluding Remarks
  A.10 Historical Perspective and References

Appendix B Instruction Set Principles and Examples
  B.1 Introduction
  B.2 Classifying Instruction Set Architectures
  B.3 Memory Addressing
  B.4 Type and Size of Operands
  B.5 Operations in the Instruction Set
  B.6 Instructions for Control Flow
  B.7 Encoding an Instruction Set
  B.8 Crosscutting Issues: The Role of Compilers
  B.9 Putting It All Together: The MIPS Architecture
  B.10 Fallacies and Pitfalls
  B.11 Concluding Remarks
  B.12 Historical Perspective and References

Appendix C Review of Memory Hierarchy
  C.1 Introduction
  C.2 Cache Performance
  C.3 Six Basic Cache Optimizations
  C.4 Virtual Memory
  C.5 Protection and Examples of Virtual Memory
  C.6 Fallacies and Pitfalls
  C.7 Concluding Remarks
  C.8 Historical Perspective and References

Companion CD Appendices
  Appendix D Embedded Systems (updated by Thomas M. Conte)
  Appendix E Interconnection Networks (revised by Timothy M. Pinkston and José Duato)
  Appendix F Vector Processors (revised by Krste Asanovic)
  Appendix G Hardware and Software for VLIW and EPIC
  Appendix H Large-Scale Multiprocessors and Scientific Applications
  Appendix I Computer Arithmetic (by David Goldberg)
  Appendix J Survey of Instruction Set Architectures
  Appendix K Historical Perspectives and References

Online Appendix (textbooks.elsevier.com/0123704901)
  Appendix L Solutions to Case Study Exercises

References
Index