Simulators and such

Mats Brorsson & Mladen Nikitovic
ICT, Dept. of Electronic, Computer and Software Systems (ECS)

Outline
What defines a simulator?
Why are simulators needed?
Classifications
Case studies
Benchmark suites
New challenges
References

What defines a simulator? What about emulation?
[Figure: the target program and its input data run on a simulator on the host, which produces the program's output plus statistics]

From Wikipedia: "A simulation is an imitation of some real thing, state of affairs, or process. The act of simulating something generally entails representing certain key characteristics or behaviors of a selected physical or abstract system."

Simulation gives you the opportunity to model non-existent components, to collect statistics about their performance, etc.

From Wikipedia: "A software emulator allows computer programs to run on a platform (computer architecture and/or operating system) other than the one for which they were originally written. Unlike simulation, which only attempts to reproduce a program's behavior, emulation attempts to model to various degrees the state of the device being emulated."
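To make the idea of a simulator concrete, here is a minimal sketch of a functional simulator: an interpreter for a hypothetical four-register toy ISA (the opcodes li, add, addi and bnez are invented for illustration and belong to no real instruction set). Besides executing the target program, it collects statistics, here the dynamic instruction mix, which is the kind of observability the slide refers to.

```python
from collections import Counter

def simulate(program, max_steps=10_000):
    """Functionally execute a toy program; return (registers, statistics).

    Each instruction is a tuple (opcode, operands...). The ISA is hypothetical.
    """
    regs = {f"r{i}": 0 for i in range(4)}  # architectural state of the target
    stats = Counter()                      # dynamic instruction mix
    pc = 0
    while 0 <= pc < len(program) and stats["total"] < max_steps:
        op, *args = program[pc]
        stats[op] += 1
        stats["total"] += 1
        if op == "li":        # li rd, imm
            regs[args[0]] = args[1]
        elif op == "add":     # add rd, rs, rt
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "addi":    # addi rd, rs, imm
            regs[args[0]] = regs[args[1]] + args[2]
        elif op == "bnez":    # bnez rs, target: branch if rs != 0
            if regs[args[0]] != 0:
                pc = args[1]
                continue
        else:
            raise ValueError(f"unknown opcode {op!r}")
        pc += 1
    return regs, stats

# Hypothetical example program: sum 3 + 2 + 1 with a countdown loop.
SUM_LOOP = [
    ("li", "r1", 3),            # 0: counter
    ("li", "r2", 0),            # 1: accumulator
    ("add", "r2", "r2", "r1"),  # 2: acc += counter
    ("addi", "r1", "r1", -1),   # 3: counter -= 1
    ("bnez", "r1", 2),          # 4: loop while counter != 0
]

regs, stats = simulate(SUM_LOOP)
print(regs["r2"], dict(stats))
```

An emulator would stop at producing the correct register values; the statistics counter is what turns the interpreter into a (very simple) measurement tool.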

Why use simulation?
Understanding real systems
Higher degree of observability
Less dangerous
Fault injection
Debugging
Prototype hardware before expensive implementations
Develop software in parallel with hardware

Research trends
[Figure: performance evaluation methodologies (simulation, measurement, mathematical modeling, other) used in a sampling of ISCA papers, 1973 to 2001; y-axis: number of papers, 0 to 50 [K. Skadron]]

Simulator classifications
[Figure: a triangle with the corners Detail, Performance and Flexibility]
Performance speeds up the design cycle.
Flexibility maximizes the design scope.
Detail minimizes risk.
Design goals drive the optimization towards one of the corners of the triangle.

Simulator categories
Full-system vs. microarchitecture simulators: full-system simulators are slow, but they model OS overheads and give a more complete picture of the results. Microarchitecture simulators are faster, and usually more accurate in their processor model than full-system simulators.
Functional vs. performance simulators: functional simulators are faster but less accurate, since they model what the program computes rather than how long it takes; they are also more flexible. Performance simulators allow accurate modeling of more complex architectures, e.g. with out-of-order execution.
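The functional/performance split can be illustrated with a sketch that consumes the same dynamic instruction stream in two ways. The functional view only counts what executed; the performance view adds a crude in-order timing model. The instruction classes, per-class latencies and taken-branch penalty below are hypothetical numbers chosen for illustration; a real performance simulator (for example an out-of-order model) tracks pipelines, caches and predictors rather than summing latencies.

```python
from dataclasses import dataclass

# Hypothetical per-class latencies (cycles) and branch penalty.
LATENCY = {"alu": 1, "mul": 3, "load": 2, "store": 1, "branch": 1}
TAKEN_BRANCH_PENALTY = 2

@dataclass
class DynInst:
    op: str              # instruction class, a key of LATENCY
    taken: bool = False  # only meaningful for branches

def functional_stats(stream):
    """What a functional simulator reports: how many instructions ran."""
    return {"instructions": len(stream)}

def performance_stats(stream):
    """Crude in-order timing model: sum latencies plus taken-branch penalties."""
    cycles = 0
    for inst in stream:
        cycles += LATENCY[inst.op]
        if inst.op == "branch" and inst.taken:
            cycles += TAKEN_BRANCH_PENALTY
    n = len(stream)
    return {"instructions": n, "cycles": cycles, "cpi": cycles / n if n else 0.0}

stream = [DynInst("alu"), DynInst("load"), DynInst("mul"),
          DynInst("branch", taken=True), DynInst("alu")]
print(functional_stats(stream), performance_stats(stream))
```

The extra bookkeeping per instruction is where performance simulators spend their time, which is why functional simulators run faster.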

More simulator categories
Interpreters vs. instrumented code: interpreters can support multiple targets and are therefore more flexible. Instrumented code runs much faster on the host than an interpreter does, but one must watch out for probe effects.
Trace-driven vs. execution-driven simulators: traces are pre-recorded streams of instructions, which makes every simulation run deterministic. Execution-driven simulation allows exploration of speculative execution and of the side effects of the operating system.

SimpleScalar
A microarchitectural simulator suite (T. Austin, 1992), www.simplescalar.com
Development tools: compilers, assembler, linker and libraries; all source code included.
Simulators: functional and performance simulators; execution- and trace-driven; trace generator.

SimpleScalar structure
[Figure: user programs on top of a program/simulator interface (SimpleScalar ISA (PISA), POSIX system calls), a functional core and a performance core; components include the machine definition, proxy syscall handler, branch predictor, statistics, simulator core, resource model, cache, loader, registers, memory and the DLite! debugger]

SimpleScalar simulator suite
A SimpleScalar program binary is run by execution- and trace-driven simulators:
sim-fast: functional simulation
sim-safe: sim-fast with error detection
sim-cache: functional cache simulator
sim-cheetah: cache simulator (multiple configurations)
sim-outorder: performance simulator with out-of-order execution
sim-eio (trace generator): I/O tracing and checkpointing
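Trace-driven simulation, in the spirit of a cache simulator such as sim-cache, can be sketched as follows: a pre-recorded address trace is replayed through a set-associative cache model with LRU replacement, so replaying the same trace always yields the same hit and miss counts. The cache geometry defaults and the sample trace are illustrative assumptions; this is not SimpleScalar's code or its parameters.

```python
from collections import OrderedDict

class Cache:
    """Set-associative cache with LRU replacement (illustrative parameters)."""

    def __init__(self, num_sets=4, assoc=2, block_size=16):
        self.num_sets = num_sets
        self.assoc = assoc
        self.block_size = block_size
        # One OrderedDict per set: keys are tags, insertion order tracks recency.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        block = addr // self.block_size
        index = block % self.num_sets
        tag = block // self.num_sets
        ways = self.sets[index]
        if tag in ways:
            ways.move_to_end(tag)       # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(ways) >= self.assoc:
            ways.popitem(last=False)    # evict the least recently used tag
        ways[tag] = None
        return False

def run_trace(trace, **cache_params):
    """Replay a pre-recorded address trace; the same trace gives the same result."""
    cache = Cache(**cache_params)
    for addr in trace:
        cache.access(addr)
    return cache.hits, cache.misses

trace = [0, 4, 16, 0, 64, 128, 0]   # hypothetical byte addresses
print(run_trace(trace), run_trace(trace, assoc=4))
```

Because the trace is fixed, different cache configurations can be compared on exactly the same reference stream; the price is that wrong-path (speculative) accesses and OS effects are only present if they were captured in the trace.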

MASE structure
[Figure: MASE structure]

MASE callback interface
[Figure: MASE callback interface]

Simics
Commercial full-system simulator (SICS, 1992): models entire systems, including uni- and multiprocessors, caches, networks and peripherals, and runs complete operating systems.
Users can attach their own modules through interfaces.
Many targets and hosts are supported. Architectures: PowerPC, SPARC, x86, x86-64, Alpha, ARM, IA-32/IA-64, MIPS. Operating systems: Linux, Windows, Solaris.
Uses images to load system configurations.
Supports checkpointing and tracing.
http://www.simics.net, http://www.virtutech.com

Simics architecture
[Figure: Simics architecture]
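Both Simics and MASE let users attach their own code to the simulator. The sketch below shows the general idea of such a callback interface using a hypothetical SimCore class and an invented event name, mem_access; it does not reproduce the actual Simics or MASE APIs. A user module observes every memory access without any change to the simulator core.

```python
from collections import defaultdict

class SimCore:
    """Hypothetical simulator core exposing named events to user modules."""

    def __init__(self):
        self._hooks = defaultdict(list)  # event name -> list of callbacks
        self.memory = {}

    def register(self, event, callback):
        """Attach a user module's callback to a named event."""
        self._hooks[event].append(callback)

    def _fire(self, event, *args):
        for callback in self._hooks[event]:
            callback(*args)

    def load(self, addr):
        self._fire("mem_access", "load", addr)
        return self.memory.get(addr, 0)

    def store(self, addr, value):
        self._fire("mem_access", "store", addr)
        self.memory[addr] = value

class AccessCounter:
    """Example user module: counts loads and stores seen through the hook."""

    def __init__(self):
        self.counts = {"load": 0, "store": 0}

    def __call__(self, kind, addr):
        self.counts[kind] += 1

core = SimCore()
counter = AccessCounter()
core.register("mem_access", counter)
core.store(0x100, 7)
core.load(0x100)
print(counter.counts)
```

Keeping user code behind callbacks lets a simulator ship a stable core while researchers plug in cache models, tracers or statistics collectors.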

Simics: three modes
Fast mode: no cache simulation; just-in-time compilation.
Normal mode: simple cache simulation.
Out-of-order mode: MAI (Micro Architecture Interface); supports speculative execution, such as branch and value prediction; cache simulation, etc.

Target vs. host
The target is the simulated system.
The host is the computer that runs Simics.
The different prompts:
target#  the target's prompt: root on the target system
host$  the host prompt: a user on the host system (xterm, etc.)
simics>  the Simics prompt

Simulators summary
SimpleScalar: free uniprocessor simulator with tools; can simulate a cache hierarchy with a cycle-accurate processor model.
Simics: commercial full-system uni-/multiprocessor simulator; flexible, portable and reasonably fast.
Simics extensions: Multifacet GEMS (http://www.cs.wisc.edu/gems/), SimFlex (http://www.ece.cmu.edu/~simflex/)

Benchmark suites: SPEC
Standard Performance Evaluation Corporation
Consists of many categories, e.g. CPU: versions 92, 95 and 2000; the current version is 2006.
CINT2006: integer benchmarks (12)
CFP2006: floating-point benchmarks (17)
Base vs. peak, depending on the optimization level.
Input versions: reference input; reduced input (CPU2000, obsolete).
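SPEC CPU results are reported relative to a reference machine: each benchmark's ratio is the reference run time divided by the measured run time, and the overall score is the geometric mean of the per-benchmark ratios. A minimal sketch of that arithmetic follows; it leaves out SPEC's rules for choosing among repeated runs and for base vs. peak compilation, and the run times in the example are hypothetical.

```python
import math

def spec_ratio(reference_seconds, measured_seconds):
    """Per-benchmark ratio: how many times faster than the reference machine."""
    return reference_seconds / measured_seconds

def spec_score(ratios):
    """Overall score: geometric mean of the per-benchmark ratios."""
    if not ratios:
        raise ValueError("need at least one ratio")
    # Averaging logarithms avoids overflow when multiplying many ratios.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical (reference, measured) run times in seconds for two benchmarks.
ratios = [spec_ratio(1000.0, 500.0), spec_ratio(8000.0, 1000.0)]
print(ratios, spec_score(ratios))
```

The geometric mean is used because the ratio between two machines' scores then does not depend on which reference machine was chosen.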

Benchmark suites: MiBench
Freely available benchmark suite, similar to EEMBC, a standardized (non-free) suite.
http://www.eecs.umich.edu/mibench/
On the order of 1 billion executed instructions.
Consists of 35 applications from 6 different areas of embedded computing:
automotive/industrial (sorting)
consumer (image and text compression)
office (document-related operations)
network (routing and encryption/decryption)
security (encryption/decryption)
telecom (encryption/decryption, speech encoding)

Benchmark suites: MediaBench
Its purpose is to represent applications common in embedded multimedia and communication environments: voice compression, image rendering and compression, encryption/decryption of text.
Ranges from a few million to a billion executed instructions, so it is quick to simulate.
Free source code at http://cares.icsl.ucla.edu/mediabench/

Possible problems

New challenges: accuracy
Can I trust the results my simulator has produced? How do I verify them?
Absolute accuracy shows how close the model is to the real system, whereas relative accuracy shows how correctly the model captures the differences between configuration settings.
Absolute accuracy is increasingly difficult to achieve due to modeling limitations, such as timing variations caused by physical phenomena.
Relative accuracy is therefore more feasible to achieve today, but harder to verify.
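The difference between absolute and relative accuracy can be expressed numerically. In the sketch below, the absolute error compares one simulated metric (say, execution time) with the value measured on real hardware, while the relative error compares the speedup predicted between two configurations with the measured speedup. The function names and the choice of speedup as the compared quantity are assumptions for illustration; the example numbers are hypothetical.

```python
def absolute_accuracy_error(simulated, measured):
    """Fractional deviation of a simulated metric from the real-hardware value."""
    return abs(simulated - measured) / measured

def relative_accuracy_error(sim_base, sim_new, hw_base, hw_new):
    """How well the simulator predicts the change between two configurations.

    Compares the simulated speedup (base time / new time) with the measured one.
    """
    sim_speedup = sim_base / sim_new
    hw_speedup = hw_base / hw_new
    return abs(sim_speedup - hw_speedup) / hw_speedup

# Hypothetical times: the simulator is 20% slow for both configurations,
# yet it predicts the 2x speedup between them exactly.
print(absolute_accuracy_error(120.0, 100.0),
      absolute_accuracy_error(60.0, 50.0),
      relative_accuracy_error(120.0, 60.0, 100.0, 50.0))
```

A simulator that is off by the same factor for every configuration has poor absolute accuracy but perfect relative accuracy, which is often what design-space exploration needs.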

New challenges: increased complexity
Increasingly complex architectures are being modeled, e.g. multiprocessor systems with complex networks, operating system behavior, multiple running processes/threads, and so on.
At current simulation speeds, one would wait years for a simulation session to finish.
Several ideas exist to reduce simulation time:
reduce the binary size
vary the simulator's accuracy during a session
fast-forward between sections of code

New challenges: workload variability
A workload consists of multiple processes or threads executed at various instants in time.
A single run of a workload scenario will not give an accurate result, due to variations in I/O response, OS services and schedulers [E. Larson].
One needs to run a workload scenario multiple times to increase confidence in the results.

References
Addressing workload variability in architectural simulations, E. Larson et al., IEEE Computer, 2001.
Challenges in computer architecture evaluation, K. Skadron et al., IEEE Computer, 2003.
Simulating a $2M commercial server on a $2K PC, A. R. Alameldeen et al., IEEE Computer, 2003.
SimpleScalar: An infrastructure for computer system modeling, T. Austin et al., IEEE Computer, 2002.
MASE: A novel infrastructure for detailed microarchitectural modeling, E. Larson et al., Int. Symp. on Performance Analysis of Systems and Software (ISPASS), 2001.
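Following the advice to run a workload scenario several times, one common way to quantify confidence is a confidence interval over the repeated runs. The sketch below computes the mean and a 95% Student-t interval from per-run measurements (e.g. cycles or run time). The small t-table covers up to 10 degrees of freedom; beyond that the code falls back to the normal value 1.96, which is slightly too narrow for moderate sample sizes. The overlap check is a conservative rule of thumb, not a formal significance test, and the sample values are hypothetical.

```python
import math
import statistics

# Two-sided 95% Student-t critical values, indexed by degrees of freedom.
T95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
       6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}

def confidence_interval(samples):
    """Return (mean, (low, high)): a 95% CI for a metric over repeated runs."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two runs")
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / math.sqrt(n)  # standard error of the mean
    # Beyond the table, fall back to the normal approximation (slightly narrow).
    t = T95.get(n - 1, 1.96)
    half_width = t * sem
    return mean, (mean - half_width, mean + half_width)

def intervals_overlap(a, b):
    """If two designs' CIs overlap, more runs are needed to tell them apart."""
    (a_low, a_high), (b_low, b_high) = a, b
    return a_low <= b_high and b_low <= a_high

# Hypothetical run times (seconds) from five runs of the same scenario.
mean, ci = confidence_interval([10.0, 12.0, 11.0, 13.0, 9.0])
print(mean, ci)
```

Adding runs shrinks the interval roughly with the square root of the number of runs, which gives a direct way to decide how many runs a comparison needs.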

References (continued)
Asim: A performance model framework, J. Emer et al., IEEE Computer, 2002.
Simics: A full system simulation platform, P. S. Magnusson et al., IEEE Computer, 2002.
RSIM: Simulating shared-memory multiprocessors with ILP processors, C. J. Hughes et al., IEEE Computer, 2002.
Picking statistically valid and early simulation points, E. Perelman et al., Intl. Conf. on Parallel Architectures and Compilation Techniques (PACT), 2003.