Midterm: In Perspective


Understanding and Measuring Speedup

Last Time
» Midterm Exam
Today
» Midterm Summary
» Definition of Speedup
» Measuring Speedup
Reminders/Announcements
» New Homework #3 will be out soon (tomorrow?)
» Midterm Exam will be returned today (END of class)
» Graded Homework #2 will be returned next week
Lecture #10, Slide 1

Midterm: In Perspective
In general, the class did very well.
#2: Many didn't articulate the fundamental reasons that applications are increasingly parallel: large-scale data, analysis complexity, large human activity
Generally well on aspects of parallel Java programming: threads, synchronized, examples
#9: Most didn't get the Pagerank question
» Two simple ways to calculate: random walk, equations for each node; iteration was the hard way
Lecture #10, Slide 2

Midterm Scores
[Figure: histogram of midterm scores; x-axis: score, 120 to 200; y-axis: count, 0 to 6]
Average and Median: 154
Standard Deviation: 26
High Score: 198
Lecture #10, Slide 3

Why Measure Performance?
Tells you how you are doing
Limits tell you whether things can be improved appreciably
Important: Understand exactly what you are measuring and how you are measuring it.
Lecture #10, Slide 4

Common Resource Performance Measures
MFLOPS: million floating point operations per second
» GFLOPS, TFLOPS
MBYTES: million bytes per second
» GBytes, TBytes
MIPS: million instructions per second
These metrics provide one measure of resource performance. They do not, however, indicate how fast YOUR program will run.
Lecture #10, Slide 5

Performance Improvement
Relative Performance a la CSE 141
What is being compared?
» Machine A vs. Machine B
» Program A vs. Program B
System A is X times faster than System B
Lecture #10, Slide 6

Comparing Performance for Parallel Programs
[Figure labels: App, App]
Parallel Program vs. Sequential Program
Same Machine? Try to keep the processors equal (1 of them vs. N of them)
These comparisons are known as speedup.
Lecture #10, Slide 7

Speedup
$$S(n) = \frac{\text{Execution time on single CPU}}{\text{Execution time on } n \text{ parallel processors}} = \frac{T_s}{T_p}$$
» Speedup is a measure of application performance on a given application implementation and platform (system software and hardware)
Lecture #10, Slide 8
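As a minimal sketch of the definition on Slide 8, the ratio can be computed directly from two measured wall-clock times. The function name and the timing values below are illustrative assumptions, not measurements from the lecture.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Return S(n) = T_s / T_p for two times measured in the same units."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("execution times must be positive")
    return t_serial / t_parallel


# Hypothetical measurements in seconds (assumed for illustration only).
t_s = 120.0  # same program on a single CPU
t_p = 8.0    # same program on n = 16 processors

print(f"S(16) = {speedup(t_s, t_p):.1f}")
```

Both times should come from the same machine and the same problem size, otherwise the ratio mixes a platform comparison into the speedup.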

Preview: What is a Good Speedup?
Hopefully, S(n) > 1
Linear speedup:
» S(n) = n
» Parallel program considered perfectly scalable
Superlinear speedup:
» S(n) > n
» Can this happen?
Lecture #10, Slide 9

Defining Speed-Up
Speedup S(n) = (Execution time on single CPU) / (Execution time on n parallel processors)
Speedup depends on many attributes:
» What problem size?
» Worst case? Average case? Best case?
» What do we count as work? Parallel computation, communication, overhead?
» What sequential algorithm and what machine for the numerator? Can the algorithms used for the numerator and the denominator be different?
Lecture #10, Slide 10

Common Definitions of Speedup
Speedup S(n) = (Execution time on single CPU) / (Execution time on n parallel processors)
Let M be a parallel machine with p processors
Let T(X) be the time it takes to solve a problem on M with X processors
Common definitions of speedup:
» Serial machine is one processor of the parallel machine and serial algorithm is an interleaved version of the parallel algorithm: $S(n) = \frac{T(1)}{T(n)}$
» Serial algorithm is the fastest known serial algorithm for running on a serial processor (W+A): $S(n) = \frac{T_s}{T(n)}$
» Serial algorithm is the fastest known serial algorithm running on one processor of the parallel machine (Gustafson): $S(n) = \frac{T'(1)}{T(n)}$
Lecture #10, Slide 11

Typical Speedup Graph
X-axis is the number of processors; Y-axis is the speedup
Graph is for a particular program
Ideal is a straight line, with unit slope (that is, 1)
Lecture #10, Slide 12

Can speedup be superlinear?
Speedup CAN be superlinear:
» Let M be a parallel machine with n processors
» Let T(X) be the time it takes to solve a problem on M with X processors
» Speedup definition: $S(n) = \frac{T_s}{T(n)}$
» Serial version of the algorithm may involve more overhead than the parallel version of the algorithm
E.g. A=B+C on a SIMD machine with A, B, C matrices vs. loop overhead on a serial machine
» Hardware characteristics may favor the parallel algorithm
E.g. if all data can be decomposed in caches or main memories of parallel processors vs. needing secondary storage on a serial processor to retain all data
Lecture #10, Slide 13

Bounds on Speedup (Amdahl)
What is the maximum speedup possible for a parallel program?
» Let f = serial fraction that cannot be parallelized
Amdahl's law bounds the speedup in terms of the serial portion and the parallelizable portion of the algorithm.
$$T_p = f\,T_s + \frac{(1-f)\,T_s}{n}$$
$$S(n) = \frac{T_s}{T_p} = \frac{T_s}{f\,T_s + \frac{(1-f)\,T_s}{n}} = \frac{n}{nf + 1 - f} = \frac{n}{(n-1)f + 1}$$
$$\lim_{n \to \infty} S(n) = \frac{1}{f}$$
Lecture #10, Slide 14

Example of Amdahl's Law
Suppose that a calculation has a 4% serial portion. What is the limit of speedup on 64 processors?
What is the maximum speedup?
Lecture #10, Slide 15

Speedup Variant: Parallel Efficiency
Efficiency: E(n) = S(n)/n * 100%
Efficiency measures the fraction of ideal speedup that is being achieved
» A program with linear speedup is 100% efficient.
Using efficiency:
» A program attains 89% parallel efficiency on 64 processors. What is the speedup?
Lecture #10, Slide 16
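The two questions on Slides 15 and 16 can be worked through numerically. The sketch below assumes Amdahl's law in the form S(n) = n / ((n - 1) f + 1) with limit 1/f, and efficiency E(n) = S(n)/n; the function names are illustrative, not from the lecture.

```python
def amdahl_speedup(f: float, n: int) -> float:
    """Amdahl's fixed-size speedup for serial fraction f on n processors."""
    return n / ((n - 1) * f + 1)


def amdahl_limit(f: float) -> float:
    """Upper bound on speedup as n grows without bound."""
    return 1.0 / f


def speedup_from_efficiency(e: float, n: int) -> float:
    """Invert E(n) = S(n) / n, with e given as a fraction (0.89 for 89%)."""
    return e * n


# Slide 15: 4% serial portion on 64 processors.
print(f"S(64)       = {amdahl_speedup(0.04, 64):.2f}")  # 64 / 3.52
print(f"max speedup = {amdahl_limit(0.04):.0f}")        # 1 / 0.04

# Slide 16: 89% parallel efficiency on 64 processors.
print(f"speedup     = {speedup_from_efficiency(0.89, 64):.2f}")
```

With a 4% serial portion, 64 processors give a speedup of about 18.2 against a ceiling of 25, and 89% efficiency on 64 processors corresponds to a speedup of about 57.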

Pitfall: Cheating Speedup
Not using the best sequential algorithm or running time makes you look good
» Using the parallel version (lots of overhead built in)
» Using an algorithm which doesn't make optimal use of the cache
Lecture #10, Slide 17

Beyond Amdahl's Law
Gustafson challenged Amdahl's assumption that the serial fraction (f) remains constant for all problem sizes (and for larger machines -> larger problems)
» Example: if the serial part grows as N and the parallel part grows as N^2, then as problem size grows, the serial fraction (f) decreases
» N = 100, N^2 = 10,000, f = 100/10,100 ≈ 1%
» N = 1000, N^2 = 1,000,000, f ≈ 0.1%
» N = 10,000, N^2 = 100,000,000, f ≈ 0.01%
According to Amdahl, what speedup would be possible?
Lecture #10, Slide 18
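A short sketch of the Slide 18 example follows. It takes the slide's model literally (serial work N, parallel work N^2, so f = N / (N + N^2)) and applies Amdahl's limit 1/f to each problem size; the function name is an assumption for illustration.

```python
def serial_fraction(problem_size: int) -> float:
    """Serial fraction when serial work grows as N and parallel work as N^2."""
    serial_work = problem_size
    parallel_work = problem_size ** 2
    return serial_work / (serial_work + parallel_work)


for problem_size in (100, 1_000, 10_000):
    f = serial_fraction(problem_size)
    # Amdahl's bound 1/f simplifies to N + 1 under this model.
    print(f"N = {problem_size:>6}  f = {f:.6f}  Amdahl limit = {1.0 / f:.0f}")
```

Under this model the Amdahl ceiling is N + 1, so it rises with the problem size instead of staying fixed, which is the point of Gustafson's objection.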

Gustafson's Speed Limits
Gustafson defined two more relevant notions of speedup
» Scaled speedup
» Fixed-time speedup
» And renamed Amdahl's version as fixed-size speedup
Lecture #10, Slide 19

Gustafson's Law
Fix execution time on a single processor
» s + p = serial part + parallel part = 1 (normalized serial time)
» (s = same as f previously)
» Assume problem fits in memory of serial computer
Fixed-size speedup (Amdahl's Law)
$$S_{\text{fixed\_size}} = \frac{s + p}{s + \frac{p}{n}} = \frac{1}{s + \frac{1-s}{n}}$$
Fix execution time on a parallel computer
» s + p = serial part + parallel part = 1 (normalized parallel time)
» s + np = serial time on a single processor
» Assume problem fits in memory of parallel computer
Scaled Speedup (Gustafson's Law)
$$S_{\text{scaled}} = \frac{s + np}{s + p} = n + (1 - n)\,s$$
Lecture #10, Slide 20
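To see how differently the two formulas on Slide 20 behave, the sketch below evaluates both for the same serial fraction. The value s = 0.05 and the processor counts are assumed for illustration only.

```python
def fixed_size_speedup(s: float, n: int) -> float:
    """Amdahl's law: serial time normalized to 1, s is the serial fraction."""
    return 1.0 / (s + (1.0 - s) / n)


def scaled_speedup(s: float, n: int) -> float:
    """Gustafson's law: parallel time normalized to 1, s is the serial fraction."""
    return n + (1 - n) * s


s = 0.05  # hypothetical serial fraction
for n in (1, 16, 128, 1024):
    print(f"n = {n:>4}  fixed-size = {fixed_size_speedup(s, n):7.2f}  "
          f"scaled = {scaled_speedup(s, n):8.2f}")
```

The fixed-size column flattens toward 1/s = 20, while the scaled column keeps growing roughly as (1 - s) n. Note that s is normalized against different runs in the two formulas (serial time versus parallel time), so the same numeric value does not describe the same program.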

Scaled Speedup
Scaling: problem size can increase with number of processors
» Memory and compute power increase, so does problem ambition! (at some point the problem may not be meaningful)
» Gustafson's law gives a measure of how much
Scaled speedup fixes the parallel execution time
» Amdahl fixed the problem size: fixes serial execution time
» Too conservative for large-scale systems
Interesting consequence: as n → infinity, speedup has no real bound
Lecture #10, Slide 21

Using Gustafson's Law
Given a scaled speedup of 80 on 128 processors, what is the serial fraction from Amdahl's law? What is the serial fraction from Gustafson's Law?
$$S_{\text{scaled}} = \frac{s + np}{s + p} = n + (1 - n)\,s$$
Lecture #10, Slide 22
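One way to approach the Slide 22 question is to invert each law for the serial fraction. The sketch below assumes the forms S = n + (1 - n) s for Gustafson and S = n / ((n - 1) f + 1) for Amdahl; the function names are illustrative.

```python
def gustafson_serial_fraction(speedup: float, n: int) -> float:
    """Solve S = n + (1 - n) * s for s."""
    return (n - speedup) / (n - 1)


def amdahl_serial_fraction(speedup: float, n: int) -> float:
    """Solve S = n / ((n - 1) * f + 1) for f."""
    return (n / speedup - 1) / (n - 1)


S, n = 80.0, 128
print(f"Gustafson: s = {gustafson_serial_fraction(S, n):.4f}")  # 48 / 127
print(f"Amdahl:    f = {amdahl_serial_fraction(S, n):.4f}")     # 0.6 / 127
```

The same observed speedup implies a serial fraction of about 38% under Gustafson's normalization but only about 0.47% under Amdahl's, because the fraction is measured against the parallel run in one case and the serial run in the other.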

Fixed Time Speedup
Gustafson also!
» Use scaled speedup when the memory requirements scale linearly with the number of processors
Idea: Use fixed-time speedup when the work scales linearly with the number of processors, rather than the memory
» A different kind of scaleup allows problem size to increase (and perhaps also serial fraction to decrease)
Lecture #10, Slide 23

Fixed Time Speedup
Let
$T_p'(1, X)$ = complexity of the best serial algorithm for a size X problem on one processor of the parallel machine
$T_p(m, X)$ = complexity of the parallel algorithm run on m processors for problem size X
$N_0$ = the size of the largest problem that conveniently fits into primary memory of one processor
$N_m$ = maximum value of N satisfying $T_p(m, N) = T_p'(1, N_0)$; may be non-monotonic due to architectural features
$mN_0$ = size of the problem that conveniently fits into primary memory of a parallel machine with m processors
$$S_{\text{scaled}} = \frac{T_p'(1, mN_0)}{T_p(m, mN_0)} \quad\text{and}\quad S_{\text{fixed\_time}} = \frac{T_p'(1, N_m)}{T_p(m, N_m)} = \frac{T_p'(1, N_m)}{T_p'(1, N_0)}$$
Lecture #10, Slide 24
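The fixed-time definition can be illustrated with a toy cost model. Everything about the model below is an assumption made for the sketch, not something from the lecture: serial cost N^2, parallel cost N^2/m plus a linear overhead term, and an integer search that takes the largest N whose parallel time does not exceed the single-processor budget (an integer stand-in for the equality in the definition of N_m).

```python
def serial_cost(size: int) -> float:
    """Hypothetical T_p'(1, N): best serial algorithm costs N^2 units."""
    return float(size ** 2)


def parallel_cost(m: int, size: int, overhead: float = 10.0) -> float:
    """Hypothetical T_p(m, N): perfectly divided work plus linear overhead."""
    return size ** 2 / m + overhead * size


def fixed_time_size(m: int, n0: int) -> int:
    """Largest N with T_p(m, N) <= T_p'(1, N_0); valid because this model is monotonic."""
    budget = serial_cost(n0)
    size = 0
    while parallel_cost(m, size + 1) <= budget:
        size += 1
    return size


m, n0 = 16, 1000
n_m = fixed_time_size(m, n0)
s_fixed_time = serial_cost(n_m) / parallel_cost(m, n_m)
print(f"N_m = {n_m}, fixed-time speedup = {s_fixed_time:.2f}")
```

The linear scan relies on the toy model being monotonic in N. The slide notes that real machines may not be, in which case the search would need to examine the whole range of candidate sizes.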

Example: MinuteSort
[Figure labels: Kayak, Netserver, Kayak]
Minute Sort (all the records you can sort in a minute!)
» Fixed Time Scaling
» 340 million, 32 GB, 2004
» ~120M, 12 GB, 2000
See Gray Sort Benchmark Page: http://research.microsoft.com/barc/sortbenchmark/
Lecture #10, Slide 25

Fixed Work Benchmarks
Work (and data) scale up with # of processors
Measure time to complete an iteration: it goes up with # of nodes!
Similar to Fixed Time Model
Lecture #10, Slide 26

Summary
Midterm Redux
Speedup
» Amdahl's Law and Gustafson's Revisions
» Speedup vs. Absolute Efficiency
Next Time
» Benchmarks
» Some Applications and Machine Examples
Lecture #10, Slide 27