Understanding and Measuring Speedup
Last Time
  » Midterm Exam
Today
  » Midterm Summary
  » Definition of Speedup
  » Measuring Speedup
Reminders/Announcements
  » New Homework #3 will be out soon (tomorrow?)
  » Midterm Exams will be returned today (END of class)
  » Graded Homework #2 will be returned next week
Lecture #10, Slide 1
Midterm: In Perspective
In general, the class did very well.
#2: Many didn't articulate the fundamental reasons that applications are increasingly parallel: large-scale data, analysis complexity, large human activity
Generally well on aspects of parallel Java programming: threads, synchronized, examples
#9: Most didn't get the PageRank question
  » Two simple ways to calculate: random walk, equations for each node; iteration was the hard way
Lecture #10, Slide 2
Midterm Scores
[Figure: histogram of midterm scores; x-axis ticks 120 to 200, y-axis ticks 0 to 6]
Average and Median: 154
Standard Deviation: 26
High Score: 198
Lecture #10, Slide 3
Why Measure Performance?
Tells you how you are doing
Limits tell you whether things can be improved appreciably
Important: Understand exactly what you are measuring and how you are measuring it.
Lecture #10, Slide 4
Common Resource Performance Measures
MFLOPS: millions of floating point operations per second
  » GFLOPS, TFLOPS
MBYTES: millions of bytes per second
  » GBytes, TBytes
MIPS: millions of instructions per second
These metrics provide one measure of resource performance. They do not, however, indicate how fast YOUR program will run.
Lecture #10, Slide 5
Performance Improvement
Relative Performance a la CSE 141
What is being compared?
  » Machine A vs. Machine B
  » Program A vs. Program B
System A is X times faster than System B
Lecture #10, Slide 6
Comparing Performance for Parallel Programs
Parallel Program vs. Sequential Program
Same Machine? Try to keep the processors equal (1 of them vs. N of them)
These comparisons are known as speedup.
Lecture #10, Slide 7
Speedup
Speedup S(n) = (Execution time on Single CPU) / (Execution on N parallel processors)
$$S(n) = \frac{T_s}{T_p}$$
  » Speedup: a measure of application performance on a given application implementation and platform (system software and hardware)
Lecture #10, Slide 8
Preview: What is a Good Speedup?
Hopefully, S(n) > 1
Linear speedup:
  » S(n) = n
  » Parallel program considered perfectly scalable
Superlinear speedup:
  » S(n) > n
  » Can this happen?
Lecture #10, Slide 9
Defining Speed-Up
Speedup S(n) = (Execution time on Single CPU) / (Execution on n parallel processors)
Speedup depends on many attributes:
  » What problem size?
  » Worst case? Average case? Best case?
  » What do we count as work? Parallel computation, communication, overhead?
  » What sequential algorithm and what machine for the numerator? Can the algorithms used for the numerator and the denominator be different?
Lecture #10, Slide 10
Common Definitions of Speedup
Speedup S(n) = (Execution time on Single CPU) / (Execution on n parallel processors)
Let M be a parallel machine with n processors
Let T(X) be the time it takes to solve a problem on M with X processors
Common definitions of Speedup:
  » Serial machine is one processor of the parallel machine and the serial algorithm is an interleaved version of the parallel algorithm
$$S(n) = \frac{T(1)}{T(n)}$$
  » Serial algorithm is the fastest known serial algorithm for running on a serial processor (W+A)
$$S(n) = \frac{T_s}{T(n)}$$
  » Serial algorithm is the fastest known serial algorithm running on one processor of the parallel machine (Gustafson)
$$S(n) = \frac{T'(1)}{T(n)}$$
Lecture #10, Slide 11
Typical Speedup Graph
X-axis is the number of processors; Y-axis is the speedup
Graph is for a particular program
Ideal is a straight line, with unit slope (that is, 1)
Lecture #10, Slide 12
Can speedup be superlinear?
Speedup CAN be superlinear:
  » Let M be a parallel machine with n processors
  » Let T(X) be the time it takes to solve a problem on M with X processors
  » Speedup definition:
$$S(n) = \frac{T_s}{T(n)}$$
  » Serial version of the algorithm may involve more overhead than the parallel version of the algorithm
      E.g. A = B + C on a SIMD machine with A, B, C matrices vs. loop overhead on a serial machine
  » Hardware characteristics may favor the parallel algorithm
      E.g. if all data can be decomposed in cache or main memories of parallel processors vs. needing secondary storage on a serial processor to retain all data
Lecture #10, Slide 13
Bounds on Speedup (Amdahl)
What is the maximum speedup possible for a parallel program?
  » Let f = serial fraction that cannot be parallelized
Amdahl's law bounds the speedup in terms of the serial portion and the parallelizable portion of the algorithm.
$$T_s = f\,T_s + (1-f)\,T_s, \qquad T_p = f\,T_s + \frac{(1-f)\,T_s}{n}$$
$$S(n) = \frac{T_s}{T_p} = \frac{T_s}{f\,T_s + \frac{(1-f)\,T_s}{n}} = \frac{n}{nf + 1 - f} = \frac{n}{(n-1)f + 1}$$
$$\lim_{n \to \infty} S(n) = \frac{1}{f}$$
Lecture #10, Slide 14
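Amdahl's bound can be checked numerically. The following is a minimal sketch, not part of the lecture: the helper name `amdahl_speedup` and the 10% serial fraction are assumptions chosen for illustration. It evaluates S(n) = n / ((n - 1) f + 1) for growing n and shows it approaching 1/f.

```python
def amdahl_speedup(serial_fraction, processors):
    """Fixed-size speedup S(n) = n / ((n - 1) * f + 1) from Amdahl's law."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return processors / ((processors - 1) * serial_fraction + 1.0)


if __name__ == "__main__":
    f = 0.10  # illustrative serial fraction, an assumption (not from the lecture)
    for n in (1, 2, 8, 64, 1024, 1_000_000):
        print(f"n = {n:>9}: S(n) = {amdahl_speedup(f, n):.3f}")
    # The speedup never exceeds 1/f, however many processors are added.
    print(f"limit as n grows: 1/f = {1.0 / f:.1f}")
```

With f = 0 the function returns n (linear speedup), and with f = 1 it returns 1 for every n, which are the two extremes of the bound.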
Example of Amdahl's Law
Suppose that a calculation has a 4% serial portion. What is the limit of speedup on 64 processors? What is the maximum speedup?
Lecture #10, Slide 15
Speedup Variant: Parallel Efficiency
Efficiency: E(n) = S(n)/n * 100%
Efficiency measures the fraction of ideal speedup that is being achieved
  » A program with linear speedup is 100% efficient.
Using efficiency:
  » A program attains 89% parallel efficiency on 64 processors. What is the speedup?
Lecture #10, Slide 16
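Both exercises above reduce to plugging numbers into Amdahl's bound, S(n) = n / ((n - 1) f + 1), and into the efficiency definition. The sketch below shows that arithmetic. The helper names are hypothetical, and efficiency is handled as a fraction (0.89) rather than a percentage, which is an assumption made for convenience.

```python
def amdahl_speedup(serial_fraction, processors):
    """Amdahl's bound S(n) = n / ((n - 1) * f + 1)."""
    return processors / ((processors - 1) * serial_fraction + 1.0)


def parallel_efficiency(speedup, processors):
    """E(n) = S(n) / n, as a fraction of ideal (linear) speedup."""
    return speedup / processors


def speedup_from_efficiency(efficiency, processors):
    """Invert E(n) = S(n) / n to recover S(n)."""
    return efficiency * processors


if __name__ == "__main__":
    f, n = 0.04, 64  # values taken from the Amdahl's Law example slide
    s = amdahl_speedup(f, n)
    print(f"speedup bound on {n} processors: {s:.2f}")
    print(f"maximum speedup (n -> infinity): {1.0 / f:.2f}")
    print(f"efficiency at that bound: {parallel_efficiency(s, n):.1%}")

    # Second exercise: 89% efficiency on 64 processors.
    print(f"speedup at 89% efficiency: {speedup_from_efficiency(0.89, 64):.2f}")
```

Note how a 4% serial portion already caps efficiency well below 100% on 64 processors, even though the serial part looks small.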
Pitfall: Cheating Speedup
Not using the best sequential algorithm or running time makes you look good
  » Using the parallel version (lots of overhead built-in)
  » Using an algorithm which doesn't make optimal use of the cache
Lecture #10, Slide 17
Beyond Amdahl's Law
Gustafson challenged Amdahl's assumption that the serial fraction (f) remains constant for all problem sizes (and for larger machines -> larger problems)
  » Example: if the serial part grows as N and the parallel part grows as N^2, then as problem size grows, the serial fraction (f) decreases
  » N = 100, N^2 = 10,000, f = 100/10,100 ≈ 1%
  » N = 1000, N^2 = 1,000,000, f ≈ 0.1%
  » N = 10,000, N^2 = 100,000,000, f ≈ 0.01%
According to Amdahl what speedup would be possible?
Lecture #10, Slide 18
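The closing question can be explored by computing f for each problem size and applying Amdahl's limit of 1/f. This is a sketch only: it assumes the serial work is exactly N and the parallel work exactly N^2 (unit constants), and the function name `serial_fraction` is hypothetical.

```python
def serial_fraction(problem_size):
    """Serial fraction when serial work is N and parallel work is N^2."""
    serial_work = problem_size
    parallel_work = problem_size ** 2
    return serial_work / (serial_work + parallel_work)


if __name__ == "__main__":
    for size in (100, 1_000, 10_000):
        f = serial_fraction(size)
        # Amdahl's limit for this fixed problem size is 1/f, which here is N + 1.
        print(f"N = {size:>6}: f = {f:.4%}, Amdahl limit 1/f = {1.0 / f:.0f}")
```

Under these assumptions the Amdahl limit itself grows with the problem size, which is the observation behind Gustafson's challenge.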
Gustafson's Speed Limits
Gustafson defined two more relevant notions of speedup
  » Scaled speedup
  » Fixed-time speedup
  » And renamed Amdahl's version as fixed-size speedup
Lecture #10, Slide 19
Gustafson's Law
Fix execution time on a single processor
  » s + p = serial part + parallel part = 1 (normalized serial time)
  » (s same as f previously)
  » Assume problem fits in memory of serial computer
Fixed-size speedup (Amdahl's Law)
$$S_{fixed\_size} = \frac{s + p}{s + \frac{p}{n}} = \frac{1}{s + \frac{1 - s}{n}}$$
Fix execution time on a parallel computer
  » s + p = serial part + parallel part = 1 (normalized parallel time)
  » s + np = serial time on a single processor
  » Assume problem fits in memory of parallel computer
Scaled Speedup (Gustafson's Law)
$$S_{scaled} = \frac{s + np}{s + p} = n + (1 - n)\,s$$
Lecture #10, Slide 20
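The two formulas on the Gustafson's Law slide can be compared side by side. The sketch below uses hypothetical function names and an illustrative serial fraction of 5% (an assumption, not a value from the lecture). Keep in mind that s is normalized against serial time in the fixed-size case and against parallel time in the scaled case, so the same numeric s describes different programs.

```python
def fixed_size_speedup(s, n):
    """Amdahl: (s + p) / (s + p / n) with s + p = 1 normalized on one processor."""
    p = 1.0 - s
    return (s + p) / (s + p / n)


def scaled_speedup(s, n):
    """Gustafson: (s + n * p) / (s + p) with s + p = 1 normalized on n processors."""
    p = 1.0 - s
    return (s + n * p) / (s + p)


if __name__ == "__main__":
    s = 0.05  # illustrative serial fraction, an assumption
    for n in (2, 16, 64, 1024):
        print(f"n = {n:>5}: fixed-size = {fixed_size_speedup(s, n):8.2f}, "
              f"scaled = {scaled_speedup(s, n):8.2f}")
```

The fixed-size column flattens out near 1/s, while the scaled column keeps growing roughly linearly in n.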
Scaled Speedup
Scaling: problem size can increase with number of processors
  » Memory, Compute Power Increase, so does problem ambition! (at some point problem may not be meaningful)
  » Gustafson's law gives a measure of how much
Scaled Speedup fixes the parallel execution time
  » Amdahl fixed the problem size, which fixes the serial execution time
  » Too conservative for large-scale systems
Interesting consequence: no bound to speedup; as n -> infinity, speedup has no real bound
Lecture #10, Slide 21
Using Gustafson's Law
Given a scaled speedup of 80 on 128 processors, what is the serial fraction from Amdahl's law? What is the serial fraction from Gustafson's Law?
$$S_{scaled} = \frac{s + np}{s + p} = n + (1 - n)\,s$$
Lecture #10, Slide 22
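One way to approach the exercise is to invert each law for its serial fraction, treating 80 as the observed speedup in both. The sketch below does that algebra. The function names are hypothetical, and using the same speedup value in both formulas is an assumption about how the question is meant.

```python
def gustafson_serial_fraction(speedup, n):
    """Invert S = n + (1 - n) * s, giving s = (n - S) / (n - 1). Requires n > 1."""
    return (n - speedup) / (n - 1)


def amdahl_serial_fraction(speedup, n):
    """Invert S = n / ((n - 1) * f + 1), giving f = (n / S - 1) / (n - 1). Requires n > 1."""
    return (n / speedup - 1.0) / (n - 1)


if __name__ == "__main__":
    speedup, n = 80, 128  # values from the exercise
    print(f"Gustafson serial fraction: {gustafson_serial_fraction(speedup, n):.4f}")
    print(f"Amdahl serial fraction:    {amdahl_serial_fraction(speedup, n):.4f}")
```

The two answers differ by nearly two orders of magnitude because Gustafson's s is a fraction of the parallel run time, whereas Amdahl's f is a fraction of the single-processor run time.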
Fixed Time Speedup
Gustafson also!
  » Use scaled speedup when the memory requirements scale linearly with the number of processors
Idea: Use fixed-time speedup when the work scales linearly with the number of processors, rather than the memory
  » A different kind of scaleup: allows problem size to increase (and perhaps also serial fraction to decrease)
Lecture #10, Slide 23
Fixed Time Speedup
Let
  » $T_p'(1, X)$ = complexity of the best serial algorithm for a size X problem on one processor of the parallel machine
  » $T_p(m, X)$ = complexity of the parallel algorithm run on m processors for problem size X
  » $N_0$ = the size of the largest problem that conveniently fits into primary memory of one processor
  » $N_m$ = maximum value of N satisfying $T_p(m, N) = T_p'(1, N_0)$; may be non-monotonic due to architectural features
  » $m N_0$ = size of the problem that conveniently fits into primary memory of a parallel machine with m processors
$$S_{scaled} = \frac{T_p'(1, mN_0)}{T_p(m, mN_0)} \quad \text{and} \quad S_{fixed\_time} = \frac{T_p'(1, N_m)}{T_p(m, N_m)} = \frac{T_p'(1, N_m)}{T_p'(1, N_0)}$$
Lecture #10, Slide 24
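The fixed-time definition can be made concrete with a toy cost model. Everything in the sketch below is an assumption for illustration: the cost functions (N serial steps plus N^2 parallelizable steps), the function names, and the use of "parallel time does not exceed the serial budget" as an integer stand-in for the condition that defines N_m. It also assumes cost grows monotonically with N, which the slide warns may not hold on real machines.

```python
def serial_time(size):
    """Hypothetical T_p'(1, N): N serial steps plus N^2 parallelizable steps."""
    return size + size * size


def parallel_time(procs, size):
    """Hypothetical T_p(m, N): serial steps stay, parallel steps split m ways."""
    return size + size * size / procs


def largest_fixed_time_size(procs, base_size):
    """Largest N whose parallel time fits in the serial time budget for base_size."""
    budget = serial_time(base_size)
    size = base_size
    # Linear scan is enough for a sketch; relies on parallel_time increasing in N.
    while parallel_time(procs, size + 1) <= budget:
        size += 1
    return size


def fixed_time_speedup(procs, base_size):
    """S_fixed_time = T_p'(1, N_m) / T_p(m, N_m) under the toy cost model."""
    n_m = largest_fixed_time_size(procs, base_size)
    return serial_time(n_m) / parallel_time(procs, n_m)


if __name__ == "__main__":
    base = 1000  # stands in for N_0; an arbitrary illustrative value
    for m in (1, 4, 16, 64):
        n_m = largest_fixed_time_size(m, base)
        print(f"m = {m:>3}: N_m = {n_m:>5}, "
              f"fixed-time speedup = {fixed_time_speedup(m, base):.2f}")
```

In this model the problem size that fits in the time budget grows with m, and the serial fraction shrinks as it does, so the fixed-time speedup stays close to m.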
Example: MinuteSort
[Figure labels: Kayak, Netserver, Kayak]
Minute Sort (all the records you can sort in a Minute!)
  » Fixed Time Scaling
  » 340 Million, 32GB, 2004
  » ~120M, 12GB, 2000
See Gray Sort Benchmark Page http://research.microsoft.com/barc/sortbenchmark/
Lecture #10, Slide 25
Fixed Work Benchmarks
Work (and data) scales up with # of processors
Measure time to complete an iteration; it goes up with # of Nodes!
Similar to Fixed Time Model
Lecture #10, Slide 26
Summary
Midterm Redux
Speedup
  » Amdahl's Law and Gustafson's Revisions
  » Speedup vs. Absolute Efficiency
Next Time
  » Benchmarks
  » Some Application and Machine Examples
Lecture #10, Slide 27