The end of Moore's law and the race for performance
Michael Resch (HLRS)
September 15, 2016, Basel, Switzerland
Roadmap: Motivation (HPC@HLRS), Moore's law, Options, Outlook
HPC@HLRS
Cray XC40 Hazel Hen
- 185,376 cores (Intel Haswell)
- 7.42 PF/s peak
- #9 in TOP500 (fastest PRACE system)
- #10 in HPCG (Europe's fastest research system)
- #2 in HPGMG
HLRS Investment Costs (chart annotations: #27, #8?, #7)
MOORE'S LAW
Moore's law
"The complexity for minimum component costs has increased at a rate of roughly a factor of two per year (see graph on next page). Certainly over the short term this rate can be expected to continue, if not to increase. Over the longer term, the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly constant for at least 10 years. That means by 1975, the number of components per integrated circuit for minimum cost will be 65,000."
Gordon E. Moore, Cramming more components onto integrated circuits, Electronics (38), 8, April 1965
International Technology Roadmap for Semiconductors (2013)
Technology Outlook (I)
"Indeed, future growth in capabilities may come from an explosion of specialized hardware architectures that exploit the growth in the number of transistors on a chip. The transition implied by the anticipated end of Moore's Law will be even more severe absent development of disruptive technologies; it could mean, for the first time in over three decades, the stagnation of computer performance and the end of sustained reductions in the price-performance ratio."
Committee on Future Directions for NSF Advanced Computing Infrastructure to Support U.S. Science in 2017-2020, Interim Report, November 2014, USA
Technology Outlook (II)
"Next month, the worldwide semiconductor industry will formally acknowledge what has become increasingly obvious to everyone involved: Moore's law, the principle that has powered the information-technology revolution since the 1960s, is nearing its end."
M. Mitchell Waldrop, February 9, 2016
http://www.nature.com/news/the-chips-are-down-for-moore-s-law-1.19338
General Trends (TOP500) Source: www.top500.org
Level of Parallelism
Hardware Technology: Improvement expected
- We can seriously expect a factor of 2-4 reduction in power consumption in the coming 5 years
- We can seriously expect more cores on a die in the coming 5-10 years
- All in all, we may see a 1 exaflop system in 2020/2021 with about 100 million cores and a power consumption somewhere between 50 and 120 MW (most likely in China)
- (We could build an exaflop system today at a cost of about US$1.5 billion for the system, plus the cost of building construction and operation)
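The exaflop projection can be turned into per-core and per-watt figures with simple arithmetic. The sketch below uses only the numbers from the slide (1 exaflop, about 100 million cores, 50 to 120 MW); the variable and function names are illustrative, and the results are peak values, not measured ones.

```python
# Back-of-the-envelope figures for the projected exaflop system.
# Inputs are the slide's numbers; everything else is plain arithmetic.

EXAFLOP = 1e18            # floating-point operations per second (peak)
CORES = 100e6             # "about 100 million cores"
POWER_RANGE_MW = (50, 120)  # projected power consumption in megawatts


def gflops_per_watt(power_mw):
    """Peak energy efficiency in GFLOP/s per watt for a given power draw."""
    watts = power_mw * 1e6
    return EXAFLOP / watts / 1e9


# Peak rate each core would have to deliver.
flops_per_core = EXAFLOP / CORES

best_efficiency = gflops_per_watt(POWER_RANGE_MW[0])   # at 50 MW
worst_efficiency = gflops_per_watt(POWER_RANGE_MW[1])  # at 120 MW

print(f"Per core: {flops_per_core / 1e9:.0f} GFLOP/s")
print(f"Efficiency: {worst_efficiency:.1f} to {best_efficiency:.1f} GFLOP/s per watt")
```

Under these assumptions each core has to deliver about 10 GFLOP/s at peak, and the machine as a whole roughly 8 to 20 GFLOP/s per watt.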
Architectural Improvements
- Standard architectures still follow the Linpack benchmark and will do so for a while
- Options
  - Extreme many-core -> GPGPU
    - Will increase peak speed
    - Will make programming even more difficult
    - Will work for a small number of algorithms only
  - Improved memory subsystem
    - NVRAM: can help to support IO
    - 3D memory: can help increase performance, makes cooling more difficult
  - Good old vector technology
    - Could still be programmable
    - Requires specialized people
Main Problems
- Infrastructure
  - Size / weight
  - Power / cooling
- Programming
- Communication
- IO
ACCELERATORS?
Accelerators
- Accelerators have changed the HPC landscape, but from the top and not from the bottom
- Accelerators have technical issues that have not been resolved in years
  - PCIe connectivity
  - Small memory
  - Lack of standards
- Accelerators start to converge with microprocessor architectures
- Opportunities and risks do not balance so far
Accelerators in Tianhe-2
"The real question is: what are they going to use the machine for? I question, at some level, what the Chinese are doing with these big machines," Dongarra said. "They are not using the accelerator part of the machine. I go visit the computing facilities [in China] and I'm not saying that they are being used for things that are secret; I'm saying that I don't know what they are being used for."
http://www.vrworld.com/2015/03/22/jack-dongarra-china-isnt-the-emerging-hpc-power-you-think-it-is/
CLOUDS?
HPC in Germany in 2014: Where is the Cloud?
- Software Initiatives (BMBF/DFG)
- Performance Pyramid
- European Tier-0 System
- German Tier-1 Systems
- Coordination with GA
- Funding: 400 M
- State-wide concepts
- Regional concepts
- Coordination with GA
- German Tier-2 Systems
- Coordination with GCS
- Funding: 100 M
- Data Management
- National Research Network (DFN)
ENERGY?
Do Not Communicate Data: Dmitri Khabi, HLRS, 2016
Message
- Algorithms should be able to reuse caches
- Algorithms should use blocking
- Algorithms should support vector-type mode
- Algorithms should not touch main memory too often
- Assume that your memory is a disk
- Assume that your cache is the memory
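Blocking and cache reuse can be illustrated with a tiled matrix multiplication. This is a minimal sketch, not code from the talk: the function name `matmul_blocked` and the default block size are assumptions, and plain Python will not show the speed-up. It only shows the loop structure that a compiled implementation would use, with the block size tuned so that the tiles of all three matrices fit in cache together.

```python
def matmul_blocked(a, b, block=2):
    """Multiply matrices a (n x m) and b (m x p), given as lists of rows.

    The three outer loops walk over tiles; the three inner loops work
    inside one tile, so the same small pieces of a, b and c are reused
    many times before the computation moves on to the next tile.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, m, block):
            for jj in range(0, p, block):
                # Work on one tile: rows ii.., inner index kk.., columns jj..
                for i in range(ii, min(ii + block, n)):
                    row_c = c[i]
                    for k in range(kk, min(kk + block, m)):
                        a_ik = a[i][k]
                        row_b = b[k]
                        # Innermost loop runs along a row: unit stride,
                        # which is also what vector units prefer.
                        for j in range(jj, min(jj + block, p)):
                            row_c[j] += a_ik * row_b[j]
    return c


product_2x2 = matmul_blocked([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(product_2x2)
```

The result is the same as for the textbook triple loop; only the order in which memory is touched changes.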
Do Not Communicate Data: Dmitri Khabi, HLRS, 2016
Message
- Algorithms should not communicate in large-scale parallel systems
- Calculation is cheaper than communication
CAN SOFTWARE SAVE YOUR MORTAL SOUL? AND CAN YOU TEACH ME HOW TO DANCE REAL SLOW?
Software Improvements
- Calculation example:
  - 1 exaflop with 1% sustained performance -> 10 PF
  - 100 PF with 10% sustained performance -> 10 PF
  - 30 PF with 30% sustained performance -> 9 PF (roughly 10 PF)
- Questions
  - Can we afford to ignore TOP500 completely?
  - Which of the three is cheaper in investment?
  - Which of the three is cheaper in operational costs?
  - Which of the three is easier to program?
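The calculation example is peak performance multiplied by the sustained fraction. A minimal sketch with the three systems from the slide (the names `systems` and `sustained_pf` are illustrative):

```python
# Sustained performance = peak performance x sustained fraction.
# Peak values are in PFLOP/s; 1 exaflop = 1000 PFLOP/s.

systems = [
    ("1 EF peak, 1% sustained", 1000.0, 0.01),
    ("100 PF peak, 10% sustained", 100.0, 0.10),
    ("30 PF peak, 30% sustained", 30.0, 0.30),
]


def sustained_pf(peak_pf, fraction):
    """Sustained performance in PFLOP/s for a given peak and efficiency."""
    return peak_pf * fraction


results = {name: sustained_pf(peak, frac) for name, peak, frac in systems}

for name, value in results.items():
    print(f"{name}: {value:.0f} PF sustained")
```

The first two systems come out at 10 PF and the third at 9 PF, so all three deliver roughly the same sustained performance from very different peak numbers.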
BUT ISN'T THERE...?
A Utopian View: Quantum Computer
- Minimize a function
- Open questions:
  - Which problems can be represented by this type of minimization?
  - Which problems can be transformed to meet criteria for a_i and b_ij?
  - What would a transformation have to look like?
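To make the kind of minimization concrete, the sketch below assumes the function has the quadratic binary form commonly targeted by quantum annealers, E(q) = sum_i a_i q_i + sum_{i<j} b_ij q_i q_j with each q_i in {0, 1}. That form is an assumption: the slide only names the coefficients a_i and b_ij. The toy instance and all function names are hypothetical, and the classical brute-force search stands in for the quantum hardware.

```python
from itertools import product


def energy(q, a, b):
    """Assumed objective: sum_i a[i]*q[i] + sum_(i,j) b[(i,j)]*q[i]*q[j]."""
    linear = sum(a[i] * q[i] for i in range(len(q)))
    quadratic = sum(w * q[i] * q[j] for (i, j), w in b.items())
    return linear + quadratic


def minimize_brute_force(a, b):
    """Try all 2^n binary assignments; feasible only for very small n."""
    return min(product((0, 1), repeat=len(a)), key=lambda q: energy(q, a, b))


# Hypothetical toy problem: pick as many of three items as possible
# (a_i = -1 rewards each pick) without picking neighbours together
# (b_ij = 2 penalizes the pairs (0, 1) and (1, 2)).
a = [-1, -1, -1]
b = {(0, 1): 2, (1, 2): 2}

best = minimize_brute_force(a, b)
print(best, energy(best, a, b))
```

Expressing a problem this way means encoding both its goal and its constraints in the coefficients, which is what the open questions on the slide are about.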
OUTLOOK
Expectations
- We will see more cores
- We will need more power
- We will need more programming effort
- We will struggle to continue with current technology
Moving Forward: A Poor Comparison
- USA
  - 1893: Frederick Jackson Turner states that the final frontier has been reached
  - Alleghenies / Mississippi / Missouri / Rocky Mountains / Pacific Ocean
  - Internal development is needed
- For HPC
  - MFLOPS (1964), GFLOPS (1983), TFLOPS (1996), PFLOPS (2008), EFLOPS (2021)
  - Internal development is needed
Todo List
- We need to explore new architectures
- We need to explore new programming models
- We need to harness the power of algorithms
Questions