Building a Cell Ecosystem
David A. Bader
Acknowledgment of Support
National Science Foundation:
CSR: A Framework for Optimizing Scientific Applications (06-14915)
CAREER: High-Performance Algorithms for Scientific Applications (06-11589; 00-93039)
ITR: Building the Tree of Life -- A National Resource for Phyloinformatics and Computational Phylogenetics (EF/BIO 03-31654)
ITR/AP: Reconstructing Complex Evolutionary Histories (01-21377)
DEB Comparative Chloroplast Genomics: Integrating Computational Methods, Molecular Evolution, and Phylogeny (01-20709)
ITR/AP(DEB): Computing Optimal Phylogenetic Trees under Genome Rearrangement Metrics (01-13095)
DBI: Acquisition of a High Performance Shared-Memory Computer for Computational Science and Engineering (04-20513)
IBM Shared University Research Award (2006)
IBM PERCS / DARPA High Productivity Computing Systems (HPCS), DARPA Contract NBCH30390004
David A. Bader, Cell Ecosystem 2
Georgia Tech's Rise in HPC (since Fall 2005)
Creation of the Computational Science and Engineering division and academic programs
Klaus Advanced Computing Building
Prominent faculty hires in HPC
NSF Track 1 and Track 2 HPC proposals: our partnership is a front-runner in the Track 1 Petascale program to acquire and operate a petascale system by June 2011
Creation of the GT campus at Oak Ridge
Georgia Tech ranked #6 academic institution in the most recent Top500 List of Supercomputing Sites
Deployed 22 teraflops
IBM SUR Award for Cell Broadband Engine blades
Collaborations built with Sony and Toshiba
Other multicore/multithreaded industrial collaborations with Microsoft Research, Sun, and Cray
New Computing Buildings at Georgia Tech
Christopher W. Klaus Advanced Computing Building: construction expected to be completed in July 2006, with the grand opening planned for 26 October 2006; approximately 200,000 square feet, including 80 faculty offices, many research labs, five classrooms, and a 200-seat auditorium
Technology Square Research Building: opened April 2003; the most advanced media building in the world
The rise of HPC at Georgia Tech!
6th-ranked academic institution in the recent June 2006 Top500 List
Georgia Tech's high-end computing resources include approximately 7,000 processors in 35 clusters, plus about 100 processors across several SMP systems
Recent HPC system acquisitions include:
IBM Skolnick System Biology Center system (installed November 2005): a 4,020-processor IBM eServer BladeCenter with 1,005 blades, each with two dual-core Opterons (4 cores per blade)
Dell PowerEdge 1850 system (installed September 2005): a 512-node supercomputing cluster with Intel Xeon processors and an InfiniBand interconnect
Petascale System will Accelerate Solutions to some of the most Challenging Problems for Research and Practice
Area | Petascale applications
Health, Life Sciences | Drug design, organ research, genomics
Environment: Agriculture, Natural Resources, Earthquake Analysis | Earth motion dynamics; structural safety of buildings and levees; climate change effects on air quality, agriculture, wildfires, water, and other resources
Energy Production and Conservation | Turbulent combustion, prediction of fuel efficiency, more efficient solar cells
Nanoscience | Development of new materials: faster, lower-power semiconductors; drug development; fuel cells; better airport screening
Research | Understanding the formation of the universe, dark matter, quantum physics, protein folding, etc.
Slide courtesy of Jim Demmel
Building a Cell Ecosystem
Cell needs a rich ecosystem of academic and industrial partners to give Cell technologies a major leap and wide recognition through collaborations in research, education, and evangelism. Goals:
Create a brand name for Cell, recognized as the leading multi-core technology
Create exciting application possibilities for Cell beyond conventional computing platforms (desktops, servers, laptops, etc.), e.g., open an era of "Cell everywhere" in new domains (ubiquitous computing, gaming, media, digital content, high-performance computing)
Develop technologies with a lasting impact well beyond the currently planned Cell lifecycle
Drive a paradigm shift in the way we think about computation, with major implications for the future roadmap of multi-core processors
Develop a model of industry-academia collaboration that leverages complementary strengths and charts a new era of innovation through open source
Provide a unique academic experience that builds a strong body of technical experts who will lead tomorrow's innovation through such partnerships
Establish forums for evangelism that promote Cell interests rapidly and effectively to a large body of academics and industries
Promote small-business development, leveraging private VC investment, government small-business grants, and key partnerships, to grow the Cell ecosystem in both breadth and impact
Cell Development
Cell programming methodologies: develop general techniques for optimizing algorithms for Cell (PPE + SPEs)
Bootstrap compiler, algorithm, and library development efforts
Port and optimize key libraries that have high affinity for Cell and significantly impact real-world applications
Enable workloads from scientific and technical computing, digital content creation, entertainment and gaming, health care, seismic, and financial applications
Cell Research Areas at Georgia Tech
Scientific and Digital Media Libraries: FFT, Sort, JPEG, JPEG-2000, MPEG-2, MJPEG
Cell Algorithms and Programming Methodologies
Compilers: software-based cache
Systems: task partitioning, performance optimization
Advanced tool chains
Applications: gaming, digital media, special effects, scientific computing
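To make the "software-based cache" item concrete, here is a minimal sketch of a direct-mapped, read-only software cache of the kind an SPE could keep in its local store. It is a hypothetical illustration, not the Georgia Tech or IBM design: the class name `SoftwareCache`, the line size, and the number of lines are assumptions, and the Python list slice stands in for the DMA transfer from main memory that real SPE code would issue in C.

```python
class SoftwareCache:
    """Direct-mapped, read-only software cache over a 'main memory' list.

    Hypothetical sketch: on Cell, `lines` would live in an SPE's local store
    and a miss would be serviced by a DMA transfer; here main memory is a list.
    """

    def __init__(self, memory, num_lines=64, line_size=16):
        self.memory = memory
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [-1] * num_lines  # memory line number held by each slot
        self.lines = [[0] * line_size for _ in range(num_lines)]
        self.hits = 0
        self.misses = 0

    def read(self, addr):
        line_no, offset = divmod(addr, self.line_size)
        slot = line_no % self.num_lines  # direct-mapped placement
        if self.tags[slot] != line_no:
            # Miss: fetch the whole line (stand-in for a DMA get).
            self.misses += 1
            base = line_no * self.line_size
            chunk = self.memory[base:base + self.line_size]
            self.lines[slot][:len(chunk)] = chunk
            self.tags[slot] = line_no
        else:
            self.hits += 1
        return self.lines[slot][offset]


if __name__ == "__main__":
    cache = SoftwareCache(list(range(1000)), num_lines=4, line_size=16)
    values = [cache.read(a) for a in range(64)]
    print(values[:5], cache.misses, cache.hits)
```

With 4 lines of 16 elements, reading addresses 0 through 63 in order costs one miss per line (4 misses, 60 hits). A later read of address 64 maps to the same slot as address 0 and evicts it: this conflict-miss behavior is what a direct-mapped design trades for a cheap tag check.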
Software Dissemination Plans
The Cell SDK is already in use for research and courses
We intend to make all libraries and software freely available as open source under a BSD-like license, as both source code and precompiled binaries
Planned release mechanisms: Power.org, SourceForge, Georgia Tech web pages
Students are already using Power.org
Student Involvement and Interest in Cell
4+ year history of student involvement and interest in Cell, already producing a pipeline of undergraduate and graduate students with Cell expertise
Tao Zhang: co-authored the first Cell paper in PACT 2005 and an IBM Systems Journal paper; co-author of the software cache patent filed by IBM in 2006; joined the IBM T.J. Watson Cell Compiler Group as a Research Staff Member in Fall 2006 after completing his Ph.D. under Santosh Pande
Vipin Sachdeva: Bader's M.S. student, involved with IBM PERCS (DARPA HPCS); developed expertise with the IBM Mambo simulator; investigated the performance of bioinformatics applications on Cell; currently in the IBM Austin Research Lab's performance group
A large body of students is currently working on different dimensions of Cell throughout Georgia Tech's College of Computing
List Ranking on the Cell Processor (Virat Agarwal)
Cell performs well for applications with predictable memory access patterns [Williams et al. 2006]
Can Cell also perform well for applications with irregular memory access patterns, i.e., non-contiguous accesses to global data structures with low degrees of locality?
Design issues:
Frequent DMA transfers are required to fetch successor elements
The algorithm performs no significant computation, so communication becomes the bottleneck
DMA latency must be hidden by overlapping computation with communication
Innovation: a Cell latency-hiding technique
Virat will present this work tomorrow.
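The design issues above can be illustrated with a small sketch of splitter-based list ranking in which several sublists are traversed in round-robin order, so that on Cell the fetch of one successor could overlap with work on another sublist. This is a hedged illustration in Python, not the algorithm presented in this talk. The function names, the evenly spaced choice of sublist starts, and the `random_list` helper are assumptions, and the plain list reads stand in for the DMA transfers a real SPE implementation would issue in C.

```python
import random


def random_list(n, seed=0):
    """Hypothetical helper: random singly linked list over nodes 0..n-1.

    Returns (succ, head), where succ[i] is the node after i, or -1 for the tail.
    """
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    succ = [-1] * n
    for a, b in zip(order, order[1:]):
        succ[a] = b
    return succ, order[0]


def rank_sequential(succ, head):
    """Baseline: walk the list from its head; rank = number of predecessors."""
    rank = [0] * len(succ)
    r, i = 0, head
    while i != -1:
        rank[i] = r
        r += 1
        i = succ[i]
    return rank


def rank_interleaved(succ, head, num_splitters):
    """Splitter-based list ranking with sublists advanced in round-robin order."""
    n = len(succ)
    step = max(1, n // num_splitters)
    # Sublist starts: the head plus evenly spaced node indices (assumed choice).
    starts = [head] + [k for k in range(0, n, step) if k != head][:num_splitters - 1]
    start_id = {node: j for j, node in enumerate(starts)}

    sub = [0] * n                   # sublist that owns each node
    local = [0] * n                 # rank of each node within its sublist
    length = [0] * len(starts)
    next_sub = [-1] * len(starts)   # sublist following sublist j (-1 = last)
    cur = list(starts)              # current node of each active traversal

    active = len(starts)
    while active:
        # One step per active sublist; on Cell, each succ[] lookup would be a
        # DMA whose latency is hidden behind the steps of the other sublists.
        for j, node in enumerate(cur):
            if node == -1:
                continue
            sub[node], local[node] = j, length[j]
            length[j] += 1
            nxt = succ[node]
            if nxt == -1 or nxt in start_id:  # reached the tail or the next sublist
                next_sub[j] = -1 if nxt == -1 else start_id[nxt]
                cur[j] = -1
                active -= 1
            else:
                cur[j] = nxt

    # Rank the short chain of sublists sequentially, then add offsets.
    offset = [0] * len(starts)
    total, j = 0, 0  # sublist 0 begins at the head
    while j != -1:
        offset[j] = total
        total += length[j]
        j = next_sub[j]
    return [offset[sub[i]] + local[i] for i in range(n)]


if __name__ == "__main__":
    succ, head = random_list(100_000, seed=1)
    print(rank_interleaved(succ, head, 64) == rank_sequential(succ, head))
```

Sublist starts are chosen by index here only for simplicity; randomly chosen splitters are a common alternative. The final sequential pass is cheap because the chain of sublists is only as long as the number of splitters.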
Performance Analysis
[Figure: Running time of list ranking on Cell for random lists. Series: Cell (best sequential algorithm, PPE only) and Cell (our algorithm). Axes: running time (msec, log scale) vs. log(list size), 16 to 23.]
Overall speedup of 8.34 over an efficient PPE-only sequential implementation
Cell Evangelism!