Characterizing and Improving the Performance of Intel Threading Building Blocks
|
|
- Linda Clark
- 6 years ago
- Views:
Transcription
1 Characterizing and Improving the Performance of Intel Threading Building Blocks Gilberto Contreras, Margaret Martonosi Princeton University IISWC 08
2 Motivation Chip Multiprocessors are the new computing platform. 2 cores, 4 cores, 8 cores Are we ready? Why is parallelism so challenging? Identify parallelism Annotation/extraction of parallelism Mapping to cores Respond to: OS effects Thermal emergencies Variability trends Reliability issues 2
3 How is Parallelism Annotated/Extracted Compiler DSWP, SUIF, Polaris Other Parallel Languages Cilk, StreamIT, Linda Orca, Parloc, Emerald, etc Parallelization Libraries TBB, OpenMP, Java threads, pthreads, CUDA, etc This work answers the following questions: What are some of the major sources of overheads? How do they impact overall parallelism performance? How can we improve parallelism performance?
4 Our work focus This talk will focus on the Intel Threading Building Blocks (TBB) Task-based parallelization library for C++ applications Support a wide range of parallelism types Utilizes task stealing for load balancing Methodology is applicable to other parallelism management approaches
5 Presentation Outline Description of TBB Programming example Task management in TBB Characterization Methodology Measuring basic operations using simulation and real-system measurements TBB overheads in PARSEC benchmarks Performance of Task Stealing Improving TBB Occupancy-based task stealing Summary and Conclusions 5
6 Annotation and Management class LUWork { for (i=k+1; i<size; i++) { void operator()(cost blocked_range<int> &b){ for L[i][k] (i=b.begin(); = M[i][k] i!=b.end(); / M[k][k]; i++) { L[i][k] for(j=i+1; = M[i][k] j<size; / M[k][k]; j++) for(j=i+1; M[i][j] j<size; = M[i][j] j++) M[i][j] = M[i][j] L[i][k]*M[k][j]; L[i][k]*M[k][j]; } } } } LUWork work(l,m,k,size); parallel_for(blocked_range( k+1, size, CHUNK_SIZE),work); TBB chunk size worker 1 worker 1 Work Scheduler worker 2 worker 2 worker 3 worker 3 Runtime procedure: spawn() acquire_queue() get_task() spawn() spawn() steal() acquire_queue() get_task() spawn() steal() acquire_queue() get_task() 6
7 Reducing TBB Library Overhead? Understand Overheads Creating tasks spawn() Assigning tasks to worker threads get_task() queue_acquire() wait_for_all() Stealing or rebalancing parallelism steal() Improve parallelism reorganization policies Employ smart redistribution policies Make this as fast and as efficient as possible 7
8 Methodology Benchmarks PARSEC Microbenchmarks Intel Threading Building Blocks (TBB) Open source 2.0 version Real CMP System 4-core AMD system (2 processors) 4GB RAM Linux 2.6 Oprofile is used for performance counter measurements Cycle-accurate CMP simulator 2-issue, in-order cores 32KB D$ (coherent), 32K I$ 8MB shared L2 cache MSI directory-based coherence protocol Mesh network, 32b BW/port/cycle 8
9 Cost of Parallelism Management Simulation Results (4-32 cores) cores 8 cores 12 cores 16 cores 32 cores Cycles get_task spawn stealing (successful) stealing (unsuccessful) acquire_queue wait_for_all Runtime activity 9
10 TBB Overheads: PARSEC Scheduler Waiting Synchronization Stealing 30% fluidanimate 30% swaptions 41% 47% Average time per core 25% 20% 15% 10% 5% Average time per core 25% 20% 15% 10% 5% 0% P8 P12 P16 P25 P32 0% P8 P12 P16 P25 P32 Number of cores Number of cores 30% blackscholes 30% streamcluster 34% 54% Average time per core 25% 20% 15% 10% 5% Average time per core 25% 20% 15% 10% 5% 0% P8 P12 P16 P25 P32 0% P8 P12 P16 P25 P32 Number of cores Number of cores
11 Improving Stealing TBB utilizes random stealing as its victim selection policy Success Rate False Negatives 45% 40% 35% 30% 25% 20% 15% 10% 5% 0% P4 P8 P12 P16 P25 P32 P4 P8 P12 P16 P25 P32 P4 P8 P12 P16 P25 P32 Bitcounter LU Matmult 11
12 Occupancy-based Stealing Random Stealing Occupancy-based stealing Scheduler Scheduler worker 1 worker 1 worker 2 worker 2 worker 3 worker 3 worker 4 worker 4 worker 1 worker 1 worker 2 worker 2 worker 3 worker 3 worker 4 worker 4 2 Random stealing: Random number Stealing Occupancy stealing: Scanning Stealing
13 Performance of Occupancy-based Stealing Occupancy-Based 1-cycle scan Normal stealing 1-cycle scan 1-cycle stealing P16 P25 P32 P16 P25 P32 P16 P25 P32 Bitcounter 2.5% 2.5% 2.7% 2.4% 2.8% 3.7% 4.7% 6.9% 7.8% LU 10% 4.1% 9.7% 10.2% 4.6% 8.0% 16% 10.4% 20.6% Matmult 9.5% 6% 19% 9.8% 7.0% 21.1% 10.8% 9.8% 28.7% Smarter selection policies are desired High potential in overhead reduction 13
14 Conclusions Increasing usage of TBB makes it a prime candidate for in-depth characterization Parallelization libraries help, but tend to exhibit high (dynamic) overheads (>40% at 32 cores) Understanding software overheads is the first step in creating high-performance parallel systems We have presented a detailed characterization of the Intel Threading building Blocks and implemented occupancy-based stealing (19% performance over random stealing). 14
15 Thanks! 15
16 Summary Programmers require tools that allows them to take (fast) advantage of increasing core counts. Parallelization libraries help, but tend to exhibit high (dynamic) overheads (>40% at 32 cores) Understanding software overheads is the first step in creating high-performance parallel systems We have presented a detailed characterization of the Intel Threading building Blocks and implemented occupancy-based stealing (19% performance over random stealing). 16
17 Cost of Parallelism Management 4 core, 1.8GHz AMD system Oprofile configured to measure CPU_CLK_UNHALTED 1 to 32 core CMP simulator 2-issue, in-order cores Shared L core 2 cores 3 cores 4 cores cores 8 cores 12 cores 16 cores 32 cores Cycles Cycles get_task spawn steal acquire_queue wait_for_all 0 get_task spawn stealing (successful) stealing (unsuccessful) acquire_queue wait_for_all Runtime activity Runtime activity Our goals: 1) Reduce per-event overheads 2) Improve rebalancing
18 Static versus Dynamic Management 32 Static (pthread) 32 TBB 32 PARSEC Speedup Number of cores Speedup Number of cores Speedup Number of cores fluidanimate swaptions blackscholes
Booster: Reactive Core Acceleration for Mitigating the Effects of Process Variation and Application Imbalance in Low-Voltage Chips
Booster: Reactive Core Acceleration for Mitigating the Effects of Process Variation and Application Imbalance in Low-Voltage Chips Timothy N. Miller, Xiang Pan, Renji Thomas, Naser Sedaghati, Radu Teodorescu
More informationPerformance Evaluation of Multi-Threaded System vs. Chip-Multi-Processor System
Performance Evaluation of Multi-Threaded System vs. Chip-Multi-Processor System Ho Young Kim, Robert Maxwell, Ankil Patel, Byeong Kil Lee Abstract The purpose of this study is to analyze and compare the
More informationDemocratising Parallel Software
ParaFormance TM Democratising Parallel Software Chris Brown @paraformance www.paraformance.com chris@paraformance.com A Scottish Startup 600k Scottish Enterprise grant money so far built on over 7M of
More informationFinal Report: DBmbench
18-741 Final Report: DBmbench Yan Ke (yke@cs.cmu.edu) Justin Weisz (jweisz@cs.cmu.edu) Dec. 8, 2006 1 Introduction Conventional database benchmarks, such as the TPC-C and TPC-H, are extremely computationally
More informationCherry Picking: Exploiting Process Variations in the Dark Silicon Era
Cherry Picking: Exploiting Process Variations in the Dark Silicon Era Siddharth Garg University of Waterloo Co-authors: Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu # Transistors Power/Dark
More informationRecent Advances in Simulation Techniques and Tools
Recent Advances in Simulation Techniques and Tools Yuyang Li, li.yuyang(at)wustl.edu (A paper written under the guidance of Prof. Raj Jain) Download Abstract: Simulation refers to using specified kind
More informationImproving GPU Performance via Large Warps and Two-Level Warp Scheduling
Improving GPU Performance via Large Warps and Two-Level Warp Scheduling Veynu Narasiman The University of Texas at Austin Michael Shebanow NVIDIA Chang Joo Lee Intel Rustam Miftakhutdinov The University
More informationCharacterizing, Optimizing, and Auto-Tuning Applications for Energy Efficiency
PhD Dissertation Proposal Characterizing, Optimizing, and Auto-Tuning Applications for Efficiency Wei Wang The Committee: Chair: Dr. John Cavazos Member: Dr. Guang R. Gao Member: Dr. James Clause Member:
More informationProject 5: Optimizer Jason Ansel
Project 5: Optimizer Jason Ansel Overview Project guidelines Benchmarking Library OoO CPUs Project Guidelines Use optimizations from lectures as your arsenal If you decide to implement one, look at Whale
More informationSmart-M3-Based Robot Interaction in Cyber-Physical Systems
FRUCT 16, Oulu, Finland October 30, 2014 Smart-M3-Based Robot Interaction in Cyber-Physical Systems Nikolay Teslya *, Sergey Savosin * * St. Petersburg Institute for Informatics and Automation of the Russian
More informationCUDA Threads. Terminology. How it works. Terminology. Streaming Multiprocessor (SM) A SM processes block of threads
Terminology CUDA Threads Bedrich Benes, Ph.D. Purdue University Department of Computer Graphics Streaming Multiprocessor (SM) A SM processes block of threads Streaming Processors (SP) also called CUDA
More informationExploiting Coarse-Grained Task, Data, and Pipeline Parallelism in Stream Programs
Exploiting Coarse-Grained Task, Data, and Pipeline Parallelism in Stream Programs Michael Gordon, William Thies, and Saman Amarasinghe Massachusetts Institute of Technology ASPLOS October 2006 San Jose,
More informationHigh Performance Computing for Engineers
High Performance Computing for Engineers David Thomas dt10@ic.ac.uk / https://github.com/m8pple Room 903 http://cas.ee.ic.ac.uk/people/dt10/teaching/2014/hpce HPCE / dt10/ 2015 / 0.1 High Performance Computing
More informationA-WiNoC: Adaptive Wireless Network-on-Chip Architecture for Chip Multiprocessors
TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL., NO., MONTH YEAR : Adaptive Wireless Network-on-Chip Architecture for Chip Multiprocessors Dominic DiTomaso, Student Member, IEEE, Avinash Kodi, Senior
More informationBig versus Little: Who will trip?
Big versus Little: Who will trip? Reena Panda University of Texas at Austin reena.panda@utexas.edu Christopher Donald Erb University of Texas at Austin cde593@utexas.edu Lizy Kurian John University of
More informationIEEE TRANSACTIONS ON MULTI-SCALE COMPUTING SYSTEMS, VOL. 1, NO. 1, JANUARY
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 1.119/TMSCS.218.287438,
More informationTrack and Vertex Reconstruction on GPUs for the Mu3e Experiment
Track and Vertex Reconstruction on GPUs for the Mu3e Experiment Dorothea vom Bruch for the Mu3e Collaboration GPU Computing in High Energy Physics, Pisa September 11th, 2014 Physikalisches Institut Heidelberg
More informationUnderstanding the Interactions Hardware/Software Parameters on the Energy Consumption of Multi-Threaded Applications
Understanding the Interactions Hardware/Software Parameters on the Energy Consumption of Multi-Threaded Applications Jeyarajan Thiyagalingam, Anne E. Trefethen April 29, 2014 Abstract In recent years,
More informationPerformance Metrics. Computer Architecture. Outline. Objectives. Basic Performance Metrics. Basic Performance Metrics
Computer Architecture Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr nizamettinaydin@gmail.com Performance Metrics http://www.yildiz.edu.tr/~naydin 1 2 Objectives How can we meaningfully measure and compare
More informationDynamic Adaptive Operating Systems -- I/O
Dynamic Adaptive Operating Systems -- I/O Seetharami R. Seelam Patricia J. Teller University of Texas at El Paso El Paso, TX 16 November 2005 SC 05, Seattle, WA 1 Goals Present a summary of our ongoing
More informationThis article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and
This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution
More informationEarly Adopter : Multiprocessor Programming in the Undergraduate Program. NSF/TCPP Curriculum: Early Adoption at the University of Central Florida
Early Adopter : Multiprocessor Programming in the Undergraduate Program NSF/TCPP Curriculum: Early Adoption at the University of Central Florida Narsingh Deo Damian Dechev Mahadevan Vasudevan Department
More informationMosaic: A GPU Memory Manager with Application-Transparent Support for Multiple Page Sizes
Mosaic: A GPU Memory Manager with Application-Transparent Support for Multiple Page Sizes Rachata Ausavarungnirun Joshua Landgraf Vance Miller Saugata Ghose Jayneel Gandhi Christopher J. Rossbach Onur
More informationHARDWARE ACCELERATION OF THE GIPPS MODEL
HARDWARE ACCELERATION OF THE GIPPS MODEL FOR REAL-TIME TRAFFIC SIMULATION Salim Farah 1 and Magdy Bayoumi 2 The Center for Advanced Computer Studies, University of Louisiana at Lafayette, USA 1 snf3346@cacs.louisiana.edu
More informationSelf-Aware Adaptation in FPGAbased
DIPARTIMENTO DI ELETTRONICA E INFORMAZIONE Self-Aware Adaptation in FPGAbased Systems IEEE FPL 2010 Filippo Siorni: filippo.sironi@dresd.org Marco Triverio: marco.triverio@dresd.org Martina Maggio: mmaggio@mit.edu
More informationAn Energy Conservation DVFS Algorithm for the Android Operating System
Volume 1, Number 1, December 2010 Journal of Convergence An Energy Conservation DVFS Algorithm for the Android Operating System Wen-Yew Liang* and Po-Ting Lai Department of Computer Science and Information
More informationEvaluation of CPU Frequency Transition Latency
Noname manuscript No. (will be inserted by the editor) Evaluation of CPU Frequency Transition Latency Abdelhafid Mazouz Alexandre Laurent Benoît Pradelle William Jalby Abstract Dynamic Voltage and Frequency
More informationDynamic Reconfiguration of 3D Photonic Networks-on-Chip for Maximizing Performance and Improving Fault Tolerance
Dynamic Reconfiguration of 3D Photonic Networks-on-Chip for Maximizing Performance and Improving Fault Tolerance Randy Morris Ϯ, Avinash Kodi Ϯ and Ahmed Louri School of Electrical Engineering and Computer
More informationMultiple Clock and Voltage Domains for Chip Multi Processors
Multiple Clock and Voltage Domains for Chip Multi Processors Efraim Rotem- Intel Corporation Israel Avi Mendelson- Microsoft R&D Israel Ran Ginosar- Technion Israel institute of Technology Uri Weiser-
More informationCustomized Computing for Power Efficiency. There are Many Options to Improve Performance
ustomized omputing for Power Efficiency Jason ong cong@cs.ucla.edu ULA omputer Science Department http://cadlab.cs.ucla.edu/~cong There are Many Options to Improve Performance Page 1 Past Alternatives
More informationDocument downloaded from:
Document downloaded from: http://hdl.handle.net/1251/64738 This paper must be cited as: Reaño González, C.; Pérez López, F.; Silla Jiménez, F. (215). On the design of a demo for exhibiting rcuda. 15th
More informationSW simulation and Performance Analysis
SW simulation and Performance Analysis In Multi-Processing Embedded Systems Eugenio Villar University of Cantabria Context HW/SW Embedded Systems Design Flow HW/SW Simulation Performance Analysis Design
More informationParallel GPU Architecture Simulation Framework Exploiting Work Allocation Unit Parallelism
Parallel GPU Architecture Simulation Framework Exploiting Work Allocation Unit Parallelism Sangpil Lee and Won Woo Ro School of Electrical and Electronic Engineering Yonsei University Seoul, Republic of
More informationEvaluation of CPU Frequency Transition Latency
Evaluation of CPU Frequency Transition Latency Abdelhafid Mazouz 1 Alexandre Laurent 1 Benoît Pradelle 1 William Jalby 1 1 University of Versailles Saint-Quentin-en-Yvelines, France ENA-HPC 2013, Dresden
More informationCUDA-Accelerated Satellite Communication Demodulation
CUDA-Accelerated Satellite Communication Demodulation Renliang Zhao, Ying Liu, Liheng Jian, Zhongya Wang School of Computer and Control University of Chinese Academy of Sciences Outline Motivation Related
More informationGetting to Work with OpenPiton. Princeton University. OpenPit
Getting to Work with OpenPiton Princeton University http://openpiton.org OpenPit ASIC SYNTHESIS AND BACKEND 2 Whats in the Box? Synthesis Synopsys Design Compiler Static timing analysis (STA) Synopsys
More informationA Case for Opportunistic Embedded Sensing In Presence of Hardware Power Variability
A Case for Opportunistic Embedded Sensing In Presence of Hardware Power Variability L. Wanner, C. Apte, R. Balani, Puneet Gupta, and Mani Srivastava University of California, Los Angeles puneet@ee.ucla.edu
More informationArchitectural Core Salvaging in a Multi-Core Processor for Hard-Error Tolerance
Architectural Core Salvaging in a Multi-Core Processor for Hard-Error Tolerance Michael D. Powell, Arijit Biswas, Shantanu Gupta, and Shubu Mukherjee SPEARS Group, Intel Massachusetts EECS, University
More informationΕΠΛ 605: Προχωρημένη Αρχιτεκτονική
ΕΠΛ 605: Προχωρημένη Αρχιτεκτονική Υπολογιστών Presentation of UniServer Horizon 2020 European project findings: X-Gene server chips, voltage-noise characterization, high-bandwidth voltage measurements,
More informationHardware-Software Interaction for Run-time Power Optimization: A Case Study of Embedded Linux on Multicore Smartphones
Hardware-Software Interaction for Run-time Optimization: A Case Study of Embedded Linux on Multicore Smartphones Anup Das, Matthew J. Walker, Andreas Hansson, Bashir M. Al-Hashimi and Geoff V. Merrett
More informationThe Who. Intel - no introduction required.
Delivering Demand-Based Worlds with Intel SSD GDC 2011 The Who Intel - no introduction required. Digital Extremes - In addition to be great developers of AAA games, they are also the authors of the Evolution
More informationTowards Real-Time Volunteer Distributed Computing
Towards Real-Time Volunteer Distributed Computing Sangho Yi 1, Emmanuel Jeannot 2, Derrick Kondo 1, David P. Anderson 3 1 INRIA MESCAL, 2 RUNTIME, France 3 UC Berkeley, USA Motivation Push towards large-scale,
More informationParallelism Across the Curriculum
Parallelism Across the Curriculum John E. Howland Department of Computer Science Trinity University One Trinity Place San Antonio, Texas 78212-7200 Voice: (210) 999-7364 Fax: (210) 999-7477 E-mail: jhowland@trinity.edu
More informationCombined Dynamic Thermal Management Exploiting Broadcast-Capable Wireless Networkon-Chip
Rochester Institute of Technology RIT Scholar Works Theses Thesis/Dissertation Collections 3-18-2016 Combined Dynamic Thermal Management Exploiting Broadcast-Capable Wireless Networkon-Chip Architecture
More informationThe Ghost in the Machine Observing the Effects of Kernel Operation on Parallel Application Performance
The Ghost in the Machine Observing the Effects of Kernel Operation on Parallel Application Performance Aroon Nataraj, Alan Morris, Allen Malony, Matthew Sottile, Pete Beckman l {anataraj, amorris, malony,
More informationPROBE: Prediction-based Optical Bandwidth Scaling for Energy-efficient NoCs
PROBE: Prediction-based Optical Bandwidth Scaling for Energy-efficient NoCs Li Zhou and Avinash Kodi Technologies for Emerging Computer Architecture Laboratory (TEAL) School of Electrical Engineering and
More informationGame Architecture. 4/8/16: Multiprocessor Game Loops
Game Architecture 4/8/16: Multiprocessor Game Loops Monolithic Dead simple to set up, but it can get messy Flow-of-control can be complex Top-level may have too much knowledge of underlying systems (gross
More informationProcessors Processing Processors. The meta-lecture
Simulators 5SIA0 Processors Processing Processors The meta-lecture Why Simulators? Your Friend Harm Why Simulators? Harm Loves Tractors Harm Why Simulators? The outside world Unfortunately for Harm you
More informationComputational Efficiency of the GF and the RMF Transforms for Quaternary Logic Functions on CPUs and GPUs
5 th International Conference on Logic and Application LAP 2016 Dubrovnik, Croatia, September 19-23, 2016 Computational Efficiency of the GF and the RMF Transforms for Quaternary Logic Functions on CPUs
More informationAn Artificial Neural Networks based Temperature Prediction Framework for Network-on-Chip based Multicore Platform
Rochester Institute of Technology RIT Scholar Works Theses Thesis/Dissertation Collections 3-2016 An Artificial Neural Networks based Temperature Prediction Framework for Network-on-Chip based Multicore
More informationPerspective platforms for BOINC distributed computing network
Perspective platforms for BOINC distributed computing network Vitalii Koshura Lohika Odessa, Ukraine lestat.de.lionkur@gmail.com Profile page: https://www.linkedin.com/in/aenbleidd/ Abstract This paper
More informationAdaptive Guardband Scheduling to Improve System-Level Efficiency of the POWER7+
Adaptive Guardband Scheduling to Improve System-Level Efficiency of the POWER7+ Yazhou Zu 1, Charles R. Lefurgy, Jingwen Leng 1, Matthew Halpern 1, Michael S. Floyd, Vijay Janapa Reddi 1 1 The University
More informationEnergy Efficient Soft Real-Time Computing through Cross-Layer Predictive Control
Energy Efficient Soft Real-Time Computing through Cross-Layer Predictive Control Guangyi Cao and Arun Ravindran Department of Electrical and Computer Engineering University of North Carolina at Charlotte
More informationCLIPPER: Counter-based Low Impact Processor Power Estimation at Run-time
CLIPPER: Counter-based Low Impact Processor Power Estimation at Run-time Jorgen Peddersen, Sri Parameswaran School of Computer Science and Engineering The University of New South Wales & National ICT Australia
More informationRevisiting Dynamic Thermal Management Exploiting Inverse Thermal Dependence
Revisiting Dynamic Thermal Management Exploiting Inverse Thermal Dependence Katayoun Neshatpour George Mason University kneshatp@gmu.edu Amin Khajeh Broadcom Corporation amink@broadcom.com Houman Homayoun
More informationReal Time Visualization of Full Resolution Data of Indian Remote Sensing Satellite
International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 8, Issue 9 (September 2013), PP. 42-51 Real Time Visualization of Full Resolution
More informationComputer Architecture A Quantitative Approach
Computer Architecture A Quantitative Approach Fourth Edition John L. Hennessy Stanford University David A. Patterson University of California at Berkeley With Contributions by Andrea C. Arpaci-Dusseau
More informationScalable and Lightweight CTF Infrastructures Using Application Containers
Scalable and Lightweight CTF Infrastructures Using Application Containers Arvind S Raj, Bithin Alangot, Seshagiri Prabhu and Krishnashree Achuthan Amrita Center for Cybersecurity Systems and Networks Amrita
More informationParallel Randomized Best-First Search
Parallel Randomized Best-First Search Yaron Shoham and Sivan Toledo School of Computer Science, Tel-Aviv Univsity http://www.tau.ac.il/ stoledo, http://www.tau.ac.il/ ysh Abstract. We describe a novel
More informationFrom the New York Times Introduction to Concurrency
From the New York Times Introduction to Concurrency dapted by CP from lectures by Maurice Herlihy at rown SN FRNCISCO, May 7. 2004 - Intel said on Friday that it was scrapping its development of two microprocessors,
More informationFIFO WITH OFFSETS HIGH SCHEDULABILITY WITH LOW OVERHEADS. RTAS 18 April 13, Björn Brandenburg
FIFO WITH OFFSETS HIGH SCHEDULABILITY WITH LOW OVERHEADS RTAS 18 April 13, 2018 Mitra Nasri Rob Davis Björn Brandenburg FIFO SCHEDULING First-In-First-Out (FIFO) scheduling extremely simple very low overheads
More informationDICELIB: A REAL TIME SYNCHRONIZATION LIBRARY FOR MULTI-PROJECTION VIRTUAL REALITY DISTRIBUTED ENVIRONMENTS
DICELIB: A REAL TIME SYNCHRONIZATION LIBRARY FOR MULTI-PROJECTION VIRTUAL REALITY DISTRIBUTED ENVIRONMENTS Abstract: The recent availability of PC-clusters offers an alternative solution instead of high-end
More informationLecture #29. Moore s Law
Lecture #29 ANNOUNCEMENTS HW#15 will be for extra credit Quiz #6 (Thursday 5/8) will include MOSFET C-V No late Projects will be accepted after Thursday 5/8 The last Coffee Hour will be held this Thursday
More informationMicroarchitectural Attacks and Defenses in JavaScript
Microarchitectural Attacks and Defenses in JavaScript Michael Schwarz, Daniel Gruss, Moritz Lipp 25.01.2018 www.iaik.tugraz.at 1 Michael Schwarz, Daniel Gruss, Moritz Lipp www.iaik.tugraz.at Microarchitecture
More informationLiu Yang, Bong-Joo Jang, Sanghun Lim, Ki-Chang Kwon, Suk-Hwan Lee, Ki-Ryong Kwon 1. INTRODUCTION
Liu Yang, Bong-Joo Jang, Sanghun Lim, Ki-Chang Kwon, Suk-Hwan Lee, Ki-Ryong Kwon 1. INTRODUCTION 2. RELATED WORKS 3. PROPOSED WEATHER RADAR IMAGING BASED ON CUDA 3.1 Weather radar image format and generation
More informationApplication of Maxwell Equations to Human Body Modelling
Application of Maxwell Equations to Human Body Modelling Fumie Costen Room E, E0c at Sackville Street Building, fc@cs.man.ac.uk The University of Manchester, U.K. February 5, 0 Fumie Costen Room E, E0c
More informationThermal Management of Manycore Systems with Silicon-Photonic Networks
Thermal Management of Manycore Systems with Silicon-Photonic Networks Tiansheng Zhang, José L. Abellán, Ajay Joshi, Ayse K. Coskun Electrical and Computer Engineering Department, Boston University, Boston,
More informationMeasurement Driven Deployment of a Two-Tier Urban Mesh Access Network
Measurement Driven Deployment of a Two-Tier Urban Mesh Access Network J. Camp, J. Robinson, C. Steger, E. Knightly Rice Networks Group MobiSys 2006 6/20/06 Two-Tier Mesh Architecture Limited Gateway Nodes
More informationInvestigating the Post Processing of LS-DYNA in a Fully Immersive Workflow Environment
Investigating the Post Processing of LS-DYNA in a Fully Immersive Workflow Environment Ed Helwig 1, Facundo Del Pin 2 1 Livermore Software Technology Corporation, Livermore CA 2 Livermore Software Technology
More informationTLSync: Support for Multiple Fast Barriers Using On-Chip Transmission Lines
TLSync: Support for Multiple Fast Barriers Using On-Chip Transmission Lines Jungju Oh jungju@gatech.edu Milos Prvulovic milos@cc.gatech.edu Georgia Institute of Technology Atlanta, GA, USA Alenka Zajic
More informationTowards Scalable 1024 Processor Shared Memory Systems
Towards Scalable Processor Shared Memory Systems Robert B. Ciotti (ciotti@nas.nasa.gov), NASA Advanced Supercomputing (NAS) NASA Ames Research Center, Moffett Field, CA 95 ABSTRACT: Over the past years,
More informationHere are additional tips that we hope you will find useful in the future.
Bobbin and Needle Type Here are additional tips that we hope you will find useful in the future. Question: What type of bobbins and needles should I use with my SINGER Futura embroidery & sewing machine?
More informationOutline Simulators and such. What defines a simulator? What about emulation?
Outline Simulators and such Mats Brorsson & Mladen Nikitovic ICT Dept of Electronic, Computer and Software Systems (ECS) What defines a simulator? Why are simulators needed? Classifications Case studies
More informationArduino STEAM Academy Arduino STEM Academy Art without Engineering is dreaming. Engineering without Art is calculating. - Steven K.
Arduino STEAM Academy Arduino STEM Academy Art without Engineering is dreaming. Engineering without Art is calculating. - Steven K. Roberts Page 1 See Appendix A, for Licensing Attribution information
More informationWelcome to the Early Beta and Thank You for Your Continued Support!
REFERENCE CARD Welcome to the Early Beta and Thank You for Your Continued Support! In addition to the information below, we ve recently added tutorial messages to the game. Remember to look for the in-game
More informationDEMIGOD DEMIGOD. characterize stalls and pop-ups during game play. Serious gamers play games at their maximum settings driving HD monitors.
Intel Solid-State Drives (Intel SSDs) are revolutionizing storage performance on desktop and laptop PCs, delivering dramatically faster load times than hard disk drives (HDDs). When Intel SSDs are used
More informationDr Myat Su Hlaing Asia Research Center, Yangon University, Myanmar. Data programming model for an operation based parallel image processing system
Name: Affiliation: Field of research: Specific Field of Study: Proposed Research Topic: Dr Myat Su Hlaing Asia Research Center, Yangon University, Myanmar Information Science and Technology Computer Science
More informationVRCon: Dynamic Reconfiguration of Voltage Regulators in a Multicore Platform
VRCon: Dynamic Reconfiguration of Voltage Regulators in a Multicore Platform Woojoo Lee, Yanzhi Wang, and Massoud Pedram Dept. of Electrical Engineering, Univ. of Souther California, Los Angeles, California,
More informationRIVA. Real-Time Image Viewing and Acquisition. Version 1.0. User's Manual
1 RIVA Real-Time Image Viewing and Acquisition Version 1.0 User's Manual Susan Jarmuz and Dr. Mariusz Jankowski University of Southern Maine Gorham, Maine mjkcc@usm.maine.edu 2 2000-2001 Susan Jarmuz and
More informationDESKTOP APPLICATION FOR RED-GREEN COLOR BLINDNESS. Mahendra Sunt Servanda Informatics Engineering Supervisor: Dr.
DESKTOP APPLICATION FOR RED-GREEN COLOR BLINDNESS Mahendra Sunt Servanda 54411250 Informatics Engineering Supervisor: Dr. Yuhilza Hanum INTRODUCTION Background Image and colors now have important role
More informationParallel Simulation of Social Agents using Cilk and OpenCL
D. Moser, A. Riener, K. Zia, A. Ferscha Department for Pervasive Computing, JKU Linz/Austria Parallel Simulation of Social Agents using Cilk and OpenCL DS-RT 2011 15th International Symposium on Distributed
More informationChallenges in Transition
Challenges in Transition Keynote talk at International Workshop on Software Engineering Methods for Parallel and High Performance Applications (SEM4HPC 2016) 1 Kazuaki Ishizaki IBM Research Tokyo kiszk@acm.org
More informationAndes Game Platform Porting
Andes Game Platform Porting Andes Technology Architecture for Next-generation Digital Engines for SoC Outline Porting guide System Architecture Package dependency Game package details Performance issue
More informationOverview. 1 Trends in Microprocessor Architecture. Computer architecture. Computer architecture
Overview 1 Trends in Microprocessor Architecture R05 Robert Mullins Computer architecture Scaling performance and CMOS Where have performance gains come from? Modern superscalar processors The limits of
More informationEnergy Efficiency Benefits of Reducing the Voltage Guardband on the Kepler GPU Architecture
Energy Efficiency Benefits of Reducing the Voltage Guardband on the Kepler GPU Architecture Jingwen Leng Yazhou Zu Vijay Janapa Reddi The University of Texas at Austin {jingwen, yazhou.zu}@utexas.edu,
More informationUsing ECC Feedback to Guide Voltage Speculation in Low-Voltage Processors
Using ECC Feedback to Guide Voltage Speculation in Low-Voltage Processors Anys Bacha Computer Science and Engineering The Ohio State University bacha@cse.ohio-state.edu Radu Teodorescu Computer Science
More informationConsole Architecture 1
Console Architecture 1 Overview What is a console? Console components Differences between consoles and PCs Benefits of console development The development environment Console game design PS3 in detail
More informationMitigating Parameter Variation with Dynamic Fine-Grain Body Biasing *
Mitigating Parameter Variation with Dynamic Fine-Grain Body Biasing * Radu Teodorescu, Jun Nakano, Abhishek Tiwari and Josep Torrellas University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu
More informationFREQUENTLY ASKED QUESTIONS
FREQUENTLY ASKED QUESTIONS TABLE OF CONTENTS ABOUT STRATEGY AND COMMUNITY SUPPORT GAMEPLAY STEAM EARLY ACCESS LANGUAGES MISCELLANEOUS 3 5 5 6 9 10 10 Axis & Allies Online FAQ TABLE OF CONTENTS 2 ABOUT
More informationUse Nvidia Performance Primitives (NPP) in Deep Learning Training. Yang Song
Use Nvidia Performance Primitives (NPP) in Deep Learning Training Yang Song Outline Introduction Function Categories Performance Results Deep Learning Specific Further Information What is NPP? Image+Signal
More informationCS4961 Parallel Programming. Lecture 1: Introduction 08/24/2010. Course Details Time and Location: TuTh, 9:10-10:30 AM, WEB L112 Course Website
Parallel Programming Lecture 1: Introduction Mary Hall August 24, 2010 1 Course Details Time and Location: TuTh, 9:10-10:30 AM, WEB L112 Course Website - http://www.eng.utah.edu/~cs4961/ Instructor: Mary
More informationTable of Contents HOL EMT
Table of Contents Lab Overview - - Machine Learning Workloads in vsphere Using GPUs - Getting Started... 2 Lab Guidance... 3 Module 1 - Machine Learning Apps in vsphere VMs Using GPUs (15 minutes)...9
More informationLatency-aware DVFS for Efficient Power State Transitions on Many-core Architectures
J Supercomput manuscript No. (will be inserted by the editor) Latency-aware DVFS for Efficient Power State Transitions on Many-core Architectures Zhiquan Lai King Tin Lam Cho-Li Wang Jinshu Su Received:
More informationRamon Canal NCD Master MIRI. NCD Master MIRI 1
Wattch, Hotspot, Hotleakage, McPAT http://www.eecs.harvard.edu/~dbrooks/wattch-form.html http://lava.cs.virginia.edu/hotspot http://lava.cs.virginia.edu/hotleakage http://www.hpl.hp.com/research/mcpat/
More informationCS434/534: Topics in Networked (Networking) Systems
CS434/534: Topics in Networked (Networking) Systems Improve Wireless Capacity; Programmable Wireless Networks Yang (Richard) Yang Computer Science Department Yale University 208A Watson Email: yry@cs.yale.edu
More informationWarp-Aware Trace Scheduling for GPUS. James Jablin (Brown) Thomas Jablin (UIUC) Onur Mutlu (CMU) Maurice Herlihy (Brown)
Warp-Aware Trace Scheduling for GPUS James Jablin (Brown) Thomas Jablin (UIUC) Onur Mutlu (CMU) Maurice Herlihy (Brown) Historical Trends in GFLOPS: CPUs vs. GPUs Theoretical GFLOP/s 3250 3000 2750 2500
More informationMonte Carlo integration and event generation on GPU and their application to particle physics
Monte Carlo integration and event generation on GPU and their application to particle physics Junichi Kanzaki (KEK) GPU2016 @ Rome, Italy Sep. 26, 2016 Motivation Increase of amount of LHC data (raw &
More informationArtificial Intelligence for Games. Santa Clara University, 2012
Artificial Intelligence for Games Santa Clara University, 2012 Introduction Class 1 Artificial Intelligence for Games What is different Gaming stresses computing resources Graphics Engine Physics Engine
More informationBIO Helmet EEL 4914 Senior Design I Group # 3 Frank Alexin Nicholas Dijkhoffz Adam Hollifield Mark Le
BIO Helmet EEL 4914 Senior Design I Group # 3 Frank Alexin Nicholas Dijkhoffz Adam Hollifield Mark Le Project Description and Motivation The goal of this project is to create and integrate a system that
More informationMulti-core Platforms for
20 JUNE 2011 Multi-core Platforms for Immersive-Audio Applications Course: Advanced Computer Architectures Teacher: Prof. Cristina Silvano Student: Silvio La Blasca 771338 Introduction on Immersive-Audio
More information