Characterizing and Improving the Performance of Intel Threading Building Blocks

Size: px
Start display at page:

Download "Characterizing and Improving the Performance of Intel Threading Building Blocks"

Transcription

1 Characterizing and Improving the Performance of Intel Threading Building Blocks Gilberto Contreras, Margaret Martonosi Princeton University IISWC 08

2 Motivation Chip Multiprocessors are the new computing platform. 2 cores, 4 cores, 8 cores Are we ready? Why is parallelism so challenging? Identify parallelism Annotation/extraction of parallelism Mapping to cores Respond to: OS effects Thermal emergencies Variability trends Reliability issues 2

3 How is Parallelism Annotated/Extracted Compiler DSWP, SUIF, Polaris Other Parallel Languages Cilk, StreamIT, Linda Orca, Parloc, Emerald, etc Parallelization Libraries TBB, OpenMP, Java threads, pthreads, CUDA, etc This work answers the following questions: What are some of the major sources of overheads? How do they impact overall parallelism performance? How can we improve parallelism performance?

4 Our work focus This talk will focus on the Intel Threading Building Blocks (TBB) Task-based parallelization library for C++ applications Support a wide range of parallelism types Utilizes task stealing for load balancing Methodology is applicable to other parallelism management approaches

5 Presentation Outline Description of TBB Programming example Task management in TBB Characterization Methodology Measuring basic operations using simulation and real-system measurements TBB overheads in PARSEC benchmarks Performance of Task Stealing Improving TBB Occupancy-based task stealing Summary and Conclusions 5

6 Annotation and Management class LUWork { for (i=k+1; i<size; i++) { void operator()(cost blocked_range<int> &b){ for L[i][k] (i=b.begin(); = M[i][k] i!=b.end(); / M[k][k]; i++) { L[i][k] for(j=i+1; = M[i][k] j<size; / M[k][k]; j++) for(j=i+1; M[i][j] j<size; = M[i][j] j++) M[i][j] = M[i][j] L[i][k]*M[k][j]; L[i][k]*M[k][j]; } } } } LUWork work(l,m,k,size); parallel_for(blocked_range( k+1, size, CHUNK_SIZE),work); TBB chunk size worker 1 worker 1 Work Scheduler worker 2 worker 2 worker 3 worker 3 Runtime procedure: spawn() acquire_queue() get_task() spawn() spawn() steal() acquire_queue() get_task() spawn() steal() acquire_queue() get_task() 6

7 Reducing TBB Library Overhead? Understand Overheads Creating tasks spawn() Assigning tasks to worker threads get_task() queue_acquire() wait_for_all() Stealing or rebalancing parallelism steal() Improve parallelism reorganization policies Employ smart redistribution policies Make this as fast and as efficient as possible 7

8 Methodology Benchmarks PARSEC Microbenchmarks Intel Threading Building Blocks (TBB) Open source 2.0 version Real CMP System 4-core AMD system (2 processors) 4GB RAM Linux 2.6 Oprofile is used for performance counter measurements Cycle-accurate CMP simulator 2-issue, in-order cores 32KB D$ (coherent), 32K I$ 8MB shared L2 cache MSI directory-based coherence protocol Mesh network, 32b BW/port/cycle 8

9 Cost of Parallelism Management Simulation Results (4-32 cores) cores 8 cores 12 cores 16 cores 32 cores Cycles get_task spawn stealing (successful) stealing (unsuccessful) acquire_queue wait_for_all Runtime activity 9

10 TBB Overheads: PARSEC Scheduler Waiting Synchronization Stealing 30% fluidanimate 30% swaptions 41% 47% Average time per core 25% 20% 15% 10% 5% Average time per core 25% 20% 15% 10% 5% 0% P8 P12 P16 P25 P32 0% P8 P12 P16 P25 P32 Number of cores Number of cores 30% blackscholes 30% streamcluster 34% 54% Average time per core 25% 20% 15% 10% 5% Average time per core 25% 20% 15% 10% 5% 0% P8 P12 P16 P25 P32 0% P8 P12 P16 P25 P32 Number of cores Number of cores

11 Improving Stealing TBB utilizes random stealing as its victim selection policy Success Rate False Negatives 45% 40% 35% 30% 25% 20% 15% 10% 5% 0% P4 P8 P12 P16 P25 P32 P4 P8 P12 P16 P25 P32 P4 P8 P12 P16 P25 P32 Bitcounter LU Matmult 11

12 Occupancy-based Stealing Random Stealing Occupancy-based stealing Scheduler Scheduler worker 1 worker 1 worker 2 worker 2 worker 3 worker 3 worker 4 worker 4 worker 1 worker 1 worker 2 worker 2 worker 3 worker 3 worker 4 worker 4 2 Random stealing: Random number Stealing Occupancy stealing: Scanning Stealing

13 Performance of Occupancy-based Stealing Occupancy-Based 1-cycle scan Normal stealing 1-cycle scan 1-cycle stealing P16 P25 P32 P16 P25 P32 P16 P25 P32 Bitcounter 2.5% 2.5% 2.7% 2.4% 2.8% 3.7% 4.7% 6.9% 7.8% LU 10% 4.1% 9.7% 10.2% 4.6% 8.0% 16% 10.4% 20.6% Matmult 9.5% 6% 19% 9.8% 7.0% 21.1% 10.8% 9.8% 28.7% Smarter selection policies are desired High potential in overhead reduction 13

14 Conclusions Increasing usage of TBB makes it a prime candidate for in-depth characterization Parallelization libraries help, but tend to exhibit high (dynamic) overheads (>40% at 32 cores) Understanding software overheads is the first step in creating high-performance parallel systems We have presented a detailed characterization of the Intel Threading building Blocks and implemented occupancy-based stealing (19% performance over random stealing). 14

15 Thanks! 15

16 Summary Programmers require tools that allows them to take (fast) advantage of increasing core counts. Parallelization libraries help, but tend to exhibit high (dynamic) overheads (>40% at 32 cores) Understanding software overheads is the first step in creating high-performance parallel systems We have presented a detailed characterization of the Intel Threading building Blocks and implemented occupancy-based stealing (19% performance over random stealing). 16

17 Cost of Parallelism Management 4 core, 1.8GHz AMD system Oprofile configured to measure CPU_CLK_UNHALTED 1 to 32 core CMP simulator 2-issue, in-order cores Shared L core 2 cores 3 cores 4 cores cores 8 cores 12 cores 16 cores 32 cores Cycles Cycles get_task spawn steal acquire_queue wait_for_all 0 get_task spawn stealing (successful) stealing (unsuccessful) acquire_queue wait_for_all Runtime activity Runtime activity Our goals: 1) Reduce per-event overheads 2) Improve rebalancing

18 Static versus Dynamic Management 32 Static (pthread) 32 TBB 32 PARSEC Speedup Number of cores Speedup Number of cores Speedup Number of cores fluidanimate swaptions blackscholes

Booster: Reactive Core Acceleration for Mitigating the Effects of Process Variation and Application Imbalance in Low-Voltage Chips

Booster: Reactive Core Acceleration for Mitigating the Effects of Process Variation and Application Imbalance in Low-Voltage Chips Booster: Reactive Core Acceleration for Mitigating the Effects of Process Variation and Application Imbalance in Low-Voltage Chips Timothy N. Miller, Xiang Pan, Renji Thomas, Naser Sedaghati, Radu Teodorescu

More information

Performance Evaluation of Multi-Threaded System vs. Chip-Multi-Processor System

Performance Evaluation of Multi-Threaded System vs. Chip-Multi-Processor System Performance Evaluation of Multi-Threaded System vs. Chip-Multi-Processor System Ho Young Kim, Robert Maxwell, Ankil Patel, Byeong Kil Lee Abstract The purpose of this study is to analyze and compare the

More information

Democratising Parallel Software

Democratising Parallel Software ParaFormance TM Democratising Parallel Software Chris Brown @paraformance www.paraformance.com chris@paraformance.com A Scottish Startup 600k Scottish Enterprise grant money so far built on over 7M of

More information

Final Report: DBmbench

Final Report: DBmbench 18-741 Final Report: DBmbench Yan Ke (yke@cs.cmu.edu) Justin Weisz (jweisz@cs.cmu.edu) Dec. 8, 2006 1 Introduction Conventional database benchmarks, such as the TPC-C and TPC-H, are extremely computationally

More information

Cherry Picking: Exploiting Process Variations in the Dark Silicon Era

Cherry Picking: Exploiting Process Variations in the Dark Silicon Era Cherry Picking: Exploiting Process Variations in the Dark Silicon Era Siddharth Garg University of Waterloo Co-authors: Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu # Transistors Power/Dark

More information

Recent Advances in Simulation Techniques and Tools

Recent Advances in Simulation Techniques and Tools Recent Advances in Simulation Techniques and Tools Yuyang Li, li.yuyang(at)wustl.edu (A paper written under the guidance of Prof. Raj Jain) Download Abstract: Simulation refers to using specified kind

More information

Improving GPU Performance via Large Warps and Two-Level Warp Scheduling

Improving GPU Performance via Large Warps and Two-Level Warp Scheduling Improving GPU Performance via Large Warps and Two-Level Warp Scheduling Veynu Narasiman The University of Texas at Austin Michael Shebanow NVIDIA Chang Joo Lee Intel Rustam Miftakhutdinov The University

More information

Characterizing, Optimizing, and Auto-Tuning Applications for Energy Efficiency

Characterizing, Optimizing, and Auto-Tuning Applications for Energy Efficiency PhD Dissertation Proposal Characterizing, Optimizing, and Auto-Tuning Applications for Efficiency Wei Wang The Committee: Chair: Dr. John Cavazos Member: Dr. Guang R. Gao Member: Dr. James Clause Member:

More information

Project 5: Optimizer Jason Ansel

Project 5: Optimizer Jason Ansel Project 5: Optimizer Jason Ansel Overview Project guidelines Benchmarking Library OoO CPUs Project Guidelines Use optimizations from lectures as your arsenal If you decide to implement one, look at Whale

More information

Smart-M3-Based Robot Interaction in Cyber-Physical Systems

Smart-M3-Based Robot Interaction in Cyber-Physical Systems FRUCT 16, Oulu, Finland October 30, 2014 Smart-M3-Based Robot Interaction in Cyber-Physical Systems Nikolay Teslya *, Sergey Savosin * * St. Petersburg Institute for Informatics and Automation of the Russian

More information

CUDA Threads. Terminology. How it works. Terminology. Streaming Multiprocessor (SM) A SM processes block of threads

CUDA Threads. Terminology. How it works. Terminology. Streaming Multiprocessor (SM) A SM processes block of threads Terminology CUDA Threads Bedrich Benes, Ph.D. Purdue University Department of Computer Graphics Streaming Multiprocessor (SM) A SM processes block of threads Streaming Processors (SP) also called CUDA

More information

Exploiting Coarse-Grained Task, Data, and Pipeline Parallelism in Stream Programs

Exploiting Coarse-Grained Task, Data, and Pipeline Parallelism in Stream Programs Exploiting Coarse-Grained Task, Data, and Pipeline Parallelism in Stream Programs Michael Gordon, William Thies, and Saman Amarasinghe Massachusetts Institute of Technology ASPLOS October 2006 San Jose,

More information

High Performance Computing for Engineers

High Performance Computing for Engineers High Performance Computing for Engineers David Thomas dt10@ic.ac.uk / https://github.com/m8pple Room 903 http://cas.ee.ic.ac.uk/people/dt10/teaching/2014/hpce HPCE / dt10/ 2015 / 0.1 High Performance Computing

More information

A-WiNoC: Adaptive Wireless Network-on-Chip Architecture for Chip Multiprocessors

A-WiNoC: Adaptive Wireless Network-on-Chip Architecture for Chip Multiprocessors TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL., NO., MONTH YEAR : Adaptive Wireless Network-on-Chip Architecture for Chip Multiprocessors Dominic DiTomaso, Student Member, IEEE, Avinash Kodi, Senior

More information

Big versus Little: Who will trip?

Big versus Little: Who will trip? Big versus Little: Who will trip? Reena Panda University of Texas at Austin reena.panda@utexas.edu Christopher Donald Erb University of Texas at Austin cde593@utexas.edu Lizy Kurian John University of

More information

IEEE TRANSACTIONS ON MULTI-SCALE COMPUTING SYSTEMS, VOL. 1, NO. 1, JANUARY

IEEE TRANSACTIONS ON MULTI-SCALE COMPUTING SYSTEMS, VOL. 1, NO. 1, JANUARY This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 1.119/TMSCS.218.287438,

More information

Track and Vertex Reconstruction on GPUs for the Mu3e Experiment

Track and Vertex Reconstruction on GPUs for the Mu3e Experiment Track and Vertex Reconstruction on GPUs for the Mu3e Experiment Dorothea vom Bruch for the Mu3e Collaboration GPU Computing in High Energy Physics, Pisa September 11th, 2014 Physikalisches Institut Heidelberg

More information

Understanding the Interactions Hardware/Software Parameters on the Energy Consumption of Multi-Threaded Applications

Understanding the Interactions Hardware/Software Parameters on the Energy Consumption of Multi-Threaded Applications Understanding the Interactions Hardware/Software Parameters on the Energy Consumption of Multi-Threaded Applications Jeyarajan Thiyagalingam, Anne E. Trefethen April 29, 2014 Abstract In recent years,

More information

Performance Metrics. Computer Architecture. Outline. Objectives. Basic Performance Metrics. Basic Performance Metrics

Performance Metrics. Computer Architecture. Outline. Objectives. Basic Performance Metrics. Basic Performance Metrics Computer Architecture Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr nizamettinaydin@gmail.com Performance Metrics http://www.yildiz.edu.tr/~naydin 1 2 Objectives How can we meaningfully measure and compare

More information

Dynamic Adaptive Operating Systems -- I/O

Dynamic Adaptive Operating Systems -- I/O Dynamic Adaptive Operating Systems -- I/O Seetharami R. Seelam Patricia J. Teller University of Texas at El Paso El Paso, TX 16 November 2005 SC 05, Seattle, WA 1 Goals Present a summary of our ongoing

More information

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution

More information

Early Adopter : Multiprocessor Programming in the Undergraduate Program. NSF/TCPP Curriculum: Early Adoption at the University of Central Florida

Early Adopter : Multiprocessor Programming in the Undergraduate Program. NSF/TCPP Curriculum: Early Adoption at the University of Central Florida Early Adopter : Multiprocessor Programming in the Undergraduate Program NSF/TCPP Curriculum: Early Adoption at the University of Central Florida Narsingh Deo Damian Dechev Mahadevan Vasudevan Department

More information

Mosaic: A GPU Memory Manager with Application-Transparent Support for Multiple Page Sizes

Mosaic: A GPU Memory Manager with Application-Transparent Support for Multiple Page Sizes Mosaic: A GPU Memory Manager with Application-Transparent Support for Multiple Page Sizes Rachata Ausavarungnirun Joshua Landgraf Vance Miller Saugata Ghose Jayneel Gandhi Christopher J. Rossbach Onur

More information

HARDWARE ACCELERATION OF THE GIPPS MODEL

HARDWARE ACCELERATION OF THE GIPPS MODEL HARDWARE ACCELERATION OF THE GIPPS MODEL FOR REAL-TIME TRAFFIC SIMULATION Salim Farah 1 and Magdy Bayoumi 2 The Center for Advanced Computer Studies, University of Louisiana at Lafayette, USA 1 snf3346@cacs.louisiana.edu

More information

Self-Aware Adaptation in FPGAbased

Self-Aware Adaptation in FPGAbased DIPARTIMENTO DI ELETTRONICA E INFORMAZIONE Self-Aware Adaptation in FPGAbased Systems IEEE FPL 2010 Filippo Siorni: filippo.sironi@dresd.org Marco Triverio: marco.triverio@dresd.org Martina Maggio: mmaggio@mit.edu

More information

An Energy Conservation DVFS Algorithm for the Android Operating System

An Energy Conservation DVFS Algorithm for the Android Operating System Volume 1, Number 1, December 2010 Journal of Convergence An Energy Conservation DVFS Algorithm for the Android Operating System Wen-Yew Liang* and Po-Ting Lai Department of Computer Science and Information

More information

Evaluation of CPU Frequency Transition Latency

Evaluation of CPU Frequency Transition Latency Noname manuscript No. (will be inserted by the editor) Evaluation of CPU Frequency Transition Latency Abdelhafid Mazouz Alexandre Laurent Benoît Pradelle William Jalby Abstract Dynamic Voltage and Frequency

More information

Dynamic Reconfiguration of 3D Photonic Networks-on-Chip for Maximizing Performance and Improving Fault Tolerance

Dynamic Reconfiguration of 3D Photonic Networks-on-Chip for Maximizing Performance and Improving Fault Tolerance Dynamic Reconfiguration of 3D Photonic Networks-on-Chip for Maximizing Performance and Improving Fault Tolerance Randy Morris Ϯ, Avinash Kodi Ϯ and Ahmed Louri School of Electrical Engineering and Computer

More information

Multiple Clock and Voltage Domains for Chip Multi Processors

Multiple Clock and Voltage Domains for Chip Multi Processors Multiple Clock and Voltage Domains for Chip Multi Processors Efraim Rotem- Intel Corporation Israel Avi Mendelson- Microsoft R&D Israel Ran Ginosar- Technion Israel institute of Technology Uri Weiser-

More information

Customized Computing for Power Efficiency. There are Many Options to Improve Performance

Customized Computing for Power Efficiency. There are Many Options to Improve Performance ustomized omputing for Power Efficiency Jason ong cong@cs.ucla.edu ULA omputer Science Department http://cadlab.cs.ucla.edu/~cong There are Many Options to Improve Performance Page 1 Past Alternatives

More information

Document downloaded from:

Document downloaded from: Document downloaded from: http://hdl.handle.net/1251/64738 This paper must be cited as: Reaño González, C.; Pérez López, F.; Silla Jiménez, F. (215). On the design of a demo for exhibiting rcuda. 15th

More information

SW simulation and Performance Analysis

SW simulation and Performance Analysis SW simulation and Performance Analysis In Multi-Processing Embedded Systems Eugenio Villar University of Cantabria Context HW/SW Embedded Systems Design Flow HW/SW Simulation Performance Analysis Design

More information

Parallel GPU Architecture Simulation Framework Exploiting Work Allocation Unit Parallelism

Parallel GPU Architecture Simulation Framework Exploiting Work Allocation Unit Parallelism Parallel GPU Architecture Simulation Framework Exploiting Work Allocation Unit Parallelism Sangpil Lee and Won Woo Ro School of Electrical and Electronic Engineering Yonsei University Seoul, Republic of

More information

Evaluation of CPU Frequency Transition Latency

Evaluation of CPU Frequency Transition Latency Evaluation of CPU Frequency Transition Latency Abdelhafid Mazouz 1 Alexandre Laurent 1 Benoît Pradelle 1 William Jalby 1 1 University of Versailles Saint-Quentin-en-Yvelines, France ENA-HPC 2013, Dresden

More information

CUDA-Accelerated Satellite Communication Demodulation

CUDA-Accelerated Satellite Communication Demodulation CUDA-Accelerated Satellite Communication Demodulation Renliang Zhao, Ying Liu, Liheng Jian, Zhongya Wang School of Computer and Control University of Chinese Academy of Sciences Outline Motivation Related

More information

Getting to Work with OpenPiton. Princeton University. OpenPit

Getting to Work with OpenPiton. Princeton University.   OpenPit Getting to Work with OpenPiton Princeton University http://openpiton.org OpenPit ASIC SYNTHESIS AND BACKEND 2 Whats in the Box? Synthesis Synopsys Design Compiler Static timing analysis (STA) Synopsys

More information

A Case for Opportunistic Embedded Sensing In Presence of Hardware Power Variability

A Case for Opportunistic Embedded Sensing In Presence of Hardware Power Variability A Case for Opportunistic Embedded Sensing In Presence of Hardware Power Variability L. Wanner, C. Apte, R. Balani, Puneet Gupta, and Mani Srivastava University of California, Los Angeles puneet@ee.ucla.edu

More information

Architectural Core Salvaging in a Multi-Core Processor for Hard-Error Tolerance

Architectural Core Salvaging in a Multi-Core Processor for Hard-Error Tolerance Architectural Core Salvaging in a Multi-Core Processor for Hard-Error Tolerance Michael D. Powell, Arijit Biswas, Shantanu Gupta, and Shubu Mukherjee SPEARS Group, Intel Massachusetts EECS, University

More information

ΕΠΛ 605: Προχωρημένη Αρχιτεκτονική

ΕΠΛ 605: Προχωρημένη Αρχιτεκτονική ΕΠΛ 605: Προχωρημένη Αρχιτεκτονική Υπολογιστών Presentation of UniServer Horizon 2020 European project findings: X-Gene server chips, voltage-noise characterization, high-bandwidth voltage measurements,

More information

Hardware-Software Interaction for Run-time Power Optimization: A Case Study of Embedded Linux on Multicore Smartphones

Hardware-Software Interaction for Run-time Power Optimization: A Case Study of Embedded Linux on Multicore Smartphones Hardware-Software Interaction for Run-time Optimization: A Case Study of Embedded Linux on Multicore Smartphones Anup Das, Matthew J. Walker, Andreas Hansson, Bashir M. Al-Hashimi and Geoff V. Merrett

More information

The Who. Intel - no introduction required.

The Who. Intel - no introduction required. Delivering Demand-Based Worlds with Intel SSD GDC 2011 The Who Intel - no introduction required. Digital Extremes - In addition to be great developers of AAA games, they are also the authors of the Evolution

More information

Towards Real-Time Volunteer Distributed Computing

Towards Real-Time Volunteer Distributed Computing Towards Real-Time Volunteer Distributed Computing Sangho Yi 1, Emmanuel Jeannot 2, Derrick Kondo 1, David P. Anderson 3 1 INRIA MESCAL, 2 RUNTIME, France 3 UC Berkeley, USA Motivation Push towards large-scale,

More information

Parallelism Across the Curriculum

Parallelism Across the Curriculum Parallelism Across the Curriculum John E. Howland Department of Computer Science Trinity University One Trinity Place San Antonio, Texas 78212-7200 Voice: (210) 999-7364 Fax: (210) 999-7477 E-mail: jhowland@trinity.edu

More information

Combined Dynamic Thermal Management Exploiting Broadcast-Capable Wireless Networkon-Chip

Combined Dynamic Thermal Management Exploiting Broadcast-Capable Wireless Networkon-Chip Rochester Institute of Technology RIT Scholar Works Theses Thesis/Dissertation Collections 3-18-2016 Combined Dynamic Thermal Management Exploiting Broadcast-Capable Wireless Networkon-Chip Architecture

More information

The Ghost in the Machine Observing the Effects of Kernel Operation on Parallel Application Performance

The Ghost in the Machine Observing the Effects of Kernel Operation on Parallel Application Performance The Ghost in the Machine Observing the Effects of Kernel Operation on Parallel Application Performance Aroon Nataraj, Alan Morris, Allen Malony, Matthew Sottile, Pete Beckman l {anataraj, amorris, malony,

More information

PROBE: Prediction-based Optical Bandwidth Scaling for Energy-efficient NoCs

PROBE: Prediction-based Optical Bandwidth Scaling for Energy-efficient NoCs PROBE: Prediction-based Optical Bandwidth Scaling for Energy-efficient NoCs Li Zhou and Avinash Kodi Technologies for Emerging Computer Architecture Laboratory (TEAL) School of Electrical Engineering and

More information

Game Architecture. 4/8/16: Multiprocessor Game Loops

Game Architecture. 4/8/16: Multiprocessor Game Loops Game Architecture 4/8/16: Multiprocessor Game Loops Monolithic Dead simple to set up, but it can get messy Flow-of-control can be complex Top-level may have too much knowledge of underlying systems (gross

More information

Processors Processing Processors. The meta-lecture

Processors Processing Processors. The meta-lecture Simulators 5SIA0 Processors Processing Processors The meta-lecture Why Simulators? Your Friend Harm Why Simulators? Harm Loves Tractors Harm Why Simulators? The outside world Unfortunately for Harm you

More information

Computational Efficiency of the GF and the RMF Transforms for Quaternary Logic Functions on CPUs and GPUs

Computational Efficiency of the GF and the RMF Transforms for Quaternary Logic Functions on CPUs and GPUs 5 th International Conference on Logic and Application LAP 2016 Dubrovnik, Croatia, September 19-23, 2016 Computational Efficiency of the GF and the RMF Transforms for Quaternary Logic Functions on CPUs

More information

An Artificial Neural Networks based Temperature Prediction Framework for Network-on-Chip based Multicore Platform

An Artificial Neural Networks based Temperature Prediction Framework for Network-on-Chip based Multicore Platform Rochester Institute of Technology RIT Scholar Works Theses Thesis/Dissertation Collections 3-2016 An Artificial Neural Networks based Temperature Prediction Framework for Network-on-Chip based Multicore

More information

Perspective platforms for BOINC distributed computing network

Perspective platforms for BOINC distributed computing network Perspective platforms for BOINC distributed computing network Vitalii Koshura Lohika Odessa, Ukraine lestat.de.lionkur@gmail.com Profile page: https://www.linkedin.com/in/aenbleidd/ Abstract This paper

More information

Adaptive Guardband Scheduling to Improve System-Level Efficiency of the POWER7+

Adaptive Guardband Scheduling to Improve System-Level Efficiency of the POWER7+ Adaptive Guardband Scheduling to Improve System-Level Efficiency of the POWER7+ Yazhou Zu 1, Charles R. Lefurgy, Jingwen Leng 1, Matthew Halpern 1, Michael S. Floyd, Vijay Janapa Reddi 1 1 The University

More information

Energy Efficient Soft Real-Time Computing through Cross-Layer Predictive Control

Energy Efficient Soft Real-Time Computing through Cross-Layer Predictive Control Energy Efficient Soft Real-Time Computing through Cross-Layer Predictive Control Guangyi Cao and Arun Ravindran Department of Electrical and Computer Engineering University of North Carolina at Charlotte

More information

CLIPPER: Counter-based Low Impact Processor Power Estimation at Run-time

CLIPPER: Counter-based Low Impact Processor Power Estimation at Run-time CLIPPER: Counter-based Low Impact Processor Power Estimation at Run-time Jorgen Peddersen, Sri Parameswaran School of Computer Science and Engineering The University of New South Wales & National ICT Australia

More information

Revisiting Dynamic Thermal Management Exploiting Inverse Thermal Dependence

Revisiting Dynamic Thermal Management Exploiting Inverse Thermal Dependence Revisiting Dynamic Thermal Management Exploiting Inverse Thermal Dependence Katayoun Neshatpour George Mason University kneshatp@gmu.edu Amin Khajeh Broadcom Corporation amink@broadcom.com Houman Homayoun

More information

Real Time Visualization of Full Resolution Data of Indian Remote Sensing Satellite

Real Time Visualization of Full Resolution Data of Indian Remote Sensing Satellite International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 8, Issue 9 (September 2013), PP. 42-51 Real Time Visualization of Full Resolution

More information

Computer Architecture A Quantitative Approach

Computer Architecture A Quantitative Approach Computer Architecture A Quantitative Approach Fourth Edition John L. Hennessy Stanford University David A. Patterson University of California at Berkeley With Contributions by Andrea C. Arpaci-Dusseau

More information

Scalable and Lightweight CTF Infrastructures Using Application Containers

Scalable and Lightweight CTF Infrastructures Using Application Containers Scalable and Lightweight CTF Infrastructures Using Application Containers Arvind S Raj, Bithin Alangot, Seshagiri Prabhu and Krishnashree Achuthan Amrita Center for Cybersecurity Systems and Networks Amrita

More information

Parallel Randomized Best-First Search

Parallel Randomized Best-First Search Parallel Randomized Best-First Search Yaron Shoham and Sivan Toledo School of Computer Science, Tel-Aviv Univsity http://www.tau.ac.il/ stoledo, http://www.tau.ac.il/ ysh Abstract. We describe a novel

More information

From the New York Times Introduction to Concurrency

From the New York Times Introduction to Concurrency From the New York Times Introduction to Concurrency dapted by CP from lectures by Maurice Herlihy at rown SN FRNCISCO, May 7. 2004 - Intel said on Friday that it was scrapping its development of two microprocessors,

More information

FIFO WITH OFFSETS HIGH SCHEDULABILITY WITH LOW OVERHEADS. RTAS 18 April 13, Björn Brandenburg

FIFO WITH OFFSETS HIGH SCHEDULABILITY WITH LOW OVERHEADS. RTAS 18 April 13, Björn Brandenburg FIFO WITH OFFSETS HIGH SCHEDULABILITY WITH LOW OVERHEADS RTAS 18 April 13, 2018 Mitra Nasri Rob Davis Björn Brandenburg FIFO SCHEDULING First-In-First-Out (FIFO) scheduling extremely simple very low overheads

More information

DICELIB: A REAL TIME SYNCHRONIZATION LIBRARY FOR MULTI-PROJECTION VIRTUAL REALITY DISTRIBUTED ENVIRONMENTS

DICELIB: A REAL TIME SYNCHRONIZATION LIBRARY FOR MULTI-PROJECTION VIRTUAL REALITY DISTRIBUTED ENVIRONMENTS DICELIB: A REAL TIME SYNCHRONIZATION LIBRARY FOR MULTI-PROJECTION VIRTUAL REALITY DISTRIBUTED ENVIRONMENTS Abstract: The recent availability of PC-clusters offers an alternative solution instead of high-end

More information

Lecture #29. Moore s Law

Lecture #29. Moore s Law Lecture #29 ANNOUNCEMENTS HW#15 will be for extra credit Quiz #6 (Thursday 5/8) will include MOSFET C-V No late Projects will be accepted after Thursday 5/8 The last Coffee Hour will be held this Thursday

More information

Microarchitectural Attacks and Defenses in JavaScript

Microarchitectural Attacks and Defenses in JavaScript Microarchitectural Attacks and Defenses in JavaScript Michael Schwarz, Daniel Gruss, Moritz Lipp 25.01.2018 www.iaik.tugraz.at 1 Michael Schwarz, Daniel Gruss, Moritz Lipp www.iaik.tugraz.at Microarchitecture

More information

Liu Yang, Bong-Joo Jang, Sanghun Lim, Ki-Chang Kwon, Suk-Hwan Lee, Ki-Ryong Kwon 1. INTRODUCTION

Liu Yang, Bong-Joo Jang, Sanghun Lim, Ki-Chang Kwon, Suk-Hwan Lee, Ki-Ryong Kwon 1. INTRODUCTION Liu Yang, Bong-Joo Jang, Sanghun Lim, Ki-Chang Kwon, Suk-Hwan Lee, Ki-Ryong Kwon 1. INTRODUCTION 2. RELATED WORKS 3. PROPOSED WEATHER RADAR IMAGING BASED ON CUDA 3.1 Weather radar image format and generation

More information

Application of Maxwell Equations to Human Body Modelling

Application of Maxwell Equations to Human Body Modelling Application of Maxwell Equations to Human Body Modelling Fumie Costen Room E, E0c at Sackville Street Building, fc@cs.man.ac.uk The University of Manchester, U.K. February 5, 0 Fumie Costen Room E, E0c

More information

Thermal Management of Manycore Systems with Silicon-Photonic Networks

Thermal Management of Manycore Systems with Silicon-Photonic Networks Thermal Management of Manycore Systems with Silicon-Photonic Networks Tiansheng Zhang, José L. Abellán, Ajay Joshi, Ayse K. Coskun Electrical and Computer Engineering Department, Boston University, Boston,

More information

Measurement Driven Deployment of a Two-Tier Urban Mesh Access Network

Measurement Driven Deployment of a Two-Tier Urban Mesh Access Network Measurement Driven Deployment of a Two-Tier Urban Mesh Access Network J. Camp, J. Robinson, C. Steger, E. Knightly Rice Networks Group MobiSys 2006 6/20/06 Two-Tier Mesh Architecture Limited Gateway Nodes

More information

Investigating the Post Processing of LS-DYNA in a Fully Immersive Workflow Environment

Investigating the Post Processing of LS-DYNA in a Fully Immersive Workflow Environment Investigating the Post Processing of LS-DYNA in a Fully Immersive Workflow Environment Ed Helwig 1, Facundo Del Pin 2 1 Livermore Software Technology Corporation, Livermore CA 2 Livermore Software Technology

More information

TLSync: Support for Multiple Fast Barriers Using On-Chip Transmission Lines

TLSync: Support for Multiple Fast Barriers Using On-Chip Transmission Lines TLSync: Support for Multiple Fast Barriers Using On-Chip Transmission Lines Jungju Oh jungju@gatech.edu Milos Prvulovic milos@cc.gatech.edu Georgia Institute of Technology Atlanta, GA, USA Alenka Zajic

More information

Towards Scalable 1024 Processor Shared Memory Systems

Towards Scalable 1024 Processor Shared Memory Systems Towards Scalable Processor Shared Memory Systems Robert B. Ciotti (ciotti@nas.nasa.gov), NASA Advanced Supercomputing (NAS) NASA Ames Research Center, Moffett Field, CA 95 ABSTRACT: Over the past years,

More information

Here are additional tips that we hope you will find useful in the future.

Here are additional tips that we hope you will find useful in the future. Bobbin and Needle Type Here are additional tips that we hope you will find useful in the future. Question: What type of bobbins and needles should I use with my SINGER Futura embroidery & sewing machine?

More information

Outline Simulators and such. What defines a simulator? What about emulation?

Outline Simulators and such. What defines a simulator? What about emulation? Outline Simulators and such Mats Brorsson & Mladen Nikitovic ICT Dept of Electronic, Computer and Software Systems (ECS) What defines a simulator? Why are simulators needed? Classifications Case studies

More information

Arduino STEAM Academy Arduino STEM Academy Art without Engineering is dreaming. Engineering without Art is calculating. - Steven K.

Arduino STEAM Academy Arduino STEM Academy Art without Engineering is dreaming. Engineering without Art is calculating. - Steven K. Arduino STEAM Academy Arduino STEM Academy Art without Engineering is dreaming. Engineering without Art is calculating. - Steven K. Roberts Page 1 See Appendix A, for Licensing Attribution information

More information

Welcome to the Early Beta and Thank You for Your Continued Support!

Welcome to the Early Beta and Thank You for Your Continued Support! REFERENCE CARD Welcome to the Early Beta and Thank You for Your Continued Support! In addition to the information below, we ve recently added tutorial messages to the game. Remember to look for the in-game

More information

DEMIGOD DEMIGOD. characterize stalls and pop-ups during game play. Serious gamers play games at their maximum settings driving HD monitors.

DEMIGOD DEMIGOD. characterize stalls and pop-ups during game play. Serious gamers play games at their maximum settings driving HD monitors. Intel Solid-State Drives (Intel SSDs) are revolutionizing storage performance on desktop and laptop PCs, delivering dramatically faster load times than hard disk drives (HDDs). When Intel SSDs are used

More information

Dr Myat Su Hlaing Asia Research Center, Yangon University, Myanmar. Data programming model for an operation based parallel image processing system

Dr Myat Su Hlaing Asia Research Center, Yangon University, Myanmar. Data programming model for an operation based parallel image processing system Name: Affiliation: Field of research: Specific Field of Study: Proposed Research Topic: Dr Myat Su Hlaing Asia Research Center, Yangon University, Myanmar Information Science and Technology Computer Science

More information

VRCon: Dynamic Reconfiguration of Voltage Regulators in a Multicore Platform

VRCon: Dynamic Reconfiguration of Voltage Regulators in a Multicore Platform VRCon: Dynamic Reconfiguration of Voltage Regulators in a Multicore Platform Woojoo Lee, Yanzhi Wang, and Massoud Pedram Dept. of Electrical Engineering, Univ. of Souther California, Los Angeles, California,

More information

RIVA. Real-Time Image Viewing and Acquisition. Version 1.0. User's Manual

RIVA. Real-Time Image Viewing and Acquisition. Version 1.0. User's Manual 1 RIVA Real-Time Image Viewing and Acquisition Version 1.0 User's Manual Susan Jarmuz and Dr. Mariusz Jankowski University of Southern Maine Gorham, Maine mjkcc@usm.maine.edu 2 2000-2001 Susan Jarmuz and

More information

DESKTOP APPLICATION FOR RED-GREEN COLOR BLINDNESS. Mahendra Sunt Servanda Informatics Engineering Supervisor: Dr.

DESKTOP APPLICATION FOR RED-GREEN COLOR BLINDNESS. Mahendra Sunt Servanda Informatics Engineering Supervisor: Dr. DESKTOP APPLICATION FOR RED-GREEN COLOR BLINDNESS Mahendra Sunt Servanda 54411250 Informatics Engineering Supervisor: Dr. Yuhilza Hanum INTRODUCTION Background Image and colors now have important role

More information

Parallel Simulation of Social Agents using Cilk and OpenCL

Parallel Simulation of Social Agents using Cilk and OpenCL D. Moser, A. Riener, K. Zia, A. Ferscha Department for Pervasive Computing, JKU Linz/Austria Parallel Simulation of Social Agents using Cilk and OpenCL DS-RT 2011 15th International Symposium on Distributed

More information

Challenges in Transition

Challenges in Transition Challenges in Transition Keynote talk at International Workshop on Software Engineering Methods for Parallel and High Performance Applications (SEM4HPC 2016) 1 Kazuaki Ishizaki IBM Research Tokyo kiszk@acm.org

More information

Andes Game Platform Porting

Andes Game Platform Porting Andes Game Platform Porting Andes Technology Architecture for Next-generation Digital Engines for SoC Outline Porting guide System Architecture Package dependency Game package details Performance issue

More information

Overview. 1 Trends in Microprocessor Architecture. Computer architecture. Computer architecture

Overview. 1 Trends in Microprocessor Architecture. Computer architecture. Computer architecture Overview 1 Trends in Microprocessor Architecture R05 Robert Mullins Computer architecture Scaling performance and CMOS Where have performance gains come from? Modern superscalar processors The limits of

More information

Energy Efficiency Benefits of Reducing the Voltage Guardband on the Kepler GPU Architecture

Energy Efficiency Benefits of Reducing the Voltage Guardband on the Kepler GPU Architecture Energy Efficiency Benefits of Reducing the Voltage Guardband on the Kepler GPU Architecture Jingwen Leng Yazhou Zu Vijay Janapa Reddi The University of Texas at Austin {jingwen, yazhou.zu}@utexas.edu,

More information

Using ECC Feedback to Guide Voltage Speculation in Low-Voltage Processors

Using ECC Feedback to Guide Voltage Speculation in Low-Voltage Processors Using ECC Feedback to Guide Voltage Speculation in Low-Voltage Processors Anys Bacha Computer Science and Engineering The Ohio State University bacha@cse.ohio-state.edu Radu Teodorescu Computer Science

More information

Console Architecture 1

Console Architecture 1 Console Architecture 1 Overview What is a console? Console components Differences between consoles and PCs Benefits of console development The development environment Console game design PS3 in detail

More information

Mitigating Parameter Variation with Dynamic Fine-Grain Body Biasing *

Mitigating Parameter Variation with Dynamic Fine-Grain Body Biasing * Mitigating Parameter Variation with Dynamic Fine-Grain Body Biasing * Radu Teodorescu, Jun Nakano, Abhishek Tiwari and Josep Torrellas University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu

More information

FREQUENTLY ASKED QUESTIONS

FREQUENTLY ASKED QUESTIONS FREQUENTLY ASKED QUESTIONS TABLE OF CONTENTS ABOUT STRATEGY AND COMMUNITY SUPPORT GAMEPLAY STEAM EARLY ACCESS LANGUAGES MISCELLANEOUS 3 5 5 6 9 10 10 Axis & Allies Online FAQ TABLE OF CONTENTS 2 ABOUT

More information

Use Nvidia Performance Primitives (NPP) in Deep Learning Training. Yang Song

Use Nvidia Performance Primitives (NPP) in Deep Learning Training. Yang Song Use Nvidia Performance Primitives (NPP) in Deep Learning Training Yang Song Outline Introduction Function Categories Performance Results Deep Learning Specific Further Information What is NPP? Image+Signal

More information

CS4961 Parallel Programming. Lecture 1: Introduction 08/24/2010. Course Details Time and Location: TuTh, 9:10-10:30 AM, WEB L112 Course Website

CS4961 Parallel Programming. Lecture 1: Introduction 08/24/2010. Course Details Time and Location: TuTh, 9:10-10:30 AM, WEB L112 Course Website Parallel Programming Lecture 1: Introduction Mary Hall August 24, 2010 1 Course Details Time and Location: TuTh, 9:10-10:30 AM, WEB L112 Course Website - http://www.eng.utah.edu/~cs4961/ Instructor: Mary

More information

Table of Contents HOL EMT

Table of Contents HOL EMT Table of Contents Lab Overview - - Machine Learning Workloads in vsphere Using GPUs - Getting Started... 2 Lab Guidance... 3 Module 1 - Machine Learning Apps in vsphere VMs Using GPUs (15 minutes)...9

More information

Latency-aware DVFS for Efficient Power State Transitions on Many-core Architectures

Latency-aware DVFS for Efficient Power State Transitions on Many-core Architectures J Supercomput manuscript No. (will be inserted by the editor) Latency-aware DVFS for Efficient Power State Transitions on Many-core Architectures Zhiquan Lai King Tin Lam Cho-Li Wang Jinshu Su Received:

More information

Ramon Canal NCD Master MIRI. NCD Master MIRI 1

Ramon Canal NCD Master MIRI. NCD Master MIRI 1 Wattch, Hotspot, Hotleakage, McPAT http://www.eecs.harvard.edu/~dbrooks/wattch-form.html http://lava.cs.virginia.edu/hotspot http://lava.cs.virginia.edu/hotleakage http://www.hpl.hp.com/research/mcpat/

More information

CS434/534: Topics in Networked (Networking) Systems

CS434/534: Topics in Networked (Networking) Systems CS434/534: Topics in Networked (Networking) Systems Improve Wireless Capacity; Programmable Wireless Networks Yang (Richard) Yang Computer Science Department Yale University 208A Watson Email: yry@cs.yale.edu

More information

Warp-Aware Trace Scheduling for GPUS. James Jablin (Brown) Thomas Jablin (UIUC) Onur Mutlu (CMU) Maurice Herlihy (Brown)

Warp-Aware Trace Scheduling for GPUS. James Jablin (Brown) Thomas Jablin (UIUC) Onur Mutlu (CMU) Maurice Herlihy (Brown) Warp-Aware Trace Scheduling for GPUS James Jablin (Brown) Thomas Jablin (UIUC) Onur Mutlu (CMU) Maurice Herlihy (Brown) Historical Trends in GFLOPS: CPUs vs. GPUs Theoretical GFLOP/s 3250 3000 2750 2500

More information

Monte Carlo integration and event generation on GPU and their application to particle physics

Monte Carlo integration and event generation on GPU and their application to particle physics Monte Carlo integration and event generation on GPU and their application to particle physics Junichi Kanzaki (KEK) GPU2016 @ Rome, Italy Sep. 26, 2016 Motivation Increase of amount of LHC data (raw &

More information

Artificial Intelligence for Games. Santa Clara University, 2012

Artificial Intelligence for Games. Santa Clara University, 2012 Artificial Intelligence for Games Santa Clara University, 2012 Introduction Class 1 Artificial Intelligence for Games What is different Gaming stresses computing resources Graphics Engine Physics Engine

More information

BIO Helmet EEL 4914 Senior Design I Group # 3 Frank Alexin Nicholas Dijkhoffz Adam Hollifield Mark Le

BIO Helmet EEL 4914 Senior Design I Group # 3 Frank Alexin Nicholas Dijkhoffz Adam Hollifield Mark Le BIO Helmet EEL 4914 Senior Design I Group # 3 Frank Alexin Nicholas Dijkhoffz Adam Hollifield Mark Le Project Description and Motivation The goal of this project is to create and integrate a system that

More information

Multi-core Platforms for

Multi-core Platforms for 20 JUNE 2011 Multi-core Platforms for Immersive-Audio Applications Course: Advanced Computer Architectures Teacher: Prof. Cristina Silvano Student: Silvio La Blasca 771338 Introduction on Immersive-Audio

More information