Recent Advances in Simulation Techniques and Tools

Size: px
Start display at page:

Download "Recent Advances in Simulation Techniques and Tools"

Transcription

1 Recent Advances in Simulation Techniques and Tools Yuyang Li, li.yuyang(at)wustl.edu (A paper written under the guidance of Prof. Raj Jain) Download Abstract: Simulation refers to using specified kind of techniques to imitate the operation in real world. It is derived from then modeling of the object in the real world. In computer engineering, simulation is one of the three main methods in analyzing computer systems. It requires less time than measuring the real system and can be done by computer programs. Gem5 and ESESC are two featured simulation tools for simulating computer system. In this paper, their features will be studied, validated and then summarized. key words: simulation tools, computer system simulator, simulator validation, gem5, ESESC Table of Contents 1.Introduction 2.Back Ground o 2.1 Simulation o 2.2 Computer Architecture simulator o 2.3 Emulator 3.The Gem5 Simulator o 3.1 gem5 accuracy test o 3.2 pd-gem5 o 3.3 gem5-gpu 4.ESESC Simulator o 4.1 An Example of Simulation 5.Conclusions References Acronyms 1. Introduction 1

2 In computer system analysis, the three main methods are modeling, simulation and measuring. Among which, simulation can be performed on any stage, do not require a large portion of time and can be done using computer language. [Jain91] Since the appearance of the simulation technology, the techniques and tools has experienced a boost on the development. Different techniques are invented satisfying various demands. In this way, a comparation will be necessary in order to find the differences and the direction fitted by each one. 2. Background 2.1 Simulation Simulation is known as the imitation of the operation of a real-world process or system over time. [Banks01] It is based on the model developed representing the key features, behaviors and functions of the real-world object. The model is the representation of the system, and the simulation represents the operation on the system. Simulation is widely used in computer engineering and electrical engineering. 2.2 Computer Architecture simulator Computer architecture simulator, also known as architectural simulator, is a kind of software modeling computer hardware and give the predicted performance. The modeled target can be very flexible. It could be a microprocessor only or a full system include processor, memory and I/O devices. The architectural simulation can achieve the following purposes: evaluating designs without building the real hardware system, accessing non-existing computer devices, generating detailed performance data and quick debugging. A typical example is the multicore computer system, it demands a full-system simulation because creating and debugging can be very difficult and time-consuming. What's more, with the help of simulation, the software development can also start before the hardware is ready. This will validate the design if the hardware. [Joloboff09] 2.3 Emulator Emulator can be hardware or software, and allows one system to behave like another. The system running the emulator is called the host system and the one emulated called the guest. The emulator will allow the host system to run software that designed for the guest system. The word "emulator" was first created in 1963[Emerson95], at that time it only refers to the simulation assisted by microcode or hardware while running software emulation was still called "simulation". 3. The Gem5 Simulator Gem5 is a simulator platform doing simulation around system-level computer architecture and processor microarchitecture. It integrates interchangeable CPU model, GPU model, memory 2

3 system, and multiple instruction set architectures with has Full-system capability and Multisystem capability and power modeling ability. Untill now, it has developed into many branches. 3.1 gem5 accuracy test[butko12] A two-factor experiment is used to measure the accuracy of the gem5 simulator. The method is that run the same program on a real hardware system and the system simulated by gem5 respectively, collect output data and calculate the differences. For the referential hardware model, the Snowball SKY-S9500-ULP-C01 development kit is chosen. The system is built with ST-Ericsson A9500, a dual-core ARM Cortex-A9 processor on Linux Kernel. On the other hand, the gem5 model was built as a dual-core ARM Cortex-A9 processor@ 1GHz, running on Linux kernel, too. The ALP Media, SPLASH-2 and STREAM benchmark were chosen as the workload, permitting exploiting and assessing performance of multicore architecture. The first benchmark includes speech recognition, face recognition, race tracing, MPEG-2 encode and MPEG-2 decode. In the experiment, the last two application which are MPEG-2 encoder and decoder are selected as the tested service. In the second benchmark, there are eight complete applications: Barnes, FMM, Ocean, Radiosity, Water-Spatial and 3-kernels: FFT, LU, Radix. What's more, the STREAM benchmark is a simple program that calculating corresponding rate based on the measured memory bandwidth. The descriptions are presented in Table 1. Table 1. Benchmark Set Description The measured execution time was shown in table2. 3

4 Table 2. Execution Time of The Adopted Benchmarks The result shows that the mismatch rate is between 0,47% and 17.94%. The mismatch rate varies for different workloads. In this case, the Radix application is studied since it has the largest mismatch rate. This time, different number of keys were tested to study the relationship between the number and the mismatch rate. Table 3 shows the outputs. Table 3. Execution Time (ET) of Radix Sort Kernel Fig.1 shows that the mismatch rate will increase the mismatch rate. This is because the volume of communication between cores will increase when the number of keys increase, which will cause more cache misses. Fig.1 Radix Sort Execution Time Behavior For the stream benchmark, Table4 shows that the results from both system are very close. 4

5 Table 4. Memory Bandwidth When Executing Stream Benchmark 3.2 pd-gem5 pd-gem5 is developed for parallel/distributed computer systems. [Alian16] Each host runs one or more gem5 processes, which simulates a full system node or more network switch. An 8-node system simulated in pd-gem5 can be depicted as Fig.2: Fig.2 The Structure of a Four-node Computer System Simulation in pd-gem5 For networking, each packet is generated by a simulated NIC, and forwarded to a simulated network switch port through TCP sockets. The switch will route the packet to a simulated NIC destination node. The NIC model latency of traditional gem5 simulation is Where S is the packet size, it is smaller than the maximum transmission unit. lnic and bnic stand for NIC's fixed latency and maximum bandwidth. However, in pd-gem5, it is enhanced to more precisely capture non-linear latency effect due to diverse packet sizes. Barrier synchronization is implemented in order to synchronize simulated nodes in pd-gem5. In barrier synchronization, each simulated node is synchronized at the end of each simulated time quantum, which should be fixed. To run the simulation, hosts which consists a quad core Intel Xeon processor, 2x8GB DDR DIMMs and an Intel ethernet NIC are used. And another group of AMD quad core processor, 1x8GB DDR3 DIMM and a Realtek PCIe Ethernet NIC is used to validate this model. 5

6 2 to 24-node systems are evaluated with a star network topology. Then, the MPI implementation of the NAS benchmark is ran. Fig.3 shows the comparison of pd-gem5, gem5 and the measurement of a AMD quad core processor. From the graph, we can find that the pd-gem5 shows a similar non-linearity fits the real measured curve well. On contrast, the original gem-5 result is more linear and has more significant difference with the real result. In this way, the pd-gem5 method is validated. Fig.3 Comparison of Packet Round Trip Latency Figure 3 is the speed up of pd-gem5 running on multiple simulation hosts (0.25 times the number of simulated nodes/the number of cores per simulation host) over pd-gem5 running on a single simulation host. For example, the number of simulation hosts used for the simulation is 2, 4, and 6 for the number of simulated nodes 8, 16, and 24. This is because each simulation host has four cores. The simulation time of using multiple simulation hosts is normalized to that of using a single simulation host to get the speedup. When the number of simulated nodes is fewer than the number of (active) threads supported by a simulation host, pd-gem5 running on a single simulation host shows slightly higher performance than running on multiple simulation hosts, as the overhead of synchronization is lower (on-chip versus off-chip interconnect communications). However, pd-gem5 running the NAS benchmarks on 2, 4, and 6 simulation hosts offers 1.2, 1.6, and 3.2 higher geometric-mean performance for 8, 16, and 24 simulated nodes than running them on a single simulation host. Note that the microbenchmark fails to complete in a reasonable amount of time when a single simulation host attempts to simulate a 24node computer system because of extremely slow simulation and thus there is no associated speedup point in Fig

7 Fig.4 Simulation Speedup of Using Multiple Simulation Hosts over a Single Simulation Host 3.3 gem5-gpu [Jason15] Gem5 is the simulator modeling integrated CPU-GPU systems. It is based on the gem5 and GPGPU-sim simulators. It is designed to do simulations on both systems with coherent caches, using single virtual address space across CPU and GPU, and systems with separated caches. Gem5-gpu followed gem5's cache coherence language SLICC. In addition, gem5-gpu added a series of heterogeneous cache protocols. MOESI protocol is used in the newly-added models. The gem5-gpu simulator supports most unmodified CUDA 3.2 source code and allows CPU and GPU working simultaneously. Fig.5 shows that compared with the real system, gem5-gpu have a difference smaller than 22% in most cases. And the performance is tightly correlated with the performance of the GPGPU simulation. In this way, such simulator can be validated. 4. The ESESC Simulator Fig.5 Runtimes Normalized to NVIDIA GTX

8 Besides the gem5 simulator, ESESC is also a widely used simulator for computer systems. ESESC is an open source very fast simulator that models heterogeneous multicores (Out of Oder/In Order/GPUs) with detailed performance, power, and thermal models. [esesc] The simulator models ARM ISA, supports time-based sampling and can give detailed power and temperature reports. ESESC is famous for its speed. The simulator uses time-based sampling [Ehsan13]. Fig.6 shows that comparing with single threaded simulators without sampling and multithreaded simulators, ESESC showed a slight decrease when the number of cores increased. However, the simulation speed (represented by MIPS) still went beyond other simulators definitely. Supplemented by its detailed report for physical features (power and temperature), ESESC is can be used widely in its field. 4.1 An Example of Simulation Fig.6 Simulation Speed for Different Simulators A simple exampling experiment is implemented. The computer systems with different L1 cache size, with and without L2 cache are tested and the IPCs are studied as the representation of the performance. The details are shown in Table.1. Table 5. The Details of the Experiments The results are shown as follow: 8

9 Table.6 The Results of the Verification Experiment The ANOVA Table is as follow: Table.7 The ANOVA Table of the Experiment From the table, it is shown that the effect of the Cache size is significant while the effect of the L2 Cache is insignificant. The linear regression model of the IPC and cache size is What's more, increasing the cache size do decrease the miss rate and help to improve the performance. In this way, we can find the result valid and represent the real-world condition. 5. Conclusions Nowadays, computer system simulation is getting more and more important since its convenience for not requiring real system and its reliability validated by several experiments. Also, simulators are branching into more and more types fitting multiple demands. The gem5 simulator can simulate different models of CPU and GPU and many computer systems, it can be customized due to individual requirement for the simulation. In this way, Subsimulator software such as pd-gem5 and gem5-gpu is designed. Pd-gem5 is prepared for simulating parallel or distributed systems. It can run several hosts at the same time and is suitable for simulating network systems. Gem5-GPU combines gem5 CPU simulation and GPGPU simulation to simulate CPU and GPU together. It has independent interface between gem5 and GPGPU simulator and can give results very close to the measurement of real system. The ESESC simulator is good at running simulation in high speed. It can simulate multicore, in order/ out of order processors and different kind of caches. Comparing with other simulators 9

10 based on instructions such as SESC, this simulator is based on the TBS technique and generates output data faster than some other simulators. The data is detailed including power and temperature calculations. However, it is limited in ARM instruction set architectures. The current problem is that it cannot handle multithreads conditions. In this way, these simulators show different feature against each other and have their own advantages and disadvantages. There probably does not exist a simulator that is perfect for all tasks, but the development and evolution of computer simulation techniques and tools is obvious enough. The current techniques have already made great difference and made the simulation method an important step in computer system analysis. References 1. [Jain91] Raj Jain, "The Art of Computer Systems Performance Analysis," John Wiley & Sons, INC, 1991, ISBN-10: > 2. [Banks01] J. Banks; J. Carson; B. Nelson; D. Nicol (2001). Discrete-Event System Simulation. Prentice Hall. p. 3. ISBN [Joloboff09] "Full System Simulation of Embedded Systems" 4. [Emerson95] Pugh, Emerson W. (1995). Building IBM: Shaping an Industry and Its Technology. MIT. p ISBN [Alian16] Mohammad Alian, Daehoon Kim, Nam Sung Kim. "pd-gem5: Simulation Infrastructure for Parallel/Distributed Computer Systems", IEEE COMPUTER ARCHITECTURE LETTERS, VOL. 15, NO. 1, JANUARY-JUNE [Butko12] Anastasiia Butko, Rafael Garibotti, Luciano Ost, Gilles Sassatelli, "Accuracy Evaluation of GEM5 Simulator System", Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC), th International Workshop on, 9-11 July [Jason15] Jason Power, Joel Hestness, Marc S. Orr, Mark D. Hill, David A. Wood, "gem5-gpu: A Heterogeneous CPU-GPU Simulator", IEEE COMPUTER ARCHITECTURE LETTERS, VOL. 14, NO. 1, JANUARY-JUNE [esesc]esesc Simulator Official Website 9. [Ehsan13] Ehsan K. Ardestani, Jose Renau, "ESESC:AFastMulticoreSimulatorUsingTime-BasedSampling", High Performance Computer Architecture (HPCA2013), 2013 IEEE 19th International Symposium on, Feb Acronyms 1. SPLASH-2: Stanford ParalleL Applications of SHared memory 2. ET: Execution Time 3. ESESC: Enhanced Super ESCalar 4. TBS: Time-Based Sampling 5. IPC: Instructions Per Cycle 6. ANOVA: ANalysis Of VAriance 10

11 Last modified: December 15, 2017 This and other papers on performance analysis of computer systems are available online at Back to Raj Jain's Home Page 11

COTSon: Infrastructure for system-level simulation

COTSon: Infrastructure for system-level simulation COTSon: Infrastructure for system-level simulation Ayose Falcón, Paolo Faraboschi, Daniel Ortega HP Labs Exascale Computing Lab http://sites.google.com/site/hplabscotson MICRO-41 tutorial November 9, 28

More information

Outline Simulators and such. What defines a simulator? What about emulation?

Outline Simulators and such. What defines a simulator? What about emulation? Outline Simulators and such Mats Brorsson & Mladen Nikitovic ICT Dept of Electronic, Computer and Software Systems (ECS) What defines a simulator? Why are simulators needed? Classifications Case studies

More information

Mosaic: A GPU Memory Manager with Application-Transparent Support for Multiple Page Sizes

Mosaic: A GPU Memory Manager with Application-Transparent Support for Multiple Page Sizes Mosaic: A GPU Memory Manager with Application-Transparent Support for Multiple Page Sizes Rachata Ausavarungnirun Joshua Landgraf Vance Miller Saugata Ghose Jayneel Gandhi Christopher J. Rossbach Onur

More information

Processors Processing Processors. The meta-lecture

Processors Processing Processors. The meta-lecture Simulators 5SIA0 Processors Processing Processors The meta-lecture Why Simulators? Your Friend Harm Why Simulators? Harm Loves Tractors Harm Why Simulators? The outside world Unfortunately for Harm you

More information

GPU-accelerated SDR Implementation of Multi-User Detector for Satellite Return Links

GPU-accelerated SDR Implementation of Multi-User Detector for Satellite Return Links DLR.de Chart 1 GPU-accelerated SDR Implementation of Multi-User Detector for Satellite Return Links Chen Tang chen.tang@dlr.de Institute of Communication and Navigation German Aerospace Center DLR.de Chart

More information

Document downloaded from:

Document downloaded from: Document downloaded from: http://hdl.handle.net/1251/64738 This paper must be cited as: Reaño González, C.; Pérez López, F.; Silla Jiménez, F. (215). On the design of a demo for exhibiting rcuda. 15th

More information

SpiNNaker SPIKING NEURAL NETWORK ARCHITECTURE MAX BROWN NICK BARLOW

SpiNNaker SPIKING NEURAL NETWORK ARCHITECTURE MAX BROWN NICK BARLOW SpiNNaker SPIKING NEURAL NETWORK ARCHITECTURE MAX BROWN NICK BARLOW OVERVIEW What is SpiNNaker Architecture Spiking Neural Networks Related Work Router Commands Task Scheduling Related Works / Projects

More information

Energy Efficiency Benefits of Reducing the Voltage Guardband on the Kepler GPU Architecture

Energy Efficiency Benefits of Reducing the Voltage Guardband on the Kepler GPU Architecture Energy Efficiency Benefits of Reducing the Voltage Guardband on the Kepler GPU Architecture Jingwen Leng Yazhou Zu Vijay Janapa Reddi The University of Texas at Austin {jingwen, yazhou.zu}@utexas.edu,

More information

PROBE: Prediction-based Optical Bandwidth Scaling for Energy-efficient NoCs

PROBE: Prediction-based Optical Bandwidth Scaling for Energy-efficient NoCs PROBE: Prediction-based Optical Bandwidth Scaling for Energy-efficient NoCs Li Zhou and Avinash Kodi Technologies for Emerging Computer Architecture Laboratory (TEAL) School of Electrical Engineering and

More information

Simulation Performance Optimization of Virtual Prototypes Sammidi Mounika, B S Renuka

Simulation Performance Optimization of Virtual Prototypes Sammidi Mounika, B S Renuka Simulation Performance Optimization of Virtual Prototypes Sammidi Mounika, B S Renuka Abstract Virtual prototyping is becoming increasingly important to embedded software developers, engineers, managers

More information

Performance Evaluation of Multi-Threaded System vs. Chip-Multi-Processor System

Performance Evaluation of Multi-Threaded System vs. Chip-Multi-Processor System Performance Evaluation of Multi-Threaded System vs. Chip-Multi-Processor System Ho Young Kim, Robert Maxwell, Ankil Patel, Byeong Kil Lee Abstract The purpose of this study is to analyze and compare the

More information

Overview. 1 Trends in Microprocessor Architecture. Computer architecture. Computer architecture

Overview. 1 Trends in Microprocessor Architecture. Computer architecture. Computer architecture Overview 1 Trends in Microprocessor Architecture R05 Robert Mullins Computer architecture Scaling performance and CMOS Where have performance gains come from? Modern superscalar processors The limits of

More information

Performance Metrics. Computer Architecture. Outline. Objectives. Basic Performance Metrics. Basic Performance Metrics

Performance Metrics. Computer Architecture. Outline. Objectives. Basic Performance Metrics. Basic Performance Metrics Computer Architecture Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr nizamettinaydin@gmail.com Performance Metrics http://www.yildiz.edu.tr/~naydin 1 2 Objectives How can we meaningfully measure and compare

More information

Computational Efficiency of the GF and the RMF Transforms for Quaternary Logic Functions on CPUs and GPUs

Computational Efficiency of the GF and the RMF Transforms for Quaternary Logic Functions on CPUs and GPUs 5 th International Conference on Logic and Application LAP 2016 Dubrovnik, Croatia, September 19-23, 2016 Computational Efficiency of the GF and the RMF Transforms for Quaternary Logic Functions on CPUs

More information

Supporting x86-64 Address Translation for 100s of GPU Lanes. Jason Power, Mark D. Hill, David A. Wood

Supporting x86-64 Address Translation for 100s of GPU Lanes. Jason Power, Mark D. Hill, David A. Wood Supporting x86-64 Address Translation for 100s of GPU s Jason Power, Mark D. Hill, David A. Wood Summary Challenges: CPU&GPUs physically integrated, but logically separate; This reduces theoretical bandwidth,

More information

Synthetic Aperture Beamformation using the GPU

Synthetic Aperture Beamformation using the GPU Paper presented at the IEEE International Ultrasonics Symposium, Orlando, Florida, 211: Synthetic Aperture Beamformation using the GPU Jens Munk Hansen, Dana Schaa and Jørgen Arendt Jensen Center for Fast

More information

Final Report: DBmbench

Final Report: DBmbench 18-741 Final Report: DBmbench Yan Ke (yke@cs.cmu.edu) Justin Weisz (jweisz@cs.cmu.edu) Dec. 8, 2006 1 Introduction Conventional database benchmarks, such as the TPC-C and TPC-H, are extremely computationally

More information

Ramon Canal NCD Master MIRI. NCD Master MIRI 1

Ramon Canal NCD Master MIRI. NCD Master MIRI 1 Wattch, Hotspot, Hotleakage, McPAT http://www.eecs.harvard.edu/~dbrooks/wattch-form.html http://lava.cs.virginia.edu/hotspot http://lava.cs.virginia.edu/hotleakage http://www.hpl.hp.com/research/mcpat/

More information

Simulating GPGPUs ESESC Tutorial

Simulating GPGPUs ESESC Tutorial ESESC Tutorial Speaker: ankaranarayanan Department of Computer Engineering, University of California, Santa Cruz http://masc.soe.ucsc.edu 1 Outline Background GPU Emulation Setup GPU Simulation Setup Running

More information

IMPLEMENTATION OF SOFTWARE-BASED 2X2 MIMO LTE BASE STATION SYSTEM USING GPU

IMPLEMENTATION OF SOFTWARE-BASED 2X2 MIMO LTE BASE STATION SYSTEM USING GPU IMPLEMENTATION OF SOFTWARE-BASED 2X2 MIMO LTE BASE STATION SYSTEM USING GPU Seunghak Lee (HY-SDR Research Center, Hanyang Univ., Seoul, South Korea; invincible@dsplab.hanyang.ac.kr); Chiyoung Ahn (HY-SDR

More information

CS649 Sensor Networks IP Lecture 9: Synchronization

CS649 Sensor Networks IP Lecture 9: Synchronization CS649 Sensor Networks IP Lecture 9: Synchronization I-Jeng Wang http://hinrg.cs.jhu.edu/wsn06/ Spring 2006 CS 649 1 Outline Description of the problem: axes, shortcomings Reference-Broadcast Synchronization

More information

Improving GPU Performance via Large Warps and Two-Level Warp Scheduling

Improving GPU Performance via Large Warps and Two-Level Warp Scheduling Improving GPU Performance via Large Warps and Two-Level Warp Scheduling Veynu Narasiman The University of Texas at Austin Michael Shebanow NVIDIA Chang Joo Lee Intel Rustam Miftakhutdinov The University

More information

A High Definition Motion JPEG Encoder Based on Epuma Platform

A High Definition Motion JPEG Encoder Based on Epuma Platform Available online at www.sciencedirect.com Procedia Engineering 29 (2012) 2371 2375 2012 International Workshop on Information and Electronics Engineering (IWIEE) A High Definition Motion JPEG Encoder Based

More information

CS4617 Computer Architecture

CS4617 Computer Architecture 1/26 CS4617 Computer Architecture Lecture 2 Dr J Vaughan September 10, 2014 2/26 Amdahl s Law Speedup = Execution time for entire task without using enhancement Execution time for entire task using enhancement

More information

Challenges in Transition

Challenges in Transition Challenges in Transition Keynote talk at International Workshop on Software Engineering Methods for Parallel and High Performance Applications (SEM4HPC 2016) 1 Kazuaki Ishizaki IBM Research Tokyo kiszk@acm.org

More information

CUDA-Accelerated Satellite Communication Demodulation

CUDA-Accelerated Satellite Communication Demodulation CUDA-Accelerated Satellite Communication Demodulation Renliang Zhao, Ying Liu, Liheng Jian, Zhongya Wang School of Computer and Control University of Chinese Academy of Sciences Outline Motivation Related

More information

REPORT DOCUMENTATION PAGE

REPORT DOCUMENTATION PAGE REPORT DOCUMENTATION PAGE Form Approved OMB NO. 0704-0188 The public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions,

More information

GPU-accelerated track reconstruction in the ALICE High Level Trigger

GPU-accelerated track reconstruction in the ALICE High Level Trigger GPU-accelerated track reconstruction in the ALICE High Level Trigger David Rohr for the ALICE Collaboration Frankfurt Institute for Advanced Studies CHEP 2016, San Francisco ALICE at the LHC The Large

More information

Statement of Research Weiwei Chen

Statement of Research Weiwei Chen Statement of Research Weiwei Chen Embedded computer systems are ubiquitous and pervasive in our modern society with a wide application domain, such as automotive and avionic systems, electronic medical

More information

Customized Computing for Power Efficiency. There are Many Options to Improve Performance

Customized Computing for Power Efficiency. There are Many Options to Improve Performance ustomized omputing for Power Efficiency Jason ong cong@cs.ucla.edu ULA omputer Science Department http://cadlab.cs.ucla.edu/~cong There are Many Options to Improve Performance Page 1 Past Alternatives

More information

Game Architecture. 4/8/16: Multiprocessor Game Loops

Game Architecture. 4/8/16: Multiprocessor Game Loops Game Architecture 4/8/16: Multiprocessor Game Loops Monolithic Dead simple to set up, but it can get messy Flow-of-control can be complex Top-level may have too much knowledge of underlying systems (gross

More information

LIMITS OF PARALLELISM AND BOOSTING IN DIM SILICON

LIMITS OF PARALLELISM AND BOOSTING IN DIM SILICON ... LIMITS OF PARALLELISM AND BOOSTING IN DIM SILICON... THE AUTHORS INVESTIGATE THE LIMIT OF VOLTAGE SCALING TOGETHER WITH TASK PARALLELIZATION TO MAINTAIN TASK-COMPLETION LATENCY WHILE REDUCING ENERGY

More information

Perspective platforms for BOINC distributed computing network

Perspective platforms for BOINC distributed computing network Perspective platforms for BOINC distributed computing network Vitalii Koshura Lohika Odessa, Ukraine lestat.de.lionkur@gmail.com Profile page: https://www.linkedin.com/in/aenbleidd/ Abstract This paper

More information

Parallel Programming Design of BPSK Signal Demodulation Based on CUDA

Parallel Programming Design of BPSK Signal Demodulation Based on CUDA Int. J. Communications, Network and System Sciences, 216, 9, 126-134 Published Online May 216 in SciRes. http://www.scirp.org/journal/ijcns http://dx.doi.org/1.4236/ijcns.216.9511 Parallel Programming

More information

escience: Pulsar searching on GPUs

escience: Pulsar searching on GPUs escience: Pulsar searching on GPUs Alessio Sclocco Ana Lucia Varbanescu Karel van der Veldt John Romein Joeri van Leeuwen Jason Hessels Rob van Nieuwpoort And many others! Netherlands escience center Science

More information

IHV means Independent Hardware Vendor. Example is Qualcomm Technologies Inc. that makes Snapdragon processors. OEM means Original Equipment

IHV means Independent Hardware Vendor. Example is Qualcomm Technologies Inc. that makes Snapdragon processors. OEM means Original Equipment 1 2 IHV means Independent Hardware Vendor. Example is Qualcomm Technologies Inc. that makes Snapdragon processors. OEM means Original Equipment Manufacturer. Examples are smartphone manufacturers. Tuning

More information

Setting up a Digital Darkroom A guide

Setting up a Digital Darkroom A guide Setting up a Digital Darkroom A guide http://www.theuniversody.co.uk Planning / Theory Considerations: What does the facility need to be capable of? Downloading images from digital cameras, (in all Raw

More information

Evaluation of CPU Frequency Transition Latency

Evaluation of CPU Frequency Transition Latency Noname manuscript No. (will be inserted by the editor) Evaluation of CPU Frequency Transition Latency Abdelhafid Mazouz Alexandre Laurent Benoît Pradelle William Jalby Abstract Dynamic Voltage and Frequency

More information

Parallel GPU Architecture Simulation Framework Exploiting Work Allocation Unit Parallelism

Parallel GPU Architecture Simulation Framework Exploiting Work Allocation Unit Parallelism Parallel GPU Architecture Simulation Framework Exploiting Work Allocation Unit Parallelism Sangpil Lee and Won Woo Ro School of Electrical and Electronic Engineering Yonsei University Seoul, Republic of

More information

What is a Simulation? Simulation & Modeling. Why Do Simulations? Emulators versus Simulators. Why Do Simulations? Why Do Simulations?

What is a Simulation? Simulation & Modeling. Why Do Simulations? Emulators versus Simulators. Why Do Simulations? Why Do Simulations? What is a Simulation? Simulation & Modeling Introduction and Motivation A system that represents or emulates the behavior of another system over time; a computer simulation is one where the system doing

More information

Cherry Picking: Exploiting Process Variations in the Dark Silicon Era

Cherry Picking: Exploiting Process Variations in the Dark Silicon Era Cherry Picking: Exploiting Process Variations in the Dark Silicon Era Siddharth Garg University of Waterloo Co-authors: Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu # Transistors Power/Dark

More information

Multiband RF-Interconnect for Reconfigurable Network-on-Chip Communications UCLA

Multiband RF-Interconnect for Reconfigurable Network-on-Chip Communications UCLA Multiband RF-Interconnect for Reconfigurable Network-on-hip ommunications Jason ong (cong@cs.ucla.edu) Joint work with Frank hang, Glenn Reinman and Sai-Wang Tam ULA 1 ommunication hallenges On-hip Issues

More information

Communications Planner for Operational and Simulation Effects With Realism (COMPOSER)

Communications Planner for Operational and Simulation Effects With Realism (COMPOSER) Communications Planner for Operational and Simulation Effects With Realism (COMPOSER) Alan J. Scrime CERDEC Chief, Spectrum Analysis & Frequency Management Branch (732) 427-6346, alan.scrime@us.army.mil

More information

6 TH INTERNATIONAL CONFERENCE ON APPLIED INTERNET AND INFORMATION TECHNOLOGIES 3-4 JUNE 2016, BITOLA, R. MACEDONIA PROCEEDINGS

6 TH INTERNATIONAL CONFERENCE ON APPLIED INTERNET AND INFORMATION TECHNOLOGIES 3-4 JUNE 2016, BITOLA, R. MACEDONIA PROCEEDINGS 6 TH INTERNATIONAL CONFERENCE ON APPLIED INTERNET AND INFORMATION TECHNOLOGIES 3-4 JUNE 2016, BITOLA, R. MACEDONIA PROCEEDINGS Editor: Publisher: Prof. Pece Mitrevski, PhD Faculty of Information and Communication

More information

Introduction to co-simulation. What is HW-SW co-simulation?

Introduction to co-simulation. What is HW-SW co-simulation? Introduction to co-simulation CPSC489-501 Hardware-Software Codesign of Embedded Systems Mahapatra-TexasA&M-Fall 00 1 What is HW-SW co-simulation? A basic definition: Manipulating simulated hardware with

More information

HARDWARE ACCELERATION OF THE GIPPS MODEL

HARDWARE ACCELERATION OF THE GIPPS MODEL HARDWARE ACCELERATION OF THE GIPPS MODEL FOR REAL-TIME TRAFFIC SIMULATION Salim Farah 1 and Magdy Bayoumi 2 The Center for Advanced Computer Studies, University of Louisiana at Lafayette, USA 1 snf3346@cacs.louisiana.edu

More information

Ben Baker. Sponsored by:

Ben Baker. Sponsored by: Ben Baker Sponsored by: Background Agenda GPU Computing Digital Image Processing at FamilySearch Potential GPU based solutions Performance Testing Results Conclusions and Future Work 2 CPU vs. GPU Architecture

More information

Multi-core Platforms for

Multi-core Platforms for 20 JUNE 2011 Multi-core Platforms for Immersive-Audio Applications Course: Advanced Computer Architectures Teacher: Prof. Cristina Silvano Student: Silvio La Blasca 771338 Introduction on Immersive-Audio

More information

DICELIB: A REAL TIME SYNCHRONIZATION LIBRARY FOR MULTI-PROJECTION VIRTUAL REALITY DISTRIBUTED ENVIRONMENTS

DICELIB: A REAL TIME SYNCHRONIZATION LIBRARY FOR MULTI-PROJECTION VIRTUAL REALITY DISTRIBUTED ENVIRONMENTS DICELIB: A REAL TIME SYNCHRONIZATION LIBRARY FOR MULTI-PROJECTION VIRTUAL REALITY DISTRIBUTED ENVIRONMENTS Abstract: The recent availability of PC-clusters offers an alternative solution instead of high-end

More information

REAL TIME DIGITAL SIGNAL PROCESSING. Introduction

REAL TIME DIGITAL SIGNAL PROCESSING. Introduction REAL TIME DIGITAL SIGNAL Introduction Why Digital? A brief comparison with analog. PROCESSING Seminario de Electrónica: Sistemas Embebidos Advantages The BIG picture Flexibility. Easily modifiable and

More information

NRC Workshop on NASA s Modeling, Simulation, and Information Systems and Processing Technology

NRC Workshop on NASA s Modeling, Simulation, and Information Systems and Processing Technology NRC Workshop on NASA s Modeling, Simulation, and Information Systems and Processing Technology Bronson Messer Director of Science National Center for Computational Sciences & Senior R&D Staff Oak Ridge

More information

Performance Metrics, Amdahl s Law

Performance Metrics, Amdahl s Law ecture 26 Computer Science 61C Spring 2017 March 20th, 2017 Performance Metrics, Amdahl s Law 1 New-School Machine Structures (It s a bit more complicated!) Software Hardware Parallel Requests Assigned

More information

The Xbox One System on a Chip and Kinect Sensor

The Xbox One System on a Chip and Kinect Sensor The Xbox One System on a Chip and Kinect Sensor John Sell, Patrick O Connor, Microsoft Corporation 1 Abstract The System on a Chip at the heart of the Xbox One entertainment console is one of the largest

More information

Track and Vertex Reconstruction on GPUs for the Mu3e Experiment

Track and Vertex Reconstruction on GPUs for the Mu3e Experiment Track and Vertex Reconstruction on GPUs for the Mu3e Experiment Dorothea vom Bruch for the Mu3e Collaboration GPU Computing in High Energy Physics, Pisa September 11th, 2014 Physikalisches Institut Heidelberg

More information

Parallelism Across the Curriculum

Parallelism Across the Curriculum Parallelism Across the Curriculum John E. Howland Department of Computer Science Trinity University One Trinity Place San Antonio, Texas 78212-7200 Voice: (210) 999-7364 Fax: (210) 999-7477 E-mail: jhowland@trinity.edu

More information

Statistical Simulation of Multithreaded Architectures

Statistical Simulation of Multithreaded Architectures Statistical Simulation of Multithreaded Architectures Joshua L. Kihm and Daniel A. Connors University of Colorado at Boulder Department of Electrical and Computer Engineering UCB 425, Boulder, CO, 80309

More information

Evaluation of Mobile Ad Hoc Network with Reactive and Proactive Routing Protocols and Mobility Models

Evaluation of Mobile Ad Hoc Network with Reactive and Proactive Routing Protocols and Mobility Models Evaluation of Mobile Ad Hoc Network with Reactive and Proactive Routing Protocols and Mobility Models Rohit Kumar Department of Computer Sc. & Engineering Chandigarh University, Gharuan Mohali, Punjab

More information

Accelerated Impulse Response Calculation for Indoor Optical Communication Channels

Accelerated Impulse Response Calculation for Indoor Optical Communication Channels Accelerated Impulse Response Calculation for Indoor Optical Communication Channels M. Rahaim, J. Carruthers, and T.D.C. Little Department of Electrical and Computer Engineering Boston University, Boston,

More information

A virtual On Board Control Unit for system tests

A virtual On Board Control Unit for system tests A virtual On Board Control Unit for system tests Ove Kalkan (ove.kalkan@ese.de) test4rail, 17.10.2017, Braunschweig Agenda Introduction: - What is an OBCU - System Test Approach Virtualization - Approach

More information

Networks at the Speed of Light pave the way for the tactile internet

Networks at the Speed of Light pave the way for the tactile internet pave the way for the tactile internet Walter Haeffner Vodafone Distinguished Engineer Symposium Das Taktile Internet 1 October 2013 Vertretung des Freistaats Sachsen in Berlin We have no Warp like Star

More information

Characterizing and Improving the Performance of Intel Threading Building Blocks

Characterizing and Improving the Performance of Intel Threading Building Blocks Characterizing and Improving the Performance of Intel Threading Building Blocks Gilberto Contreras, Margaret Martonosi Princeton University IISWC 08 Motivation Chip Multiprocessors are the new computing

More information

Booster: Reactive Core Acceleration for Mitigating the Effects of Process Variation and Application Imbalance in Low-Voltage Chips

Booster: Reactive Core Acceleration for Mitigating the Effects of Process Variation and Application Imbalance in Low-Voltage Chips Booster: Reactive Core Acceleration for Mitigating the Effects of Process Variation and Application Imbalance in Low-Voltage Chips Timothy N. Miller, Xiang Pan, Renji Thomas, Naser Sedaghati, Radu Teodorescu

More information

Measuring and Evaluating Computer System Performance

Measuring and Evaluating Computer System Performance Measuring and Evaluating Computer System Performance Performance Marches On... But what is performance? The bottom line: Performance Car Time to Bay Area Speed Passengers Throughput (pmph) Ferrari 3.1

More information

GPU-based data analysis for Synthetic Aperture Microwave Imaging

GPU-based data analysis for Synthetic Aperture Microwave Imaging GPU-based data analysis for Synthetic Aperture Microwave Imaging 1 st IAEA Technical Meeting on Fusion Data Processing, Validation and Analysis 1 st -3 rd June 2015 J.C. Chorley 1, K.J. Brunner 1, N.A.

More information

SOFTWARE IMPLEMENTATION OF THE

SOFTWARE IMPLEMENTATION OF THE SOFTWARE IMPLEMENTATION OF THE IEEE 802.11A/P PHYSICAL LAYER SDR`12 WInnComm Europe 27 29 June, 2012 Brussels, Belgium T. Cupaiuolo, D. Lo Iacono, M. Siti and M. Odoni Advanced System Technologies STMicroelectronics,

More information

Exploiting Coarse-Grained Task, Data, and Pipeline Parallelism in Stream Programs

Exploiting Coarse-Grained Task, Data, and Pipeline Parallelism in Stream Programs Exploiting Coarse-Grained Task, Data, and Pipeline Parallelism in Stream Programs Michael Gordon, William Thies, and Saman Amarasinghe Massachusetts Institute of Technology ASPLOS October 2006 San Jose,

More information

The Looming Software Crisis due to the Multicore Menace

The Looming Software Crisis due to the Multicore Menace The Looming Software Crisis due to the Multicore Menace Saman Amarasinghe Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology 2 Today: The Happily Oblivious Average

More information

LSI and Circuit Technologies for the SX-8 Supercomputer

LSI and Circuit Technologies for the SX-8 Supercomputer LSI and Circuit Technologies for the SX-8 Supercomputer By Jun INASAKA,* Toshio TANAHASHI,* Hideaki KOBAYASHI,* Toshihiro KATOH,* Mikihiro KAJITA* and Naoya NAKAYAMA This paper describes the LSI and circuit

More information

THIS article focuses on the design of an advanced

THIS article focuses on the design of an advanced IEEE ACCESS JOURNAL, VOL. XX, NO. X, JULY 2014 1 A Novel MPSoC and Control Architecture for Multi-Standard RF Transceivers Siegfried Brandstätter, and Mario Huemer, Senior Member, IEEE Abstract The introduction

More information

Console Architecture 1

Console Architecture 1 Console Architecture 1 Overview What is a console? Console components Differences between consoles and PCs Benefits of console development The development environment Console game design PS3 in detail

More information

ROM/UDF CPU I/O I/O I/O RAM

ROM/UDF CPU I/O I/O I/O RAM DATA BUSSES INTRODUCTION The avionics systems on aircraft frequently contain general purpose computer components which perform certain processing functions, then relay this information to other systems.

More information

Bridge RF Design and Test Applications with NI SDR Platforms

Bridge RF Design and Test Applications with NI SDR Platforms Bridge RF Design and Test Applications with NI SDR Platforms Jason Strydom Application Engineer National Instruments - Midrand The National Instruments Vision To do for test and measurement what the spreadsheet

More information

Parallel Computing 2020: Preparing for the Post-Moore Era. Marc Snir

Parallel Computing 2020: Preparing for the Post-Moore Era. Marc Snir Parallel Computing 2020: Preparing for the Post-Moore Era Marc Snir THE (CMOS) WORLD IS ENDING NEXT DECADE So says the International Technology Roadmap for Semiconductors (ITRS) 2 End of CMOS? IN THE LONG

More information

Stress Testing the OpenSimulator Virtual World Server

Stress Testing the OpenSimulator Virtual World Server Stress Testing the OpenSimulator Virtual World Server Introduction OpenSimulator (http://opensimulator.org) is an open source project building a general purpose virtual world simulator. As part of a larger

More information

CUDA Threads. Terminology. How it works. Terminology. Streaming Multiprocessor (SM) A SM processes block of threads

CUDA Threads. Terminology. How it works. Terminology. Streaming Multiprocessor (SM) A SM processes block of threads Terminology CUDA Threads Bedrich Benes, Ph.D. Purdue University Department of Computer Graphics Streaming Multiprocessor (SM) A SM processes block of threads Streaming Processors (SP) also called CUDA

More information

GPU Computing for Cognitive Robotics

GPU Computing for Cognitive Robotics GPU Computing for Cognitive Robotics Martin Peniak, Davide Marocco, Angelo Cangelosi GPU Technology Conference, San Jose, California, 25 March, 2014 Acknowledgements This study was financed by: EU Integrating

More information

Experience with new architectures: moving from HELIOS to Marconi

Experience with new architectures: moving from HELIOS to Marconi Experience with new architectures: moving from HELIOS to Marconi Serhiy Mochalskyy, Roman Hatzky 3 rd Accelerated Computing For Fusion Workshop November 28 29 th, 2016, Saclay, France High Level Support

More information

Model checking in the cloud VIGYAN SINGHAL OSKI TECHNOLOGY

Model checking in the cloud VIGYAN SINGHAL OSKI TECHNOLOGY Model checking in the cloud VIGYAN SINGHAL OSKI TECHNOLOGY Views are biased by Oski experience Service provider, only doing model checking Using off-the-shelf tools (Cadence, Jasper, Mentor, OneSpin Synopsys)

More information

The Ghost in the Machine Observing the Effects of Kernel Operation on Parallel Application Performance

The Ghost in the Machine Observing the Effects of Kernel Operation on Parallel Application Performance The Ghost in the Machine Observing the Effects of Kernel Operation on Parallel Application Performance Aroon Nataraj, Alan Morris, Allen Malony, Matthew Sottile, Pete Beckman l {anataraj, amorris, malony,

More information

An Energy Conservation DVFS Algorithm for the Android Operating System

An Energy Conservation DVFS Algorithm for the Android Operating System Volume 1, Number 1, December 2010 Journal of Convergence An Energy Conservation DVFS Algorithm for the Android Operating System Wen-Yew Liang* and Po-Ting Lai Department of Computer Science and Information

More information

Design Challenges in Multi-GHz Microprocessors

Design Challenges in Multi-GHz Microprocessors Design Challenges in Multi-GHz Microprocessors Bill Herrick Director, Alpha Microprocessor Development www.compaq.com Introduction Moore s Law ( Law (the trend that the demand for IC functions and the

More information

WEI HUANG Curriculum Vitae

WEI HUANG Curriculum Vitae 1 WEI HUANG Curriculum Vitae 4025 Duval Road, Apt 2538 Phone: (434) 227-6183 Austin, TX 78759 Email: wh6p@virginia.edu (preferred) https://researcher.ibm.com/researcher/view.php?person=us-huangwe huangwe@us.ibm.com

More information

Programming and Optimization with Intel Xeon Phi Coprocessors. Colfax Developer Training One-day Boot Camp

Programming and Optimization with Intel Xeon Phi Coprocessors. Colfax Developer Training One-day Boot Camp Programming and Optimization with Intel Xeon Phi Coprocessors Colfax Developer Training One-day Boot Camp Abstract: Colfax Developer Training (CDT) is an in-depth intensive course on efficient parallel

More information

Dynamic MIPS Rate Stabilization in Out-of-Order Processors

Dynamic MIPS Rate Stabilization in Out-of-Order Processors Dynamic Rate Stabilization in Out-of-Order Processors Jinho Suh and Michel Dubois Ming Hsieh Dept of EE University of Southern California Outline Motivation Performance Variability of an Out-of-Order Processor

More information

An architecture for Scalable Concurrent Embedded Software" No more communication in your program, the key to multi-core and distributed programming.

An architecture for Scalable Concurrent Embedded Software No more communication in your program, the key to multi-core and distributed programming. An architecture for Scalable Concurrent Embedded Software" No more communication in your program, the key to multi-core and distributed programming. Eric.Verhulst@altreonic.com www.altreonic.com 1 Content

More information

Table of Contents HOL ADV

Table of Contents HOL ADV Table of Contents Lab Overview - - Horizon 7.1: Graphics Acceleartion for 3D Workloads and vgpu... 2 Lab Guidance... 3 Module 1-3D Options in Horizon 7 (15 minutes - Basic)... 5 Introduction... 6 3D Desktop

More information

Liu Yang, Bong-Joo Jang, Sanghun Lim, Ki-Chang Kwon, Suk-Hwan Lee, Ki-Ryong Kwon 1. INTRODUCTION

Liu Yang, Bong-Joo Jang, Sanghun Lim, Ki-Chang Kwon, Suk-Hwan Lee, Ki-Ryong Kwon 1. INTRODUCTION Liu Yang, Bong-Joo Jang, Sanghun Lim, Ki-Chang Kwon, Suk-Hwan Lee, Ki-Ryong Kwon 1. INTRODUCTION 2. RELATED WORKS 3. PROPOSED WEATHER RADAR IMAGING BASED ON CUDA 3.1 Weather radar image format and generation

More information

EE 382C EMBEDDED SOFTWARE SYSTEMS. Literature Survey Report. Characterization of Embedded Workloads. Ajay Joshi. March 30, 2004

EE 382C EMBEDDED SOFTWARE SYSTEMS. Literature Survey Report. Characterization of Embedded Workloads. Ajay Joshi. March 30, 2004 EE 382C EMBEDDED SOFTWARE SYSTEMS Literature Survey Report Characterization of Embedded Workloads Ajay Joshi March 30, 2004 ABSTRACT Security applications are a class of emerging workloads that will play

More information

Use Nvidia Performance Primitives (NPP) in Deep Learning Training. Yang Song

Use Nvidia Performance Primitives (NPP) in Deep Learning Training. Yang Song Use Nvidia Performance Primitives (NPP) in Deep Learning Training Yang Song Outline Introduction Function Categories Performance Results Deep Learning Specific Further Information What is NPP? Image+Signal

More information

Combined Dynamic Thermal Management Exploiting Broadcast-Capable Wireless Networkon-Chip

Combined Dynamic Thermal Management Exploiting Broadcast-Capable Wireless Networkon-Chip Rochester Institute of Technology RIT Scholar Works Theses Thesis/Dissertation Collections 3-18-2016 Combined Dynamic Thermal Management Exploiting Broadcast-Capable Wireless Networkon-Chip Architecture

More information

Towards a Cross-Layer Framework for Accurate Power Modeling of Microprocessor Designs

Towards a Cross-Layer Framework for Accurate Power Modeling of Microprocessor Designs Towards a Cross-Layer Framework for Accurate Power Modeling of Microprocessor Designs Monir Zaman, Mustafa M. Shihab, Ayse K. Coskun and Yiorgos Makris Department of Electrical and Computer Engineering,

More information

LOW-POWER SOFTWARE-DEFINED RADIO DESIGN USING FPGAS

LOW-POWER SOFTWARE-DEFINED RADIO DESIGN USING FPGAS LOW-POWER SOFTWARE-DEFINED RADIO DESIGN USING FPGAS Charlie Jenkins, (Altera Corporation San Jose, California, USA; chjenkin@altera.com) Paul Ekas, (Altera Corporation San Jose, California, USA; pekas@altera.com)

More information

PoC #1 On-chip frequency generation

PoC #1 On-chip frequency generation 1 PoC #1 On-chip frequency generation This PoC covers the full on-chip frequency generation system including transport of signals to receiving blocks. 5G frequency bands around 30 GHz as well as 60 GHz

More information

cfireworks: a Tool for Measuring the Communication Costs in Collective I/O

cfireworks: a Tool for Measuring the Communication Costs in Collective I/O Vol., No. 8, cfireworks: a Tool for Measuring the Communication Costs in Collective I/O Kwangho Cha National Institute of Supercomputing and Networking, Korea Institute of Science and Technology Information,

More information

Using SDR for Cost-Effective DTV Applications

Using SDR for Cost-Effective DTV Applications Int'l Conf. Wireless Networks ICWN'16 109 Using SDR for Cost-Effective DTV Applications J. Kwak, Y. Park, and H. Kim Dept. of Computer Science and Engineering, Korea University, Seoul, Korea {jwuser01,

More information

IEEE n MIMO Radio Design Verification Challenge and a Resulting ATE Program Implemented for MIMO Transmitter and Receiver Test

IEEE n MIMO Radio Design Verification Challenge and a Resulting ATE Program Implemented for MIMO Transmitter and Receiver Test 2012 IEEE 18th International Mixed-Signal, Sensors, and Systems Test Workshop IEEE 802.11n MIMO Radio Design Verification Challenge and a Resulting ATE Program Implemented for MIMO Transmitter and Receiver

More information

Smart-M3-Based Robot Interaction in Cyber-Physical Systems

Smart-M3-Based Robot Interaction in Cyber-Physical Systems FRUCT 16, Oulu, Finland October 30, 2014 Smart-M3-Based Robot Interaction in Cyber-Physical Systems Nikolay Teslya *, Sergey Savosin * * St. Petersburg Institute for Informatics and Automation of the Russian

More information

Real-time Pulsar Timing signal processing on GPUs

Real-time Pulsar Timing signal processing on GPUs Real-Time Pulsar Timing Signal Processing on GPUs Plan : Pulsar Timing Instrumentations LPC2E, CNRS Orléans - FRANCE Ismaël Cognard, Gilles Theureau, Grégory Desvignes, Cédric Viou, Dalal Ait-Allal Pulsars

More information

Parallel Simulation of Social Agents using Cilk and OpenCL

Parallel Simulation of Social Agents using Cilk and OpenCL D. Moser, A. Riener, K. Zia, A. Ferscha Department for Pervasive Computing, JKU Linz/Austria Parallel Simulation of Social Agents using Cilk and OpenCL DS-RT 2011 15th International Symposium on Distributed

More information

Matthew Grossman Mentor: Rick Brownrigg

Matthew Grossman Mentor: Rick Brownrigg Matthew Grossman Mentor: Rick Brownrigg Outline What is a WMS? JOCL/OpenCL Wavelets Parallelization Implementation Results Conclusions What is a WMS? A mature and open standard to serve georeferenced imagery

More information