Game Architecture. 4/8/16: Multiprocessor Game Loops

Size: px
Start display at page:

Download "Game Architecture. 4/8/16: Multiprocessor Game Loops"

Transcription

1 Game Architecture 4/8/16: Multiprocessor Game Loops

2 Monolithic Dead simple to set up, but it can get messy Flow-of-control can be complex Top-level may have too much knowledge of underlying systems (gross bubble-up effects like UT Actor) Tough to maintain

3 Cooperative Tasks class Task { virtual void Run() = 0; }; class Renderer : public Task { void Run(float time); }; class TaskManager{ void RunTasks(); void AddTask(Task*); }; void TaskManager::RunTasks(){ foreach(task) task->run(); }

4 Cooperative Tasks Flexible, but clarity suffers Can be too much flexibility What happens in what order difficult to discern by examining code

5 Pre-emptive void InputThread(){ while(1) input(); } void SimulationThread(){ while(1) simulate(); } void RenderThread() { while(1) render(); } void SoundThread() { while(1) sound(); }

6 Pre-emptive Tough to get right Complex interprocess communication Deadlocks, race conditions Questionable performance if used extensively But, increasingly parallel hardware makes this a major area for focus

7 Multiprocessor Game Loops In 2004, the microprocessor industry hit a brick wall due to heat dissipation problems Shifted focus to multicore processors Another painful shift (after all that graphics nonsense!) multithreaded program design is much harder than single-threaded By 2008, most studios ended the gradual transition

8

9

10 CPU Memory 512 MB DRAM Core0 Core1 Core2 L1D L1I MC0 MC1 L1D L1I 1MB L2 GPU 10MB EDRAM L1D BIU/IO Intf 3D Core L1I Video Out I/O Chip XMA Decoder SMC Analog Chip DVD (SATA) HDD port (SATA) Front controllers (2 USB) Wireless controllers MU ports (2 USB) Rear Panel USB Ethernet IR Audio Out FLASH System control Video Out

11 Memory Caches A cache is just a bank of memory that can be read from and written to by the CPU much more quickly than main RAM cache memory typically utilizes the fastest (and most expensive) technology available cache memory is located as physically close as possible to the CPU core, typically on the same die. Cache memory is usually quite a bit smaller in size than main RAM.

12 Memory Caches Improves memory access performance by keeping local copies in the cache of those chunks of data that are most frequently accessed by the program If the data requested by the CPU is already in the cache, it can be provided to the CPU very quickly on the order of tens of cycles (hit) If the data is not already present in the cache, it must be fetched into the cache from main RAM (miss) Reading data from main RAM can take thousands of cycles, so the cost of a cache miss is very high indeed

13 I$ and D$ Both instructions and data are cached The instruction cache (I$) is used to preload executable machine code before it runs The data cache (D$) is used to speed up reading and writing of data to main RAM Always physically distinct

14 Multilevel Caches There is a fundamental trade-off between cache latency and hit rate Larger caches mean higher hit rates, but larger caches cannot be located as close to the CPU, so they tend to be slower than smaller ones. Most game consoles employ at least two levels of cache The CPU first tries to find the data it s looking for in the level 1 (L1) cache. (small, but very low access latency) If the data isn t there, it tries the larger but higherlatency level 2 (L2) cache Only if the data cannot be found in the L2 cache do we incur the full cost of a main memory access.

15 Minimizing Misses The best way to avoid D$ misses is to organize your data in contiguous blocks that are as small as possible and then access them sequentially For I$, keep your high-performance loops as small as possible in terms of code size, and avoid calling functions within your innermost loops. Keep the entire body of the loop in the cache the entire time the loop is running.

16 I$ Misses Keep high-performance code as small as possible, in terms of number of machine language instructions The compiler and linker take care of keeping our functions contiguous in memory Avoid calling functions from within a performance-critical section of code If you have to, place it as close as possible to the calling function preferably immediately before or after the calling function and never in a different translation (compilation) unit Inlining? Inlining a small function can be a big performance boost. But too much bloats the size of the code, which can cause a performance-critical section of code to no longer fit within the cache

17 360 CPU Memory 512 MB DRAM Core0 Core1 Core2 L1D L1I MC0 MC1 L1D L1I 1MB L2 GPU 10MB EDRAM L1D BIU/IO Intf 3D Core L1I Video Out I/O Chip XMA Decoder SMC Analog Chip DVD (SATA) HDD port (SATA) Front controllers (2 USB) Wireless controllers MU ports (2 USB) Rear Panel USB Ethernet IR Audio Out FLASH System control Video Out

18 PS3

19 PS4

20 PS4 huma - heterogeneous unified memory architecture

21 PS4 Cache Architecture 220+ CYCLES CPU Regs FREE 30+ CYCLES L1 I$ (32 KiB) L1 D$ (32 KiB) 3 CYCLES L2 (2 MiB) MAIN RAM (8 GiB) Tuesday, March 4, 14

22 PS4 Cache Architecture C0 C2 C1 C3 C4CPUC5 C6 C7 Regs FREE L1 I$ (32 KiB) L1 D$ (32 KiB) L2 (2 MiB) MAIN RAM (8 GiB) Tuesday, March 4, 14

23 PS4 Cache Architecture C0 C2 C1 C3 L2 (1 MiB) C4CPUC5 C6 C7 Regs FREE L1 I$ (32 KiB) L1 D$ (32 KiB) L2 (1 MiB) MAIN RAM (8 GiB) Tuesday, March 4, 14

24 PS4 Cache Architecture C0 C2 C1 C3 26 CYCLES L2 (1 MiB) C4CPUC5 C6 C7 Regs FREE 26 CYCLES L1 I$ (32 KiB) L1 D$ (32 KiB) L2 (1 MiB) MAIN RAM (8 GiB) Tuesday, March 4, 14

25 PS4 Cache Architecture C0 C1 C2 C3 190 CYCLES L2 (1 MiB) C4 C5 CPU C6 C7 L1 I$ Regs FREE Tuesday, March 4, 14 (32 KiB) L1 D$ (32 KiB) L2 (1 MiB) MAIN RAM (8 GiB)

26 PS4 Cache Architecture MAIN RAM CACHE 0x5280 0x5240 0x5200 0x51C0 0x5180 0x5140 0x5100 0x50C0 0x5080 0x5040 0x5000 0x0280 0x0240 0x0200 0x01C0 0x0180 0x0140 0x0100 0x00C0 0x0080 0x0040 0x0000 Tuesday, March 4, 14

27 PS4 Cache Architecture MAIN RAM CACHE 0x5280 0x5240 0x5200 0x51C0 0x5180 0x5140 0x5100 0x50C0 0x5080 0x5040 0x5000 0x0280 0x0240 0x0200 0x01C0 0x0180 0x0140 0x0100 0x00C0 0x0080 0x0040 0x0000 Tuesday, March 4, 14

28 PS4 Cache Architecture MAIN RAM CACHE 0x5280 0x5240 0x5200 0x51C0 0x5180 0x5140 0x5100 0x50C0 0x5080 0x5040 0x5000 0x0280 0x0240 0x0200 0x01C0 0x0180 0x0140 0x0100 0x00C0 0x0080 0x0040 0x0000 Tuesday, March 4, 14

29 PS4 Optimization PS4-specific: avoid cross-cluster L2 cache line sharing (190 cycles versus 26 cycles)! U32Bg_jobCount[6];B//BoneBperBcore Tuesday, March 4, 14

30 PS4 Optimization PS4-specific: avoid cross-cluster L2 cache line sharing (190 cycles versus 26 cycles)! structbjobcount { BBBBU32Bm_count; BBBBU8BBm_padding[60]; }; JobCountBg_jobCount[6];B//BoneBperBcore Tuesday, March 4, 14

31 PS4

32 Xbox One

33 Subtle Differences Memory type The Xbox One utilizes GDDR3 RAM, while the PS4 uses GDDR5, which gives the PS4 higher theoretical memory bandwidth. The Xbox One counteracts this to some degree by providing its GPU with a dedicated 32 MiB memory store, implemented as very high-speed esram

34 Subtle Differences Bus speeds The buses in the Xbox One support higher bandwidth data transfers than those of the PS4 (30GB/sec vs 20) GPU PS4 s GPU is roughly equivalent to an AMD Radeon 7870, with 1152 parallel stream processors, the Xbox One s GPU is closer to an AMD Radeon 7790, supporting only 768 stream processors the Xbox One s GPU runs at 853MHz vs 800 for the PS4

35 Main Thread Pose Blending Simulate / Integrate Update Game Objects Pose Blending Post Animation Game Object Update Simulate / Integrate Ragdoll Physics etc. Pose Blending Simulate / Integrate Fork Join Fork Join

36 Main Thread Animation Thread Dynamics Thread Rendering Thread HID Update Game Objects Kick off Animation Post Animation Game Object Update Kick Dynamics Sim Ragdoll Physics Finalize Animation Finalize Collision Other Processing (AI Planning, Audio Work, etc.) Kick Redering (for next frame) Sleep Pose Blending Sleep Global Pose Calculation Skin Matrix Palette Calculation Sleep Ragdoll Skinning Sleep Simulate and Integrate Sleep Broad Phase Coll. Narrow Phase Coll. Resolve Constraints Sleep Visibility Determination Sort Submit Primitives Wait for GPU Full-Screen Effects Wait for V- Blank Swap Buffers

37 PPU HID Update Game Objects Kick Animation Jobs Post Animation Game Object Update Kick Dynamics Jobs Ragdoll Physics Finalize Animation Finalize Collision Other Processing (AI Planning, Audio Work, etc.) Kick Redering (for next frame) SPU0 Visibility Visibility Sort Sort Visibility Pose Blend Sort Pose Blend Physics Sim Submit Primitives Global Pose Submit Primitives Global Pose Collisions / Constraints Matrix Palette Ragdoll Skinning SPU1 Visibility Visibility Visibility Sort Visibility Pose Blend Pose Blend Pose Blend Sort Physics Simulation Global Pose Broad Phase Narrow Phase Narrow Phase Matrix Palette Ragdoll Skinning

38 Async Design while (true) { // main game loop //... // Cast a ray to see if the player has line of sight // to the enemy. RayCastResult r = castray(playerpos, enemypos); // Now process the results... if (r.hitsomething() && isenemy(r.gethitobject())) { // Player can see the enemy. //... } // }

39 Async Design while (true) { // main game loop //... // Cast a ray to see if the player has line of sight // to the enemy. RayCastResult r; requestraycast(playerpos, enemypos, &r); } // Do other unrelated work while we wait for the // other CPU to perform the ray cast for us. // // OK, we can't do any more useful work. Wait for the // results of our ray cast job. If the job is // complete, this function will return immediately. // Otherwise, the main thread will idle until the // results are ready... waitforraycastresults(&r); // Process results... if (r.hitsomething() && isenemy(r.gethitobject())) { // Player can see the enemy. //... //... } //...

40 Async Design RayCastResult r; bool rayjobpending = false; while (true) { // main game loop // // Wait for the results of last frame's ray cast job. if (rayjobpending) { waitforraycastresults(&r); // Process results... if (r.hitsomething() && isenemy(r.gethitobject())) { // Player can see the enemy. //... } } // Cast a new ray for next frame. rayjobpending = true; requestraycast(playerpos, enemypos, &r); // Do other work... //... }

Console Architecture 1

Console Architecture 1 Console Architecture 1 Overview What is a console? Console components Differences between consoles and PCs Benefits of console development The development environment Console game design PS3 in detail

More information

Killzone Shadow Fall: Threading the Entity Update on PS4. Jorrit Rouwé Lead Game Tech, Guerrilla Games

Killzone Shadow Fall: Threading the Entity Update on PS4. Jorrit Rouwé Lead Game Tech, Guerrilla Games Killzone Shadow Fall: Threading the Entity Update on PS4 Jorrit Rouwé Lead Game Tech, Guerrilla Games Introduction Killzone Shadow Fall is a First Person Shooter PlayStation 4 launch title In SP up to

More information

Supporting x86-64 Address Translation for 100s of GPU Lanes. Jason Power, Mark D. Hill, David A. Wood

Supporting x86-64 Address Translation for 100s of GPU Lanes. Jason Power, Mark D. Hill, David A. Wood Supporting x86-64 Address Translation for 100s of GPU s Jason Power, Mark D. Hill, David A. Wood Summary Challenges: CPU&GPUs physically integrated, but logically separate; This reduces theoretical bandwidth,

More information

Like Mobile Games* Currently a Distinguished i Engineer at Zynga, and CTO of FarmVille 2: Country Escape (for ios/android/kindle)

Like Mobile Games* Currently a Distinguished i Engineer at Zynga, and CTO of FarmVille 2: Country Escape (for ios/android/kindle) Console Games Are Just Like Mobile Games* (* well, not really. But they are more alike than you think ) Hi, I m Brian Currently a Distinguished i Engineer at Zynga, and CTO of FarmVille 2: Country Escape

More information

The Xbox One System on a Chip and Kinect Sensor

The Xbox One System on a Chip and Kinect Sensor The Xbox One System on a Chip and Kinect Sensor John Sell, Patrick O Connor, Microsoft Corporation 1 Abstract The System on a Chip at the heart of the Xbox One entertainment console is one of the largest

More information

The Next Generation of Gaming Consoles

The Next Generation of Gaming Consoles The Next Generation of Gaming Consoles History of the Last Gen Sony had the #1 Console (PS2), was also the oldest and weakest, but had strong developer support Newcomer, Microsoft X-Box, attracted more

More information

The Critical Role of Firmware and Flash Translation Layers in Solid State Drive Design

The Critical Role of Firmware and Flash Translation Layers in Solid State Drive Design The Critical Role of Firmware and Flash Translation Layers in Solid State Drive Design Robert Sykes Director of Applications OCZ Technology Flash Memory Summit 2012 Santa Clara, CA 1 Introduction This

More information

INTRODUCTION TO GAME AI

INTRODUCTION TO GAME AI CS 387: GAME AI INTRODUCTION TO GAME AI 3/31/2016 Instructor: Santiago Ontañón santi@cs.drexel.edu Class website: https://www.cs.drexel.edu/~santi/teaching/2016/cs387/intro.html Outline Game Engines Perception

More information

Mosaic: A GPU Memory Manager with Application-Transparent Support for Multiple Page Sizes

Mosaic: A GPU Memory Manager with Application-Transparent Support for Multiple Page Sizes Mosaic: A GPU Memory Manager with Application-Transparent Support for Multiple Page Sizes Rachata Ausavarungnirun Joshua Landgraf Vance Miller Saugata Ghose Jayneel Gandhi Christopher J. Rossbach Onur

More information

Arduino STEAM Academy Arduino STEM Academy Art without Engineering is dreaming. Engineering without Art is calculating. - Steven K.

Arduino STEAM Academy Arduino STEM Academy Art without Engineering is dreaming. Engineering without Art is calculating. - Steven K. Arduino STEAM Academy Arduino STEM Academy Art without Engineering is dreaming. Engineering without Art is calculating. - Steven K. Roberts Page 1 See Appendix A, for Licensing Attribution information

More information

Lecture Topics. Announcements. Today: Memory Management (Stallings, chapter ) Next: continued. Self-Study Exercise #6. Project #4 (due 10/11)

Lecture Topics. Announcements. Today: Memory Management (Stallings, chapter ) Next: continued. Self-Study Exercise #6. Project #4 (due 10/11) Lecture Topics Today: Memory Management (Stallings, chapter 7.1-7.4) Next: continued 1 Announcements Self-Study Exercise #6 Project #4 (due 10/11) Project #5 (due 10/18) 2 Memory Hierarchy 3 Memory Hierarchy

More information

Project 5: Optimizer Jason Ansel

Project 5: Optimizer Jason Ansel Project 5: Optimizer Jason Ansel Overview Project guidelines Benchmarking Library OoO CPUs Project Guidelines Use optimizations from lectures as your arsenal If you decide to implement one, look at Whale

More information

occam on the Arduino Adam T. Sampson School of Computing, University of Kent Matt C. Jadud Department of Computer Science, Allegheny College

occam on the Arduino Adam T. Sampson School of Computing, University of Kent Matt C. Jadud Department of Computer Science, Allegheny College occam on the Arduino Adam T. Sampson School of Computing, University of Kent Matt C. Jadud Department of Computer Science, Allegheny College Christian L. Jacobsen Department of Computer Science, University

More information

ADVANCED EMBEDDED MONITORING SYSTEM FOR ELECTROMAGNETIC RADIATION

ADVANCED EMBEDDED MONITORING SYSTEM FOR ELECTROMAGNETIC RADIATION 98 Chapter-5 ADVANCED EMBEDDED MONITORING SYSTEM FOR ELECTROMAGNETIC RADIATION 99 CHAPTER-5 Chapter 5: ADVANCED EMBEDDED MONITORING SYSTEM FOR ELECTROMAGNETIC RADIATION S.No Name of the Sub-Title Page

More information

Introduction to Game Design. Truong Tuan Anh CSE-HCMUT

Introduction to Game Design. Truong Tuan Anh CSE-HCMUT Introduction to Game Design Truong Tuan Anh CSE-HCMUT Games Games are actually complex applications: interactive real-time simulations of complicated worlds multiple agents and interactions game entities

More information

The Who. Intel - no introduction required.

The Who. Intel - no introduction required. Delivering Demand-Based Worlds with Intel SSD GDC 2011 The Who Intel - no introduction required. Digital Extremes - In addition to be great developers of AAA games, they are also the authors of the Evolution

More information

Understanding OpenGL

Understanding OpenGL This document provides an overview of the OpenGL implementation in Boris Red. About OpenGL OpenGL is a cross-platform standard for 3D acceleration. GL stands for graphics library. Open refers to the ongoing,

More information

1 of 34 6/10/2007 7:52 PM. print this page

1 of 34 6/10/2007 7:52 PM. print this page 1 of 34 6/10/2007 7:52 PM print this page 2 of 34 6/10/2007 7:52 PM Inside Microsoft's Xbox 360 Date: Nov 16, 2005 Type: System Manufacturer: Microsoft Author: Anand Lal Shimpi, Kristopher Kubicki & Tuan

More information

Artificial Intelligence for Games. Santa Clara University, 2012

Artificial Intelligence for Games. Santa Clara University, 2012 Artificial Intelligence for Games Santa Clara University, 2012 Introduction Class 1 Artificial Intelligence for Games What is different Gaming stresses computing resources Graphics Engine Physics Engine

More information

Track and Vertex Reconstruction on GPUs for the Mu3e Experiment

Track and Vertex Reconstruction on GPUs for the Mu3e Experiment Track and Vertex Reconstruction on GPUs for the Mu3e Experiment Dorothea vom Bruch for the Mu3e Collaboration GPU Computing in High Energy Physics, Pisa September 11th, 2014 Physikalisches Institut Heidelberg

More information

CRYPTOSHOOTER MULTI AGENT BASED SECRET COMMUNICATION IN AUGMENTED VIRTUALITY

CRYPTOSHOOTER MULTI AGENT BASED SECRET COMMUNICATION IN AUGMENTED VIRTUALITY CRYPTOSHOOTER MULTI AGENT BASED SECRET COMMUNICATION IN AUGMENTED VIRTUALITY Submitted By: Sahil Narang, Sarah J Andrabi PROJECT IDEA The main idea for the project is to create a pursuit and evade crowd

More information

CS4617 Computer Architecture

CS4617 Computer Architecture 1/26 CS4617 Computer Architecture Lecture 2 Dr J Vaughan September 10, 2014 2/26 Amdahl s Law Speedup = Execution time for entire task without using enhancement Execution time for entire task using enhancement

More information

Microarchitectural Attacks and Defenses in JavaScript

Microarchitectural Attacks and Defenses in JavaScript Microarchitectural Attacks and Defenses in JavaScript Michael Schwarz, Daniel Gruss, Moritz Lipp 25.01.2018 www.iaik.tugraz.at 1 Michael Schwarz, Daniel Gruss, Moritz Lipp www.iaik.tugraz.at Microarchitecture

More information

Console Games Are Just Like Mobile Games* (* well, not really. But they are more alike than you

Console Games Are Just Like Mobile Games* (* well, not really. But they are more alike than you Console Games Are Just Like Mobile Games* (* well, not really. But they are more alike than you think ) Hi, I m Brian Currently a Software Architect at Zynga, and CTO of CastleVille Legends (for ios/android)

More information

Recent Advances in Simulation Techniques and Tools

Recent Advances in Simulation Techniques and Tools Recent Advances in Simulation Techniques and Tools Yuyang Li, li.yuyang(at)wustl.edu (A paper written under the guidance of Prof. Raj Jain) Download Abstract: Simulation refers to using specified kind

More information

Campus Fighter. CSEE 4840 Embedded System Design. Haosen Wang, hw2363 Lei Wang, lw2464 Pan Deng, pd2389 Hongtao Li, hl2660 Pengyi Zhang, pnz2102

Campus Fighter. CSEE 4840 Embedded System Design. Haosen Wang, hw2363 Lei Wang, lw2464 Pan Deng, pd2389 Hongtao Li, hl2660 Pengyi Zhang, pnz2102 Campus Fighter CSEE 4840 Embedded System Design Haosen Wang, hw2363 Lei Wang, lw2464 Pan Deng, pd2389 Hongtao Li, hl2660 Pengyi Zhang, pnz2102 March 2011 Project Introduction In this project we aim to

More information

CUDA Threads. Terminology. How it works. Terminology. Streaming Multiprocessor (SM) A SM processes block of threads

CUDA Threads. Terminology. How it works. Terminology. Streaming Multiprocessor (SM) A SM processes block of threads Terminology CUDA Threads Bedrich Benes, Ph.D. Purdue University Department of Computer Graphics Streaming Multiprocessor (SM) A SM processes block of threads Streaming Processors (SP) also called CUDA

More information

Design of Embedded Systems - Advanced Course Project

Design of Embedded Systems - Advanced Course Project 2011-10-31 Bomberman A Design of Embedded Systems - Advanced Course Project Linus Sandén, Mikael Göransson & Michael Lennartsson et07ls4@student.lth.se, et07mg7@student.lth.se, mt06ml8@student.lth.se Abstract

More information

Out-of-Order Execution. Register Renaming. Nima Honarmand

Out-of-Order Execution. Register Renaming. Nima Honarmand Out-of-Order Execution & Register Renaming Nima Honarmand Out-of-Order (OOO) Execution (1) Essence of OOO execution is Dynamic Scheduling Dynamic scheduling: processor hardware determines instruction execution

More information

Introduction to Real-Time Systems

Introduction to Real-Time Systems Introduction to Real-Time Systems Real-Time Systems, Lecture 1 Martina Maggio and Karl-Erik Årzén 16 January 2018 Lund University, Department of Automatic Control Content [Real-Time Control System: Chapter

More information

Parallelism Across the Curriculum

Parallelism Across the Curriculum Parallelism Across the Curriculum John E. Howland Department of Computer Science Trinity University One Trinity Place San Antonio, Texas 78212-7200 Voice: (210) 999-7364 Fax: (210) 999-7477 E-mail: jhowland@trinity.edu

More information

Lecture 6: Electronics Beyond the Logic Switches Xufeng Kou School of Information Science and Technology ShanghaiTech University

Lecture 6: Electronics Beyond the Logic Switches Xufeng Kou School of Information Science and Technology ShanghaiTech University Lecture 6: Electronics Beyond the Logic Switches Xufeng Kou School of Information Science and Technology ShanghaiTech University EE 224 Solid State Electronics II Lecture 3: Lattice and symmetry 1 Outline

More information

SOFTWARE IMPLEMENTATION OF THE

SOFTWARE IMPLEMENTATION OF THE SOFTWARE IMPLEMENTATION OF THE IEEE 802.11A/P PHYSICAL LAYER SDR`12 WInnComm Europe 27 29 June, 2012 Brussels, Belgium T. Cupaiuolo, D. Lo Iacono, M. Siti and M. Odoni Advanced System Technologies STMicroelectronics,

More information

Oculus Rift Getting Started Guide

Oculus Rift Getting Started Guide Oculus Rift Getting Started Guide Version 1.23 2 Introduction Oculus Rift Copyrights and Trademarks 2017 Oculus VR, LLC. All Rights Reserved. OCULUS VR, OCULUS, and RIFT are trademarks of Oculus VR, LLC.

More information

SDR-14 User s Guide Version 1.2 Software Defined Receiver & Spectrum Analyzer

SDR-14 User s Guide Version 1.2 Software Defined Receiver & Spectrum Analyzer SDR-14 User s Guide Version 1.2 Software Defined Receiver & Spectrum Analyzer Software Defined Receiver & Spectrum Analyzer 2004 RFSPACE. All rights reserved. 2 TABLE OF CONTENTS PACKAGE CONTENTS..3 GETTING

More information

Computational Efficiency of the GF and the RMF Transforms for Quaternary Logic Functions on CPUs and GPUs

Computational Efficiency of the GF and the RMF Transforms for Quaternary Logic Functions on CPUs and GPUs 5 th International Conference on Logic and Application LAP 2016 Dubrovnik, Croatia, September 19-23, 2016 Computational Efficiency of the GF and the RMF Transforms for Quaternary Logic Functions on CPUs

More information

Improving GPU Performance via Large Warps and Two-Level Warp Scheduling

Improving GPU Performance via Large Warps and Two-Level Warp Scheduling Improving GPU Performance via Large Warps and Two-Level Warp Scheduling Veynu Narasiman The University of Texas at Austin Michael Shebanow NVIDIA Chang Joo Lee Intel Rustam Miftakhutdinov The University

More information

Blackfin Online Learning & Development

Blackfin Online Learning & Development Presentation Title: Introduction to VisualDSP++ Tools Presenter Name: Nicole Wright Chapter 1:Introduction 1a:Module Description 1b:CROSSCORE Products Chapter 2: ADSP-BF537 EZ-KIT Lite Configuration 2a:

More information

Real Time Operating Systems Lecture 29.1

Real Time Operating Systems Lecture 29.1 Real Time Operating Systems Lecture 29.1 EE345M Final Exam study guide (Spring 2014): Final is both a closed and open book exam. During the closed book part you can have a pencil, pen and eraser. During

More information

Overview. 1 Trends in Microprocessor Architecture. Computer architecture. Computer architecture

Overview. 1 Trends in Microprocessor Architecture. Computer architecture. Computer architecture Overview 1 Trends in Microprocessor Architecture R05 Robert Mullins Computer architecture Scaling performance and CMOS Where have performance gains come from? Modern superscalar processors The limits of

More information

Blackfin Online Learning & Development

Blackfin Online Learning & Development A Presentation Title: Blackfin Optimizations for Performance and Power Consumption Presenter: Merril Weiner, Senior DSP Engineer Chapter 1: Introduction Subchapter 1a: Agenda Chapter 1b: Overview Chapter

More information

Performance Metrics. Computer Architecture. Outline. Objectives. Basic Performance Metrics. Basic Performance Metrics

Performance Metrics. Computer Architecture. Outline. Objectives. Basic Performance Metrics. Basic Performance Metrics Computer Architecture Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr nizamettinaydin@gmail.com Performance Metrics http://www.yildiz.edu.tr/~naydin 1 2 Objectives How can we meaningfully measure and compare

More information

Which motherboard is in the xbox 360 elite

Which motherboard is in the xbox 360 elite Which motherboard is in the xbox 360 elite ASRock X470 Master SLI/AC AM4 AMD Promontory X470 Wi-Fi $109.99. Main article: List of Xbox 360 retail configurations. Between January 2011 and October 2013,

More information

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Computer Architecture ECE 568 Part 14 Improving Performance: Interleaving Israel Koren ECE568/Koren Part.14.1 Background Performance

More information

Advances in Antenna Measurement Instrumentation and Systems

Advances in Antenna Measurement Instrumentation and Systems Advances in Antenna Measurement Instrumentation and Systems Steven R. Nichols, Roger Dygert, David Wayne MI Technologies Suwanee, Georgia, USA Abstract Since the early days of antenna pattern recorders,

More information

FPGA Based 70MHz Digital Receiver for RADAR Applications

FPGA Based 70MHz Digital Receiver for RADAR Applications Technology Volume 1, Issue 1, July-September, 2013, pp. 01-07, IASTER 2013 www.iaster.com, Online: 2347-6109, Print: 2348-0017 FPGA Based 70MHz Digital Receiver for RADAR Applications ABSTRACT Dr. M. Kamaraju

More information

Early Adopter : Multiprocessor Programming in the Undergraduate Program. NSF/TCPP Curriculum: Early Adoption at the University of Central Florida

Early Adopter : Multiprocessor Programming in the Undergraduate Program. NSF/TCPP Curriculum: Early Adoption at the University of Central Florida Early Adopter : Multiprocessor Programming in the Undergraduate Program NSF/TCPP Curriculum: Early Adoption at the University of Central Florida Narsingh Deo Damian Dechev Mahadevan Vasudevan Department

More information

SPACEYARD SCRAPPERS 2-D GAME DESIGN DOCUMENT

SPACEYARD SCRAPPERS 2-D GAME DESIGN DOCUMENT SPACEYARD SCRAPPERS 2-D GAME DESIGN DOCUMENT Abstract This game design document describes the details for a Vertical Scrolling Shoot em up (AKA shump or STG) video game that will be based around concepts

More information

Ramon Canal NCD Master MIRI. NCD Master MIRI 1

Ramon Canal NCD Master MIRI. NCD Master MIRI 1 Wattch, Hotspot, Hotleakage, McPAT http://www.eecs.harvard.edu/~dbrooks/wattch-form.html http://lava.cs.virginia.edu/hotspot http://lava.cs.virginia.edu/hotleakage http://www.hpl.hp.com/research/mcpat/

More information

Embedded & Robotics Training

Embedded & Robotics Training Embedded & Robotics Training WebTek Labs creates and delivers high-impact solutions, enabling our clients to achieve their business goals and enhance their competitiveness. With over 13+ years of experience,

More information

Image Processing Architectures (and their future requirements)

Image Processing Architectures (and their future requirements) Lecture 16: Image Processing Architectures (and their future requirements) Visual Computing Systems Smart phone processing resources Example SoC: Qualcomm Snapdragon Image credit: Qualcomm Apple A7 (iphone

More information

An architecture for Scalable Concurrent Embedded Software" No more communication in your program, the key to multi-core and distributed programming.

An architecture for Scalable Concurrent Embedded Software No more communication in your program, the key to multi-core and distributed programming. An architecture for Scalable Concurrent Embedded Software" No more communication in your program, the key to multi-core and distributed programming. Eric.Verhulst@altreonic.com www.altreonic.com 1 Content

More information

1. The decimal number 62 is represented in hexadecimal (base 16) and binary (base 2) respectively as

1. The decimal number 62 is represented in hexadecimal (base 16) and binary (base 2) respectively as BioE 1310 - Review 5 - Digital 1/16/2017 Instructions: On the Answer Sheet, enter your 2-digit ID number (with a leading 0 if needed) in the boxes of the ID section. Fill in the corresponding numbered

More information

Image Processing Architectures (and their future requirements)

Image Processing Architectures (and their future requirements) Lecture 17: Image Processing Architectures (and their future requirements) Visual Computing Systems Smart phone processing resources Qualcomm snapdragon Image credit: Qualcomm Apple A7 (iphone 5s) Chipworks

More information

Datorstödd Elektronikkonstruktion

Datorstödd Elektronikkonstruktion Datorstödd Elektronikkonstruktion [Computer Aided Design of Electronics] Zebo Peng, Petru Eles and Gert Jervan Embedded Systems Laboratory IDA, Linköping University http://www.ida.liu.se/~tdts80/~tdts80

More information

8 Frames in 16ms. Michael Stallone Lead Software Engineer Engine NetherRealm Studios

8 Frames in 16ms. Michael Stallone Lead Software Engineer Engine NetherRealm Studios 8 Frames in 16ms Rollback Networking in Mortal Kombat and Injustice Michael Stallone Lead Software Engineer Engine NetherRealm Studios mstallone@netherrealm.com What is this talk about? The how, why, and

More information

Lec 24: Parallel Processors. Announcements

Lec 24: Parallel Processors. Announcements Lec 24: Parallel Processors Kavita ala CS 3410, Fall 2008 Computer Science Cornell University P 3 out Hack n Seek nnouncements The goal is to have fun with it Recitations today will talk about it Pizza

More information

Ps3 Computers Instruction Set Definition Reduced

Ps3 Computers Instruction Set Definition Reduced Ps3 Computers Instruction Set Definition Reduced (Compare scalar processors, whose instructions operate on single data items.) microprocessor designs led to the vector supercomputer's demise in the later

More information

A NEW ARCHITECTURE FOR FLIGHTGEAR FLIGHT SIMULATOR

A NEW ARCHITECTURE FOR FLIGHTGEAR FLIGHT SIMULATOR A NEW ARCHITECTURE FOR FLIGHTGEAR FLIGHT SIMULATOR AJ MacLeod, Ampere K. Hardraade, Michael Koehne, Steve Knoblock MVC architecture,, FDM Instance, Client To continue improving existing features and add

More information

Architecting Systems of the Future, page 1

Architecting Systems of the Future, page 1 Architecting Systems of the Future featuring Eric Werner interviewed by Suzanne Miller ---------------------------------------------------------------------------------------------Suzanne Miller: Welcome

More information

? 5. VR/AR AI GPU

? 5. VR/AR AI GPU 1896 1935 1987 2006 1896 1935 1987 2006 1. 2. 3 3. 1. 4.? 5. VR/AR 6. 7. 8. 9. AI GPU VR 1. Lecture notes 2. Real Time Rendering, Tomas Möller and Eric Haines 3. The Art of Game Design, Jesse Schell 4.

More information

2020 Computing: Virtual Immersion Architectures (VIA-2020)

2020 Computing: Virtual Immersion Architectures (VIA-2020) 2020 Computing: Virtual Immersion Architectures (VIA-2020) SRC/NSF/ITRS Forum on Emerging nano-cmos Architectures Meeting Date: July 10-11, 2008 Meeting Place: Seymour Marine Discovery Center of UC Santa

More information

Digital Integrated Circuits Perspectives. Administrivia

Digital Integrated Circuits Perspectives. Administrivia Lecture 30 Perspectives Administrivia Final on Friday December 14, 2001 8 am Location: 180 Tan Hall Topics all what was covered in class. Review Session - TBA Lab and hw scores to be posted on the web

More information

SpinSpectra NSMS. Noise Spectrum Measurement System

SpinSpectra NSMS. Noise Spectrum Measurement System SpinSpectra NSMS Noise Spectrum Measurement System SpinSpectra NSMS is used to measure the intrinsic noise of a sensor, an electronic device, or a new electronic or magnetic material as a function of frequency

More information

Simulation Performance Optimization of Virtual Prototypes Sammidi Mounika, B S Renuka

Simulation Performance Optimization of Virtual Prototypes Sammidi Mounika, B S Renuka Simulation Performance Optimization of Virtual Prototypes Sammidi Mounika, B S Renuka Abstract Virtual prototyping is becoming increasingly important to embedded software developers, engineers, managers

More information

Interfacing ACT-R with External Simulations

Interfacing ACT-R with External Simulations Interfacing ACT-R with External Simulations Eric Biefeld, Brad Best, Christian Lebiere Human-Computer Interaction Institute Carnegie Mellon University We Have Integrated ACT-R With Several External Simulations

More information

Table of Contents HOL ADV

Table of Contents HOL ADV Table of Contents Lab Overview - - Horizon 7.1: Graphics Acceleartion for 3D Workloads and vgpu... 2 Lab Guidance... 3 Module 1-3D Options in Horizon 7 (15 minutes - Basic)... 5 Introduction... 6 3D Desktop

More information

Tutorial 3: Entering the World of GNU Software Radio

Tutorial 3: Entering the World of GNU Software Radio Tutorial 3: Entering the World of GNU Software Radio Dawei Shen August 3, 2005 Abstract This article provides an overview of the GNU Radio toolkit for building software radios. This tutorial is a modified

More information

ECE 498 Linux Assembly Language Lecture 8

ECE 498 Linux Assembly Language Lecture 8 ECE 498 Linux Assembly Language Lecture 8 Vince Weaver http://www.eece.maine.edu/ vweaver vincent.weaver@maine.edu 11 December 2012 Video Game Programming My personal gateway into assembly programming

More information

Detector Implementations Based on Software Defined Radio for Next Generation Wireless Systems Janne Janhunen

Detector Implementations Based on Software Defined Radio for Next Generation Wireless Systems Janne Janhunen GIGA seminar 11.1.2010 Detector Implementations Based on Software Defined Radio for Next Generation Wireless Systems Janne Janhunen janne.janhunen@ee.oulu.fi 2 Outline Introduction Benefits and Challenges

More information

Putting It All Together: Computer Architecture and the Digital Camera

Putting It All Together: Computer Architecture and the Digital Camera 461 Putting It All Together: Computer Architecture and the Digital Camera This book covers many topics in circuit analysis and design, so it is only natural to wonder how they all fit together and how

More information

Performance Metrics, Amdahl s Law

Performance Metrics, Amdahl s Law ecture 26 Computer Science 61C Spring 2017 March 20th, 2017 Performance Metrics, Amdahl s Law 1 New-School Machine Structures (It s a bit more complicated!) Software Hardware Parallel Requests Assigned

More information

Low-Power CMOS VLSI Design

Low-Power CMOS VLSI Design Low-Power CMOS VLSI Design ( 范倫達 ), Ph. D. Department of Computer Science, National Chiao Tung University, Taiwan, R.O.C. Fall, 2017 ldvan@cs.nctu.edu.tw http://www.cs.nctu.tw/~ldvan/ Outline Introduction

More information

Overview of Design Methodology. A Few Points Before We Start 11/4/2012. All About Handling The Complexity. Lecture 1. Put things into perspective

Overview of Design Methodology. A Few Points Before We Start 11/4/2012. All About Handling The Complexity. Lecture 1. Put things into perspective Overview of Design Methodology Lecture 1 Put things into perspective ECE 156A 1 A Few Points Before We Start ECE 156A 2 All About Handling The Complexity Design and manufacturing of semiconductor products

More information

Tobii Pro VR Integration based on HTC Vive Development Kit Description

Tobii Pro VR Integration based on HTC Vive Development Kit Description Tobii Pro VR Integration based on HTC Vive Development Kit Description 1 Introduction This document describes the features and functionality of the Tobii Pro VR Integration, a retrofitted version of the

More information

Fall 2015 COMP Operating Systems. Lab #7

Fall 2015 COMP Operating Systems. Lab #7 Fall 2015 COMP 3511 Operating Systems Lab #7 Outline Review and examples on virtual memory Motivation of Virtual Memory Demand Paging Page Replacement Q. 1 What is required to support dynamic memory allocation

More information

Dr Myat Su Hlaing Asia Research Center, Yangon University, Myanmar. Data programming model for an operation based parallel image processing system

Dr Myat Su Hlaing Asia Research Center, Yangon University, Myanmar. Data programming model for an operation based parallel image processing system Name: Affiliation: Field of research: Specific Field of Study: Proposed Research Topic: Dr Myat Su Hlaing Asia Research Center, Yangon University, Myanmar Information Science and Technology Computer Science

More information

Li-Fi And Microcontroller Based Home Automation Or Device Control Introduction

Li-Fi And Microcontroller Based Home Automation Or Device Control Introduction Li-Fi And Microcontroller Based Home Automation Or Device Control Introduction Optical communications have been used in various forms for thousands of years. After the invention of light amplification

More information

Chapter 1:Object Interaction with Blueprints. Creating a project and the first level

Chapter 1:Object Interaction with Blueprints. Creating a project and the first level Chapter 1:Object Interaction with Blueprints Creating a project and the first level Setting a template for a new project Making sense of the project settings Creating the project 2 Adding objects to our

More information

Using an FPGA based system for IEEE 1641 waveform generation

Using an FPGA based system for IEEE 1641 waveform generation Using an FPGA based system for IEEE 1641 waveform generation Colin Baker EADS Test & Services (UK) Ltd 23 25 Cobham Road Wimborne, Dorset, UK colin.baker@eads-ts.com Ashley Hulme EADS Test Engineering

More information

Ps3 Computing Instruction Set Definition Reduced

Ps3 Computing Instruction Set Definition Reduced Ps3 Computing Instruction Set Definition Reduced (Compare scalar processors, whose instructions operate on single data items.) that feature instructions for a form of vector processing on multiple (vectorized)

More information

Computer Aided Design of Electronics

Computer Aided Design of Electronics Computer Aided Design of Electronics [Datorstödd Elektronikkonstruktion] Zebo Peng, Petru Eles, and Nima Aghaee Embedded Systems Laboratory IDA, Linköping University www.ida.liu.se/~tdts01 Electronic Systems

More information

Propietary Engine VS Commercial engine. by Zalo

Propietary Engine VS Commercial engine. by Zalo Propietary Engine VS Commercial engine by Zalo zalosan@gmail.com About me B.S. Computer Engineering 9 years of experience, 5 different companies 3 propietary engines, 2 commercial engines I have my own

More information

CMP 301B Computer Architecture. Appendix C

CMP 301B Computer Architecture. Appendix C CMP 301B Computer Architecture Appendix C Dealing with Exceptions What should be done when an exception arises and many instructions are in the pipeline??!! Force a trap instruction in the next IF stage

More information

Lesson 3: Arduino. Goals

Lesson 3: Arduino. Goals Introduction: This project introduces you to the wonderful world of Arduino and how to program physical devices. In this lesson you will learn how to write code and make an LED flash. Goals 1 - Get to

More information

AN IMPLEMENTATION OF MULTI-DSP SYSTEM ARCHITECTURE FOR PROCESSING VARIANT LENGTH FRAME FOR WEATHER RADAR

AN IMPLEMENTATION OF MULTI-DSP SYSTEM ARCHITECTURE FOR PROCESSING VARIANT LENGTH FRAME FOR WEATHER RADAR DOI: 10.21917/ime.2018.0096 AN IMPLEMENTATION OF MULTI- SYSTEM ARCHITECTURE FOR PROCESSING VARIANT LENGTH FRAME FOR WEATHER RADAR Min WonJun, Han Il, Kang DokGil and Kim JangSu Institute of Information

More information

RANA: Towards Efficient Neural Acceleration with Refresh-Optimized Embedded DRAM

RANA: Towards Efficient Neural Acceleration with Refresh-Optimized Embedded DRAM RANA: Towards Efficient Neural Acceleration with Refresh-Optimized Embedded DRAM Fengbin Tu, Weiwei Wu, Shouyi Yin, Leibo Liu, Shaojun Wei Institute of Microelectronics Tsinghua University The 45th International

More information

Synthetic Aperture Beamformation using the GPU

Synthetic Aperture Beamformation using the GPU Paper presented at the IEEE International Ultrasonics Symposium, Orlando, Florida, 211: Synthetic Aperture Beamformation using the GPU Jens Munk Hansen, Dana Schaa and Jørgen Arendt Jensen Center for Fast

More information

CSE502: Computer Architecture CSE 502: Computer Architecture

CSE502: Computer Architecture CSE 502: Computer Architecture CSE 502: Computer Architecture Out-of-Order Schedulers Data-Capture Scheduler Dispatch: read available operands from ARF/ROB, store in scheduler Commit: Missing operands filled in from bypass Issue: When

More information

Enabling Mobile Virtual Reality ARM 助力移动 VR 产业腾飞

Enabling Mobile Virtual Reality ARM 助力移动 VR 产业腾飞 Enabling Mobile Virtual Reality ARM 助力移动 VR 产业腾飞 Nathan Li Ecosystem Manager Mobile Compute Business Line Shenzhen, China May 20, 2016 3 Photograph: Mark Zuckerberg Facebook https://www.facebook.com/photo.php?fbid=10102665120179591&set=pcb.10102665126861201&type=3&theater

More information

CSEE4840 Project Design Document. Battle City

CSEE4840 Project Design Document. Battle City CSEE4840 Project Design Document Battle City March 18, 2011 Group memebers: Tian Chu (tc2531) Liuxun Zhu (lz2275) Tianchen Li (tl2445) Quan Yuan (qy2129) Yuanzhao Huangfu (yh2453) Introduction: Our project

More information

Chapter 16 - Instruction-Level Parallelism and Superscalar Processors

Chapter 16 - Instruction-Level Parallelism and Superscalar Processors Chapter 16 - Instruction-Level Parallelism and Superscalar Processors Luis Tarrataca luis.tarrataca@gmail.com CEFET-RJ L. Tarrataca Chapter 16 - Superscalar Processors 1 / 78 Table of Contents I 1 Overview

More information

Signal Technologies 1

Signal Technologies 1 Signal Technologies 1 Gunning Transceiver Logic (GTL) - evolution Evolved from BTL, the backplane transceiver logic, which in turn evolved from ECL (emitter-coupled logic) Setup of an open collector bus

More information

Tobii Pro VR Analytics Product Description

Tobii Pro VR Analytics Product Description Tobii Pro VR Analytics Product Description 1 Introduction 1.1 Overview This document describes the features and functionality of Tobii Pro VR Analytics. It is an analysis software tool that integrates

More information

DEMIGOD DEMIGOD. characterize stalls and pop-ups during game play. Serious gamers play games at their maximum settings driving HD monitors.

DEMIGOD DEMIGOD. characterize stalls and pop-ups during game play. Serious gamers play games at their maximum settings driving HD monitors. Intel Solid-State Drives (Intel SSDs) are revolutionizing storage performance on desktop and laptop PCs, delivering dramatically faster load times than hard disk drives (HDDs). When Intel SSDs are used

More information

Pangolin: A Look at the Conceptual Architecture of SuperTuxKart. Caleb Aikens Russell Dawes Mohammed Gasmallah Leonard Ha Vincent Hung Joseph Landy

Pangolin: A Look at the Conceptual Architecture of SuperTuxKart. Caleb Aikens Russell Dawes Mohammed Gasmallah Leonard Ha Vincent Hung Joseph Landy Pangolin: A Look at the Conceptual Architecture of SuperTuxKart Caleb Aikens Russell Dawes Mohammed Gasmallah Leonard Ha Vincent Hung Joseph Landy Abstract This report will be taking a look at the conceptual

More information

Processors Processing Processors. The meta-lecture

Processors Processing Processors. The meta-lecture Simulators 5SIA0 Processors Processing Processors The meta-lecture Why Simulators? Your Friend Harm Why Simulators? Harm Loves Tractors Harm Why Simulators? The outside world Unfortunately for Harm you

More information

Outline Simulators and such. What defines a simulator? What about emulation?

Outline Simulators and such. What defines a simulator? What about emulation? Outline Simulators and such Mats Brorsson & Mladen Nikitovic ICT Dept of Electronic, Computer and Software Systems (ECS) What defines a simulator? Why are simulators needed? Classifications Case studies

More information

RF and Microwave Test and Design Roadshow Cape Town & Midrand

RF and Microwave Test and Design Roadshow Cape Town & Midrand RF and Microwave Test and Design Roadshow Cape Town & Midrand Advanced PXI Technologies Signal Recording, FPGA s, and Synchronization Philip Ehlers Outline Introduction to the PXI Architecture PXI Data

More information

MITOCW Project: Backgammon tutor MIT Multicore Programming Primer, IAP 2007

MITOCW Project: Backgammon tutor MIT Multicore Programming Primer, IAP 2007 MITOCW Project: Backgammon tutor MIT 6.189 Multicore Programming Primer, IAP 2007 The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue

More information