The Looming Software Crisis due to the Multicore Menace
Transcription
1 The Looming Software Crisis due to the Multicore Menace
Saman Amarasinghe, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology
2 Today: The Happily Oblivious Average Joe Programmer
- Joe is oblivious about the processor
  - Moore's law brings Joe performance
  - Sufficient for Joe's requirements
- Joe has built a solid boundary between hardware and software
  - High-level languages abstract away the processors
  - Ex: Java bytecode is machine independent
- This abstraction has provided a lot of freedom for Joe
- Parallel programming is only practiced by a few experts
3 Joe the Parallel Programmer
- Moore's law is not bringing any more performance gains
- If Joe needs performance, he has to deal with multicores
  - Joe has to deal with performance
  - Joe has to deal with parallelism
- Is there a better way?
4 Why Parallelism is Hard
- A huge increase in complexity and work for the programmer
  - Programmer has to think about performance!
  - Parallelism has to be designed in at every level
- Humans are sequential beings
  - Deconstructing problems into parallel tasks is hard for many of us
- Parallelism is not easy to implement
  - Parallelism cannot be abstracted or layered away
  - Code and data have to be restructured in very different (non-intuitive) ways
- Parallel programs are very hard to debug
  - Combinatorial explosion of possible execution orderings
  - Race condition and deadlock bugs are non-deterministic and elusive
  - Non-deterministic bugs go away in the lab environment and with instrumentation
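To make the "combinatorial explosion of possible execution orderings" concrete, here is a minimal Python sketch. It assumes two threads whose steps are atomic and whose only ordering constraint is each thread's own program order; the function names are hypothetical and not part of the talk.

```python
from itertools import combinations
from math import comb


def count_interleavings(steps_a: int, steps_b: int) -> int:
    """Number of ways to interleave two threads with the given step counts.

    Choosing which of the (steps_a + steps_b) time slots belong to thread A
    fixes the whole interleaving, so the count is a binomial coefficient.
    """
    return comb(steps_a + steps_b, steps_a)


def enumerate_interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's own order."""
    total = len(a) + len(b)
    for slots in combinations(range(total), len(a)):
        slot_set = set(slots)
        ia, ib = iter(a), iter(b)
        yield tuple(next(ia) if i in slot_set else next(ib) for i in range(total))


if __name__ == "__main__":
    for n in (2, 5, 10, 20):
        print(n, "steps per thread ->", count_interleavings(n, n), "orderings")
```

Even two threads of ten steps each admit 184756 orderings, which is why a bug that depends on one particular ordering can be so hard to reproduce.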
5 Compiler-Aware Language Design: The StreamIt Experience
[Figure: stream graph in which FMDemod feeds a Scatter, three parallel LPF filters, a Gather, and a Speaker]
6 Stream Application Domain
- Graphics
- Cryptography
- Databases
- Object recognition
- Network processing and security
- Scientific codes
7 StreamIt Project
- Language Semantics / Programmability
  - StreamIt Language (CC 02)
  - Programming Environment in Eclipse (P-PHEC 05)
- Optimizations / Code Generation
  - Phased Scheduling (LCTES 03)
  - Cache Aware Optimization (LCTES 05)
- Domain Specific Optimizations
  - Linear Analysis and Optimization (PLDI 03)
  - Optimizations for bit streaming (PLDI 05)
  - Linear State Space Analysis (CASES 05)
- Parallelism
  - Teleport Messaging (PPOPP 05)
  - Compiling for Communication-Exposed Architectures (ASPLOS 02)
  - Load-Balanced Rendering (Graphics Hardware 05)
- Applications
  - SAR, DSP benchmarks, JPEG, MPEG [IPDPS 06], DES and Serpent [PLDI 05], ...
[Figure: compiler flow. A StreamIt program passes through the front-end (annotated Java) and the stream-aware optimizations, then to one of four back ends: uniprocessor backend (C), cluster backend (MPI-like C), Raw backend (C per tile + msg code), IBM X10 backend (streaming X10 runtime)]
8 Compiler-Aware Language Design
- Programmability: boost productivity, enable faster development and rapid prototyping
- Enable parallel execution: target multicores, clusters, tiled architectures, DSPs, graphics processors, ...
9 Streaming Application Design
- Structured block level diagram describes computation and flow of data
- Conceptually easy to understand
- Clean abstraction of functionality
[Figure: MPEG-2 Decoder stream graph. The MPEG bit stream enters VLD, which emits frequency encoded macroblocks and differentially coded motion vectors, and sends quantization coefficients <QC> and picture type <PT, PT2> as messages. A splitter feeds one branch of ZigZag, IQuantization <QC>, IDCT, and Saturation (producing spatially encoded macroblocks) and one branch of Motion Vector Decode and Repeat (producing motion vectors); a joiner recombines macroblocks and motion vectors. A second splitter separates Y, Cb, and Cr; each passes through Motion Compensation <PT> with a reference picture, and two of the three branches also pass through Channel Upsample. After the joiner, the recovered picture passes through Picture Reorder <PT2> and Color Space Conversion.]
The StreamIt program shown beside the diagram:
  add VLD(QC, PT, PT2);
  add splitjoin {
    split roundrobin(N*B, V);
    add pipeline {
      add ZigZag(B);
      add IQuantization(B) to QC;
      add IDCT(B);
      add Saturation(B);
    }
    add pipeline {
      add MotionVectorDecode();
      add Repeat(V, N);
    }
    join roundrobin(B, V);
  }
  add splitjoin {
    split roundrobin(4*(B+V), B+V, B+V);
    add MotionCompensation(4*(B+V)) to PT;
    for (int i = 0; i < 2; i++) {
      add pipeline {
        add MotionCompensation(B+V) to PT;
        add ChannelUpsample(B);
      }
    }
    join roundrobin(1, 1, 1);
  }
  add PictureReorder(3*W*H) to PT2;
  add ColorSpaceConversion(3*W*H);
10 StreamIt Philosophy
- Preserve program structure
  - Natural for application developers to express
- Leverage program structure to discover parallelism and deliver high performance
- Programs remain clean
  - Portable and malleable
[Figure: the MPEG-2 decoder stream graph and StreamIt program from slide 9]
11 StreamIt Philosophy
[Figure: the MPEG-2 decoder stream graph and StreamIt program from slide 9, with the final stage labeled "output to player"]
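The two composition constructs used in the MPEG-2 decoder program, pipeline and splitjoin with round-robin splitters and joiners, can be mimicked in a few lines of Python. This is only an illustrative sketch over in-memory lists, not how the StreamIt compiler or runtime works; the helper names `pipeline` and `splitjoin` and the toy filters are assumptions made for this example.

```python
def pipeline(*stages):
    """Compose stages so that each consumes the previous stage's output list."""
    def run(items):
        for stage in stages:
            items = stage(items)
        return list(items)
    return run


def splitjoin(split_weights, branches, join_weights):
    """Round-robin split into parallel branches, then round-robin join.

    split_weights[i] items go to branch i per round; join_weights[i] items
    are taken back from branch i per round.
    """
    def run(items):
        items = list(items)
        queues = [[] for _ in branches]
        pos = 0
        while pos < len(items):                      # split roundrobin(...)
            for queue, weight in zip(queues, split_weights):
                queue.extend(items[pos:pos + weight])
                pos += weight
        outputs = [branch(q) for branch, q in zip(branches, queues)]
        result, cursors = [], [0] * len(outputs)
        while any(c < len(o) for c, o in zip(cursors, outputs)):
            for i, weight in enumerate(join_weights):  # join roundrobin(...)
                result.extend(outputs[i][cursors[i]:cursors[i] + weight])
                cursors[i] += weight
        return result
    return run


# Toy stand-ins for real filters (hypothetical, for illustration only).
def double(xs):
    return [2 * x for x in xs]


def negate(xs):
    return [-x for x in xs]


def add_one(xs):
    return [x + 1 for x in xs]


decoder = pipeline(splitjoin((2, 1), [double, negate], (2, 1)), add_one)
print(decoder(range(6)))  # [1, 3, -1, 7, 9, -4]
```

Because the branches of a splitjoin share no data, each call to `branch(q)` could run on its own core; that is the explicit task parallelism the later slides refer to.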
12 Compiler-Aware Language Design
- Programmability: boost productivity, enable faster development and rapid prototyping
- Enable parallel execution: target multicores, clusters, tiled architectures, DSPs, graphics processors, ...
13 Common Machine Languages
- Unicores
  - Common properties: single flow of control, single memory image
  - Differences: register file (register allocation), ISA (instruction selection), functional units (instruction scheduling)
  - von Neumann languages represent the common properties and abstract away the differences
- Multicores
  - Common properties: multiple flows of control, multiple local memories
  - Differences: number and capabilities of cores, communication model, synchronization model
14 Bridging the Abstraction Layers
- StreamIt exposes the data movement
  - Graph structure is architecture independent
- StreamIt exposes the parallelism
  - Explicit task parallelism
  - Implicit but inherent data and pipeline parallelism
- Each multicore is different in granularity and topology
  - Communication is exposed to the compiler
- The compiler needs to efficiently bridge the abstraction
  - Map the computation and communication pattern of the program to the cores, memory, and the communication substrate
15 Types of Parallelism
- Task Parallelism (traditionally thread fork/join)
  - Parallelism explicit in algorithm
  - Between filters without producer/consumer relationship
- Data Parallelism
  - Peel iterations of filter, place within scatter/gather pair (fission)
  - Can't parallelize filters with state
- Pipeline Parallelism
  - Between producers and consumers
  - Stateful filters can be parallelized
[Figure: stream graph with Scatter and Gather nodes, annotated with task parallelism]
16 Types of Parallelism
- Task Parallelism (traditionally thread fork/join)
  - Parallelism explicit in algorithm
  - Between filters without producer/consumer relationship
- Data Parallelism (traditionally data parallel loops)
  - Between iterations of a stateless filter
  - Place within scatter/gather pair (fission)
  - Can't parallelize filters with state
- Pipeline Parallelism (traditionally in hardware)
  - Between producers and consumers
  - Stateful filters can be parallelized
[Figure: stream graph with Scatter and Gather nodes, annotated with task, data, and pipeline parallelism]
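The claim that fission works for stateless filters but not for filters with state can be checked with a small Python sketch. It models a filter as a function from (item, state) to (output, new state); the helper names and the two toy filters are assumptions for illustration, and the "cores" are simulated sequentially.

```python
def run_filter(work, state, items):
    """Run one filter instance over a stream, threading its state through."""
    out = []
    for x in items:
        y, state = work(x, state)
        out.append(y)
    return out


def fiss(work, init_state, items, ways):
    """Data-parallelize a filter: scatter round-robin, run copies, gather.

    Each copy starts from init_state and sees only every ways-th item, which
    is exactly why this is only valid when the filter keeps no state.
    """
    items = list(items)
    parts = [items[i::ways] for i in range(ways)]                   # scatter
    results = [run_filter(work, init_state, p) for p in parts]      # one copy per core
    return [results[i % ways][i // ways] for i in range(len(items))]  # gather


def scale(x, state):
    """Stateless filter: output depends only on the current item."""
    return 3 * x, state


def running_sum(x, state):
    """Stateful filter: output depends on everything seen so far."""
    total = state + x
    return total, total


stream = range(8)
print(fiss(scale, None, stream, 4) == run_filter(scale, None, stream))          # True
print(fiss(running_sum, 0, stream, 4) == run_filter(running_sum, 0, stream))    # False
```

The stateful filter can still take part in pipeline parallelism, because a single instance of it can run on one core while its producers and consumers run on others.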
17 Problem Statement
- Given:
  - Stream graph with compute and communication estimate for each filter
  - Computation and communication resources of the target machine
- Find:
  - Schedule of execution for the filters that best utilizes the available parallelism to fit the machine resources
18 Baseline 1: Task Parallelism
- Inherent task parallelism between two processing pipelines
- Task Parallel Model:
  - Only parallelize explicit task parallelism
  - Fork/join parallelism
- Execute this on a 2 core machine: ~2x speedup over single core
- What about 4, 16, 1024, ... cores?
[Figure: stream graph with two parallel pipelines, each of BandPass, Compress, Expand, and BandStop, joined by an Adder]
19 Evaluation: Task Parallelism
- Raw Microprocessor
  - 16 in-order, single-issue cores with D$ and I$
  - 16 memory banks, each bank with DMA
  - Cycle accurate simulator
- Parallelism: Not matched to target!
- Synchronization: Not matched to target!
[Figure: bar chart of throughput normalized to single-core StreamIt for BitonicSort, ChannelVocoder, DCT, DES, FFT, Filterbank, FMRadio, Serpent, TDE, MPEG2Decoder, Vocoder, Radar, and the geometric mean]
20 Baseline 2: Fine-Grained Data Parallelism
- Each of the filters in the example is stateless
- We can introduce data parallelism
- Fine-grained Data Parallel Model:
  - Fiss each stateless filter N ways (N is number of cores)
  - Remove scatter/gather if possible
- Example: 4 cores
  - Each fission group occupies entire machine
[Figure: the BandPass, Compress, Expand, BandStop, Adder stream graph with each filter fissed]
21 Evaluation: Fine-Grained Data Parallelism
- Good Parallelism!
- Too Much Synchronization!
[Figure: bar chart of throughput normalized to single-core StreamIt, comparing Task and Fine-Grained Data, for BitonicSort, ChannelVocoder, DCT, DES, FFT, Filterbank, FMRadio, Serpent, TDE, MPEG2Decoder, Vocoder, Radar, and the geometric mean]
22 Phase 1: Coarsen the Stream Graph
- Before data-parallelism is exploited
- Fuse stateless pipelines as much as possible without introducing state
  - Don't fuse stateless with stateful
  - Don't fuse a peeking filter with anything upstream
[Figure: two parallel pipelines of BandPass (peek), Compress, Expand, and BandStop (peek), joined by an Adder]
23 Phase 1: Coarsen the Stream Graph
- Before data-parallelism is exploited
- Fuse stateless pipelines as much as possible without introducing state
  - Don't fuse stateless with stateful
  - Don't fuse a peeking filter with anything upstream
- Benefits:
  - Reduces global communication and synchronization
  - Exposes inter-node optimization opportunities
[Figure: the same graph after fusion, with BandPass, Compress, and Expand fused in each pipeline, followed by BandStop and the Adder]
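The two fusion rules on this slide can be written down as a greedy pass over one pipeline. The sketch below is a simplified reading of the rules, not the StreamIt compiler's actual algorithm: the `Filter` record, its `stateful` and `peeks` flags, and the `coarsen` function are hypothetical names, and the pass conservatively leaves every stateful filter unfused.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Filter:
    name: str
    stateful: bool = False   # keeps state between firings
    peeks: bool = False      # reads ahead in its input without consuming it


def coarsen(pipeline):
    """Greedily group adjacent filters of one pipeline into fused filters.

    A filter joins the group upstream of it only if
      - it is stateless and the group is stateless (no stateless/stateful mix), and
      - it does not peek (a peeking filter is never fused with anything upstream).
    Otherwise it starts a new group; later filters may still fuse onto it.
    """
    groups = []
    for f in pipeline:
        can_fuse = (
            bool(groups)
            and not f.stateful
            and not f.peeks
            and not any(g.stateful for g in groups[-1])
        )
        if can_fuse:
            groups[-1].append(f)
        else:
            groups.append([f])
    return groups


branch = [
    Filter("BandPass", peeks=True),
    Filter("Compress"),
    Filter("Expand"),
    Filter("BandStop", peeks=True),
]
print([[f.name for f in g] for g in coarsen(branch)])
# [['BandPass', 'Compress', 'Expand'], ['BandStop']]
```

With the peek annotations from slide 22, the pass reproduces the grouping drawn on this slide: BandPass, Compress, and Expand fuse, while the peeking BandStop stays separate.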
24 Phase 2: Data Parallelize
- Data Parallelize for 4 cores
- Fiss 4 ways, to occupy entire chip
[Figure: the coarsened graph (fused BandPass/Compress/Expand filters, BandStop filters, Adder) with the Adder fissed into several copies]
25 Phase 2: Data Parallelize
- Data Parallelize for 4 cores
- Task parallelism! Each fused filter does equal work
- Fiss each filter 2 times to occupy entire chip
[Figure: each of the two fused BandPass/Compress/Expand filters fissed into two copies, above the BandStop filters and the fissed Adder]
26 Phase 2: Data Parallelize
- Data Parallelize for 4 cores
- Task parallelism, each filter does equal work
- Fiss each filter 2 times to occupy entire chip
- Task-conscious data parallelization
  - Preserve task parallelism
- Benefits:
  - Reduces global communication and synchronization
[Figure: the fused BandPass/Compress/Expand filters and the BandStop filters each fissed into two copies per pipeline, above the fissed Adder]
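The arithmetic behind "fiss 4 ways" versus "fiss each filter 2 times" can be sketched as follows. This assumes, as the slide does, that the task-parallel filters at one level do equal work; the function names and the level labels are hypothetical, and a real compiler would also weigh work estimates and communication cost.

```python
def fission_factor(cores: int, task_width: int) -> int:
    """Copies of each filter so that one level of the graph occupies the chip.

    task_width is how many equal-work filters already run task-parallel at
    that level; existing task parallelism is kept and only the rest is fissed.
    """
    return max(1, cores // task_width)


def data_parallelize(levels, cores):
    """Map each stateless filter name to its fission factor.

    levels is a list of lists; filters in the same inner list are
    task-parallel with one another.
    """
    return [{name: fission_factor(cores, len(level)) for name in level}
            for level in levels]


# Coarsened example graph; the labels are illustrative only.
levels = [
    ["BandPass+Compress+Expand (left)", "BandPass+Compress+Expand (right)"],
    ["BandStop (left)", "BandStop (right)"],
    ["Adder"],
]
for level in data_parallelize(levels, cores=4):
    print(level)
```

On 4 cores this fisses each of the two task-parallel filters 2 ways and the lone Adder 4 ways, so every level fills the chip without adding more scatter/gather points than needed.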
27 Evaluation: Coarse-Grained Data Parallelism
- Good Parallelism!
- Low Synchronization!
[Figure: bar chart of throughput normalized to single-core StreamIt, comparing Task, Fine-Grained Data, and Coarse-Grained Task + Data, for BitonicSort, ChannelVocoder, DCT, DES, FFT, Filterbank, FMRadio, Serpent, TDE, MPEG2Decoder, Vocoder, Radar, and the geometric mean]
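28 Simplified Vocoder
- Target a 4 core machine
[Figure: stream graph with per-filter work estimates. Two task-parallel AdaptDFT filters (work 6 each); RectPolar (work 20, data parallel); two task-parallel chains of Unwrap (work 2), Diff, Amplify, and Accum (data parallel, but too little work!); PolarRect (work 20, data parallel)]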
29 Data Parallelize
- Target a 4 core machine
[Figure: the vocoder graph after fission. The AdaptDFT filters (work 6 each) and the Unwrap (work 2), Diff, Amplify, Accum chains are unchanged; the data-parallel RectPolar and PolarRect filters (work 20) are fissed into copies of work 5 each]
30 Data + Task Parallel Execution
- Target 4 core machine
[Figure: timeline of the data + task parallel schedule across the cores, including the RectPolar copies (work 5 each)]
31 We Can Do Better!
- Target 4 core machine
[Figure: timeline of a tighter schedule of the same filters across the cores, including the RectPolar copies (work 5 each)]
32 Phase 3: Coarse-Grained Software Pipelining
- New steady state is free of dependencies
- Schedule new steady state using a greedy partitioning
[Figure: a prologue followed by the new steady state, in which the RectPolar copies appear]
33 Greedy Partitioning
- Target 4 core machine
[Figure: the filters to schedule are assigned to cores over time]
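Because the software-pipelined steady state has no dependencies between filters, scheduling it reduces to packing work estimates onto cores. The sketch below uses a common greedy heuristic (largest work first, always onto the least-loaded core); whether this matches the compiler's exact partitioner is an assumption. The work list is loosely modelled on the simplified vocoder after fission, and the unit weights given to Diff, Amplify, and Accum are assumptions rather than values read from the slides. Communication cost is ignored.

```python
def greedy_partition(work, cores):
    """Assign independent filters to cores, largest work first.

    work maps a filter name to its estimated work per steady state.
    Returns (assignment, loads): the filter names placed on each core and
    the total work per core.
    """
    loads = [0] * cores
    assignment = [[] for _ in range(cores)]
    for name, cost in sorted(work.items(), key=lambda kv: kv[1], reverse=True):
        core = min(range(cores), key=loads.__getitem__)  # least-loaded core
        assignment[core].append(name)
        loads[core] += cost
    return assignment, loads


# Hypothetical work estimates (see the assumptions stated above).
work = {}
for i in range(2):
    work[f"AdaptDFT{i}"] = 6
    work[f"Unwrap{i}"] = 2
    work[f"Diff{i}"] = 1      # assumed weight
    work[f"Amplify{i}"] = 1   # assumed weight
    work[f"Accum{i}"] = 1     # assumed weight
for i in range(4):
    work[f"RectPolar{i}"] = 5
    work[f"PolarRect{i}"] = 5

assignment, loads = greedy_partition(work, cores=4)
print("per-core load:", loads)            # per-core load: [16, 16, 15, 15]
print("steady-state time:", max(loads))   # steady-state time: 16
```

The total work here is 62 units, so no 4-core schedule can finish a steady state in fewer than 16 time units; under these assumed weights the greedy packing reaches that bound.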
34 Evaluation: Coarse-Grained Task + Data + Software Pipelining
- Best Parallelism!
- Lowest Synchronization!
[Figure: bar chart of throughput normalized to single-core StreamIt, comparing Task, Fine-Grained Data, Coarse-Grained Task + Data, and Coarse-Grained Task + Data + Software Pipeline, for BitonicSort, ChannelVocoder, DCT, DES, FFT, Filterbank, FMRadio, Serpent, TDE, MPEG2Decoder, Vocoder, Radar, and the geometric mean]
35 Next: Scalable Stream Representation
- Data parallelism
- Pipeline parallelism
[Figure: mappings for 4 tiles, 16 tiles, and 64 tiles]
36 Conclusions
- Computer architecture is at a crossroads
  - Once in a lifetime opportunity to redesign from scratch
  - How to use the Moore's law gains to improve programmability?
- Switching to multicores without losing the gains in programmer productivity may be the Grandest of the Grand Challenges
  - Half a century of work, still no winning solution
  - Will affect everyone!
- Streaming programming model
  - Can break the von Neumann bottleneck
  - A natural fit for a large class of applications
  - An ideal machine language for multicores
- Compiler can extract explicit and inherent parallelism
  - Parallelism is abstracted away from architectural details of multicores
  - Sustainable speedups (5x to 9x on the 16 core Raw)
- Increased abstraction does not have to sacrifice performance
Universidade de Brasília (UnB) Faculdade de Tecnologia (FT) Departamento de Engenharia Elétrica (ENE) Course: Image Processing Prof. Mylène C.Q. de Farias Semester: 2017.1 LIST 04 Submission Date: 04/05/2017;
More informationAdvances in Parallel Discrete Event Simulation for Electronic System-Level Design
Advances in Parallel Discrete Event Simulation for Electronic System-Level Design Weiwei Chen, Xu Han, Che-Wei Chang, and Rainer Dömer University of California Editors notes: The authors target the speeding
More informationCSCI 445 Laurent Itti. Group Robotics. Introduction to Robotics L. Itti & M. J. Mataric 1
Introduction to Robotics CSCI 445 Laurent Itti Group Robotics Introduction to Robotics L. Itti & M. J. Mataric 1 Today s Lecture Outline Defining group behavior Why group behavior is useful Why group behavior
More informationPoC #1 On-chip frequency generation
1 PoC #1 On-chip frequency generation This PoC covers the full on-chip frequency generation system including transport of signals to receiving blocks. 5G frequency bands around 30 GHz as well as 60 GHz
More informationIntroduction to Real-Time Digital Signal Processing
Real-Time Digital Signal Processing. Sen M Kuo, Bob H Lee Copyright # 2001 John Wiley & Sons Ltd ISBNs: 0-470-84137-0 Hardback); 0-470-84534-1 Electronic) 1 Introduction to Real-Time Digital Signal Processing
More informationUNIT-III LIFE-CYCLE PHASES
INTRODUCTION: UNIT-III LIFE-CYCLE PHASES - If there is a well defined separation between research and development activities and production activities then the software is said to be in successful development
More informationCG401 Advanced Signal Processing. Dr Stuart Lawson Room A330 Tel: January 2003
CG40 Advanced Dr Stuart Lawson Room A330 Tel: 23780 e-mail: ssl@eng.warwick.ac.uk 03 January 2003 Lecture : Overview INTRODUCTION What is a signal? An information-bearing quantity. Examples of -D and 2-D
More informationSound Synthesis Methods
Sound Synthesis Methods Matti Vihola, mvihola@cs.tut.fi 23rd August 2001 1 Objectives The objective of sound synthesis is to create sounds that are Musically interesting Preferably realistic (sounds like
More informationGPU-accelerated SDR Implementation of Multi-User Detector for Satellite Return Links
DLR.de Chart 1 GPU-accelerated SDR Implementation of Multi-User Detector for Satellite Return Links Chen Tang chen.tang@dlr.de Institute of Communication and Navigation German Aerospace Center DLR.de Chart
More informationStatement of Research Weiwei Chen
Statement of Research Weiwei Chen Embedded computer systems are ubiquitous and pervasive in our modern society with a wide application domain, such as automotive and avionic systems, electronic medical
More information2002 IEEE International Solid-State Circuits Conference 2002 IEEE
Outline 802.11a Overview Medium Access Control Design Baseband Transmitter Design Baseband Receiver Design Chip Details What is 802.11a? IEEE standard approved in September, 1999 12 20MHz channels at 5.15-5.35
More informationProgramming and Optimization with Intel Xeon Phi Coprocessors. Colfax Developer Training One-day Labs CDT 102
Programming and Optimization with Intel Xeon Phi Coprocessors Colfax Developer Training One-day Labs CDT 102 Abstract: Colfax Developer Training (CDT) is an in-depth intensive course on efficient parallel
More informationComputer Science 246. Advanced Computer Architecture. Spring 2010 Harvard University. Instructor: Prof. David Brooks
Advanced Computer Architecture Spring 2010 Harvard University Instructor: Prof. dbrooks@eecs.harvard.edu Lecture Outline Instruction-Level Parallelism Scoreboarding (A.8) Instruction Level Parallelism
More informationA HIGH PERFORMANCE HARDWARE ARCHITECTURE FOR HALF-PIXEL ACCURATE H.264 MOTION ESTIMATION
A HIGH PERFORMANCE HARDWARE ARCHITECTURE FOR HALF-PIXEL ACCURATE H.264 MOTION ESTIMATION Sinan Yalcin and Ilker Hamzaoglu Faculty of Engineering and Natural Sciences, Sabanci University, 34956, Tuzla,
More informationSIMULATION AND PROGRAM REALIZATION OF RECURSIVE DIGITAL FILTERS
SIMULATION AND PROGRAM REALIZATION OF RECURSIVE DIGITAL FILTERS Stela Angelova Stefanova, Radostina Stefanova Gercheva Technology School Electronic System associated to the Technical University of Sofia,
More informationOverview. 1 Trends in Microprocessor Architecture. Computer architecture. Computer architecture
Overview 1 Trends in Microprocessor Architecture R05 Robert Mullins Computer architecture Scaling performance and CMOS Where have performance gains come from? Modern superscalar processors The limits of
More informationDr Myat Su Hlaing Asia Research Center, Yangon University, Myanmar. Data programming model for an operation based parallel image processing system
Name: Affiliation: Field of research: Specific Field of Study: Proposed Research Topic: Dr Myat Su Hlaing Asia Research Center, Yangon University, Myanmar Information Science and Technology Computer Science
More informationEE382V: Embedded System Design and Modeling
EE382V: Embedded System Design and System-Level Design Tools Andreas Gerstlauer Electrical and Computer Engineering University of Texas at Austin gerstl@ece.utexas.edu : Outline Overview System-level design
More informationParallel Computing in the Multicore Era
Parallel Computing in the Multicore Era Prof. John Gurd 18 th September 2014 Combining the strengths of UMIST and The Victoria University of Manchester MSc in Advanced Computer Science Theme on Routine
More informationJim Waldo, Sun Microsystems Laboratories SCALING. in games & virtual worlds. 10 November/December 2008 ACM QUEUE rants:
Jim Waldo, Sun Microsystems Laboratories SCALING 10 November/December 2008 ACM QUEUE rants: feedback@acmqueue.com Q GAME FOCUS DEVELOPMENT ONLINE GAMES AND VIRTUAL WORLDS HAVE FAMILIAR SCALING REQUIREMENTS,
More informationB.E, Electronics and Telecommunication, Vishwatmak Om Gurudev College of Engineering, Aghai, Maharashtra, India
2018 IJSRSET Volume 4 Issue 1 Print ISSN: 2395-1990 Online ISSN : 2394-4099 Themed Section : Engineering and Technology Implementation of Various JPEG Algorithm for Image Compression Swanand Labad 1, Vaibhav
More informationOverview of Signal Processing
Overview of Signal Processing Chapter Intended Learning Outcomes: (i) Understand basic terminology in signal processing (ii) Differentiate digital signal processing and analog signal processing (iii) Describe
More informationArchitecture ISCA 16 Luis Ceze, Tom Wenisch
Architecture 2030 @ ISCA 16 Luis Ceze, Tom Wenisch Mark Hill (CCC liaison, mentor) LIVE! Neha Agarwal, Amrita Mazumdar, Aasheesh Kolli (Student volunteers) Context Many fantastic community formation/visioning
More informationJournal of Engineering Science and Technology Review 9 (5) (2016) Research Article. L. Pyrgas, A. Kalantzopoulos* and E. Zigouris.
Jestr Journal of Engineering Science and Technology Review 9 (5) (2016) 51-55 Research Article Design and Implementation of an Open Image Processing System based on NIOS II and Altera DE2-70 Board L. Pyrgas,
More informationChapter 9 Image Compression Standards
Chapter 9 Image Compression Standards 9.1 The JPEG Standard 9.2 The JPEG2000 Standard 9.3 The JPEG-LS Standard 1IT342 Image Compression Standards The image standard specifies the codec, which defines how
More informationImage processing. Case Study. 2-diemensional Image Convolution. From a hardware perspective. Often massively yparallel.
Case Study Image Processing Image processing From a hardware perspective Often massively yparallel Can be used to increase throughput Memory intensive Storage size Memory bandwidth -diemensional Image
More informationWireless Communication Systems: Implementation perspective
Wireless Communication Systems: Implementation perspective Course aims To provide an introduction to wireless communications models with an emphasis on real-life systems To investigate a major wireless
More informationChapter 12. Cross-Layer Optimization for Multi- Hop Cognitive Radio Networks
Chapter 12 Cross-Layer Optimization for Multi- Hop Cognitive Radio Networks 1 Outline CR network (CRN) properties Mathematical models at multiple layers Case study 2 Traditional Radio vs CR Traditional
More informationAN ERROR LIMITED AREA EFFICIENT TRUNCATED MULTIPLIER FOR IMAGE COMPRESSION
AN ERROR LIMITED AREA EFFICIENT TRUNCATED MULTIPLIER FOR IMAGE COMPRESSION K.Mahesh #1, M.Pushpalatha *2 #1 M.Phil.,(Scholar), Padmavani Arts and Science College. *2 Assistant Professor, Padmavani Arts
More informationSPIRO SOLUTIONS PVT LTD
VLSI S.NO PROJECT CODE TITLE YEAR ANALOG AMS(TANNER EDA) 01 ITVL01 20-Mb/s GFSK Modulator Based on 3.6-GHz Hybrid PLL With 3-b DCO Nonlinearity Calibration and Independent Delay Mismatch Control 02 ITVL02
More informationDesign and Implementation of Orthogonal Frequency Division Multiplexing (OFDM) Signaling
Design and Implementation of Orthogonal Frequency Division Multiplexing (OFDM) Signaling Research Project Description Study by: Alan C. Brooks Stephen J. Hoelzer Department: Electrical and Computer Engineering
More informationExperience Report on Developing a Software Communications Architecture (SCA) Core Framework. OMG SBC Workshop Arlington, Va.
Communication, Navigation, Identification and Reconnaissance Experience Report on Developing a Software Communications Architecture (SCA) Core Framework OMG SBC Workshop Arlington, Va. September, 2004
More information