Stencil Pattern. CS 472 Concurrent & Parallel Programming University of Evansville


Selection of slides from CIS 410/510 Introduction to Parallel Computing, Department of Computer and Information Science, University of Oregon: http://ipcc.cs.uoregon.edu/curriculum.html

Recall: Partitioning (1D, 2D, 3D). Data is divided into non-overlapping regions (to avoid write conflicts and race conditions) and equal-sized regions (to improve load balancing).

Stencil Pattern: A stencil pattern is a map where each output depends on a neighborhood of inputs. These inputs are at a set of fixed offsets relative to the output position. A stencil output is a function of a neighborhood of elements in an input collection: the pattern applies the stencil to select the inputs. Data access patterns of stencils are regular. The stencil is the shape of the neighborhood, and it remains the same for every output.

Serial Stencil Example (parts 1 and 2) [code listing]. How would we parallelize this?

What is the stencil pattern? It reads an input array, applies a function, and writes an output array. The neighborhood of position i is i-1, i, i+1, so this stencil has 3 elements in the neighborhood. It applies some function to them and outputs the result to the i-th position of the output array.
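The 3-element neighborhood described here can be sketched in a few lines of Python. This is only an illustration: the names `stencil_1d` and `average3`, the sample data, and the choice to copy the two boundary elements unchanged are assumptions, not part of the slides.

```python
def stencil_1d(a, f):
    """Apply f to the neighborhood (i-1, i, i+1) of every interior element."""
    out = list(a)  # assumption: boundary elements are copied unchanged
    for i in range(1, len(a) - 1):
        # Read only from the input array, write only to the output array.
        out[i] = f(a[i - 1], a[i], a[i + 1])
    return out


def average3(left, center, right):
    """Example function applied to the three neighborhood elements."""
    return (left + center + right) / 3


result = stencil_1d([3, 6, 9, 12, 15], average3)
print(result)
```

Because every output position reads only from the input array, the loop iterations are independent of one another, which is what makes this form a map.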

Stencil Patterns: Stencils can operate on one-dimensional and multidimensional data. Stencil neighborhoods can range from compact to sparse, square to cube, and anything else! It is the pattern of the stencil that determines how the stencil operates in an application.

2-Dimensional Stencils: 4-point stencil: center cell (P) is not used. 5-point stencil: center cell (P) is used as well. 9-point stencil: center cell (C) is used as well. Source: http://en.wikipedia.org/wiki/stencil_code
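One way to make "fixed offsets relative to the output position" concrete is to store a stencil as a list of (row, column) offsets. The sketch below covers only the 4-point and 5-point cases, whose shapes the text pins down; the helper name `gather` and the sample grid are hypothetical.

```python
# Offsets are (row, column) displacements from the output cell.
FOUR_POINT = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # N, S, W, E; center not used
FIVE_POINT = [(0, 0)] + FOUR_POINT               # center used as well


def gather(grid, i, j, offsets):
    """Return the inputs that a stencil selects for output position (i, j)."""
    return [grid[i + di][j + dj] for di, dj in offsets]


grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
four = gather(grid, 1, 1, FOUR_POINT)
five = gather(grid, 1, 1, FIVE_POINT)
print(four, five)
```

The same offset list is reused at every output position, which reflects the point that the stencil shape stays the same across the data.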

3-Dimensional Stencils: 6-point stencil (7-point stencil if the center cell is used). 24-point stencil (25-point stencil if the center cell is used). Source: http://en.wikipedia.org/wiki/stencil_code

Stencil Example: Here is our array A. Its inner square is
9 7
6 4
and the surrounding border cells are 0. B is the output array, initialized to all 0. Apply a stencil operation to the inner square of the form B(i,j) = avg( A(i,j), A(i-1,j), A(i+1,j), A(i,j-1), A(i,j+1) ). What is the stencil?

Stencil Pattern Procedure: 1) Average all blue squares (a cell of A and its four neighbors). 2) Store the result in B; for the first inner cell this is (9 + 0 + 6 + 0 + 7) / 5 = 4.4. 3) Repeat 1 and 2 for all green squares (the remaining inner cells).

Practice! B is filled in one inner cell at a time, always reading the unchanged values of A (inner squares written with rows separated by /):
A = 9 7 / 6 4
Cell 1: (9 + 0 + 6 + 0 + 7) / 5 = 4.4
Cell 2: (7 + 0 + 4 + 9 + 0) / 5 = 4.0
Cell 3: (6 + 9 + 0 + 0 + 4) / 5 = 3.8
Cell 4: (4 + 7 + 0 + 6 + 0) / 5 = 3.4
B = 4.4 4.0 / 3.8 3.4
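The practice result can be checked with a short Python sketch. It assumes A is a 4x4 grid with a border of 0s around the inner square 9 7 / 6 4 (the border values are inferred from the averages, and the grid size is an assumption); the function name `five_point_average` is hypothetical.

```python
def five_point_average(a):
    """Return a new grid B whose interior cells hold the 5-point average of A."""
    n_rows, n_cols = len(a), len(a[0])
    b = [[0.0] * n_cols for _ in range(n_rows)]  # B is initialized to all 0
    for i in range(1, n_rows - 1):
        for j in range(1, n_cols - 1):
            # Every read comes from A and every write goes to B.
            b[i][j] = (a[i][j] + a[i - 1][j] + a[i + 1][j]
                       + a[i][j - 1] + a[i][j + 1]) / 5
    return b


# Assumed layout: zero border around the inner square from the example.
A = [[0, 0, 0, 0],
     [0, 9, 7, 0],
     [0, 6, 4, 0],
     [0, 0, 0, 0]]
B = five_point_average(A)
inner_b = [[round(B[i][j], 3) for j in (1, 2)] for i in (1, 2)]
print(inner_b)
```

Since A is never modified, the inner cells could be computed in any order, or all at once, with the same result.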

Serial Stencil Example (parts 1 and 2) [code listing]. How would we parallelize this? The result is written to a[i]: updates occur in place!

Stencil Pattern with In Place Update: The function reads its neighborhood from the input array and writes its result back into the input array!

Stencil Example: Here is our array A. Its inner square is
9 7
6 4
and the surrounding border cells are 0. Update A in place: apply a stencil operation to the inner square of the form A(i,j) = avg( A(i,j), A(i-1,j), A(i+1,j), A(i,j-1), A(i,j+1) ). What is the stencil?

Stencil Pattern Procedure: 1) Average all blue squares (a cell of A and its four neighbors). 2) Store the result in the red square (the cell itself, in A); the first inner cell becomes 4.4. 3) Repeat 1 and 2 for all green squares (the remaining inner cells).

Practice! A is updated in place, one inner cell at a time, so later cells read values that were already overwritten (inner squares written with rows separated by /):
Start: A = 9 7 / 6 4
Cell 1: (9 + 0 + 6 + 0 + 7) / 5 = 4.4, so A = 4.4 7 / 6 4
Cell 2: (7 + 0 + 4 + 4.4 + 0) / 5 = 3.08, so A = 4.4 3.08 / 6 4
Cell 3: (6 + 4.4 + 0 + 0 + 4) / 5 = 2.88, so A = 4.4 3.08 / 2.88 4
Cell 4: (4 + 3.08 + 0 + 2.88 + 0) / 5 = 1.992, so A = 4.4 3.08 / 2.88 1.992
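The in-place walk-through can be reproduced with the sketch below. As before, the 4x4 grid with a zero border is an assumption inferred from the averages, the row-by-row visiting order is assumed from the order of the practice steps, and the function name is hypothetical.

```python
def five_point_average_in_place(a):
    """Overwrite each interior cell of A with its 5-point average, row by row."""
    for i in range(1, len(a) - 1):
        for j in range(1, len(a[0]) - 1):
            # Neighbors above and to the left may already hold updated values.
            a[i][j] = (a[i][j] + a[i - 1][j] + a[i + 1][j]
                       + a[i][j - 1] + a[i][j + 1]) / 5
    return a


# Assumed layout: zero border around the inner square from the example.
A = [[0, 0, 0, 0],
     [0, 9, 7, 0],
     [0, 6, 4, 0],
     [0, 0, 0, 0]]
five_point_average_in_place(A)
inner_a = [[round(A[i][j], 3) for j in (1, 2)] for i in (1, 2)]
print(inner_a)
```

Only the first cell matches the separate-output result; every later cell differs because it reads at least one value that was already overwritten.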

Different Cases:
Separate output array:
  Input    Output
  9 7      4.4 4.0
  6 4      3.8 3.4
Updates occur in place:
  Input    Output
  9 7      4.4 3.08
  6 4      2.88 1.992

Which is correct?
  Input    Output
  9 7      4.4 3.08
  6 4      2.88 1.992
Is this output incorrect?

Iterative Codes: Iterative codes are ones that update their data in steps. At each step, a new value of an element is computed using a formula based on other elements. Once all elements are updated, the computation proceeds to the next step or completes. Iterative codes are most commonly found in computer simulations of physical systems for scientific and engineering applications, such as computational fluid dynamics and electromagnetics modeling. They are often applied to solve partial differential equations, using methods such as Jacobi iteration, Gauss-Seidel iteration, and successive over-relaxation.

Iterative Codes and Stencils: Stencils essentially define which elements are used in the update formula. Because the data is organized in a regular manner, stencils can be applied across the data uniformly.

Simple 2D Example: Consider the following code.
    for k=1, 1
      for i=1, N-2
        for j = 1, N-2
          a[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1])
This is a 5-point stencil. Do you see anything interesting? How would you parallelize?
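One observation about this loop is that it updates a[i][j] in place, so the result depends on the order in which cells are visited. The sketch below illustrates that with a single sweep in two different orders. It keeps the 0.25 coefficient from the code above; the 4x4 test grid with a zero border and the names `sweep_in_place` and `make_grid` are assumptions made for illustration.

```python
def sweep_in_place(a, rows, cols):
    """One in-place sweep of the update, visiting cells in the given order."""
    for i in rows:
        for j in cols:
            a[i][j] = 0.25 * (a[i][j] + a[i - 1][j] + a[i + 1][j]
                              + a[i][j - 1] + a[i][j + 1])
    return a


def make_grid():
    """Hypothetical test data: inner square 9 7 / 6 4 with a zero border."""
    return [[0, 0, 0, 0],
            [0, 9, 7, 0],
            [0, 6, 4, 0],
            [0, 0, 0, 0]]


forward = sweep_in_place(make_grid(), [1, 2], [1, 2])   # top-left to bottom-right
backward = sweep_in_place(make_grid(), [2, 1], [2, 1])  # bottom-right to top-left
print(forward != backward)
```

Because the two orders give different grids, the i and j iterations are not independent, and running them concurrently without extra care would make the result depend on timing.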

2-Dimension Jacobi Iteration: Consider a 2D array of elements. Initialize each array element to some value. At each step, update each array element to the arithmetic mean of its N, S, E, W neighbors. Iterate until the array values converge. Here we are using a 4-point stencil. It is different from before because we want to update all array elements simultaneously. How?
[Figure: heat equation simulation with a 4-point stencil, with hot and cold regions labeled, shown at steps 0, 2, 4, 6, 8, and 10]
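A common way to get the effect of a simultaneous update is to keep two arrays, reading from the old one and writing into a new one at each step. The sketch below shows that approach under stated assumptions: the boundary cells are held fixed, the grid is 6x6 with a hot top edge (100) and cold values (0) elsewhere, and the tolerance, step limit, and function names are all illustrative choices rather than anything given in the slides.

```python
def jacobi_step(u):
    """Return a new grid; each interior cell is the mean of its N, S, E, W neighbors."""
    new = [row[:] for row in u]  # assumption: boundary cells keep their values
    for i in range(1, len(u) - 1):
        for j in range(1, len(u[0]) - 1):
            # All reads come from the old grid u, so the update is simultaneous.
            new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                + u[i][j - 1] + u[i][j + 1])
    return new


def jacobi(u, tolerance=1e-6, max_steps=10_000):
    """Iterate until the largest change in one step drops below the tolerance."""
    for step in range(1, max_steps + 1):
        new = jacobi_step(u)
        change = max(abs(new[i][j] - u[i][j])
                     for i in range(len(u)) for j in range(len(u[0])))
        u = new
        if change < tolerance:
            return u, step
    return u, max_steps


# Hypothetical setup: hot top edge, cold everywhere else.
N = 6
grid = [[100.0] * N] + [[0.0] * N for _ in range(N - 1)]
result, steps = jacobi(grid)
print(steps, [round(v, 2) for v in result[1]])
```

Each call to `jacobi_step` touches every interior cell independently, so the two inner loops are the natural place to partition work among threads.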