Collectives Pattern CS 472 Concurrent & Parallel Programming University of Evansville

Selection of slides from CIS 410/510 Introduction to Parallel Computing, Department of Computer and Information Science, University of Oregon (http://ipcc.cs.uoregon.edu/curriculum.html), Lecture 8: Collective Pattern

Announcements
No class next Thursday, September 21; the instructor will be traveling to a conference. That day will be the second day of work on the lab project associated with today's topic.


Collectives
Collective operations deal with a collection of data as a whole, rather than as separate elements.
Collective patterns include Reduce, Scan, Partition, Scatter, and Gather.
Reduce and Scan are covered in this lecture.

Reduce
Reduce combines a collection of elements into one summary value.
A combiner function combines elements pairwise.
The combiner function only needs to be associative for the reduction to be parallelizable.
Example combiner functions: addition, multiplication, maximum, minimum.
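As a minimal sketch of the idea (Python is used here only for illustration; the slides do not prescribe a language), the standard-library `functools.reduce` applies a pairwise combiner across a collection. The sample data is the input from the add example later in the deck.

```python
from functools import reduce
import operator

data = [1, 2, 5, 4, 9, 7, 0, 1]

# Each combiner takes two elements and returns one summary value.
total = reduce(operator.add, data)      # 29
product = reduce(operator.mul, data)    # 0, because the data contains a 0
largest = reduce(max, data)             # 9
smallest = reduce(min, data)            # 0

# Associativity is what lets the work be split: reducing the two halves
# separately and then combining the partial results gives the same answer.
left_half = reduce(operator.add, data[:4])    # 12
right_half = reduce(operator.add, data[4:])   # 17
split_total = operator.add(left_half, right_half)
```

Because the two halves do not depend on each other, they could be handed to different workers.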

Reduce
[Figure: serial reduction compared with parallel reduction]

Reduce
[Figure: vectorization of a reduction]

Reduce
Tiling breaks the work into chunks that each worker reduces serially.
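A tiled reduction might look like the following sketch. The function name `tiled_reduce` and the `tile_size` parameter are hypothetical names chosen for this example; the per-tile reductions are written as an ordinary list comprehension, but each one is independent and is the part that could be assigned to a separate worker.

```python
from functools import reduce
import operator

def tiled_reduce(data, combine, tile_size):
    """Reduce non-empty `data` by reducing fixed-size tiles, then the partials."""
    # Split the input into contiguous tiles (the last one may be shorter).
    tiles = [data[i:i + tile_size] for i in range(0, len(data), tile_size)]
    # Each tile is reduced serially; tiles are independent of one another.
    partials = [reduce(combine, tile) for tile in tiles]
    # A final reduction combines the per-tile partial results.
    return reduce(combine, partials)

result = tiled_reduce([1, 2, 5, 4, 9, 7, 0, 1], operator.add, tile_size=3)
```

With a tile size of 3 the partial results are 8, 20, and 1, which combine to 29.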

Reduce Add Example (serial)
Input: 1 2 5 4 9 7 0 1
Running sums, left to right: 3, 8, 12, 21, 28, 28, 29
Result: 29

Reduce Add Example (parallel, tree-shaped)
Input: 1 2 5 4 9 7 0 1
Level 1 (adjacent pairs): 3, 9, 16, 1
Level 2: 12, 17
Level 3: 29
Result: 29
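The tree-shaped add example can be reproduced with a short sketch. `tree_reduce_levels` is a hypothetical helper written for this illustration; it combines adjacent pairs level by level and records each level, so the intermediate values can be compared with the slide.

```python
import operator

def tree_reduce_levels(data, combine):
    """Return the list of intermediate levels of a pairwise (tree) reduction."""
    levels = []
    current = list(data)
    while len(current) > 1:
        # Combine adjacent pairs; all pairs in one level are independent.
        nxt = [combine(current[i], current[i + 1])
               for i in range(0, len(current) - 1, 2)]
        # An odd element out is carried up unchanged.
        if len(current) % 2 == 1:
            nxt.append(current[-1])
        levels.append(nxt)
        current = nxt
    return levels

levels = tree_reduce_levels([1, 2, 5, 4, 9, 7, 0, 1], operator.add)
# levels == [[3, 9, 16, 1], [12, 17], [29]]
```

Eight inputs need only three levels, whereas the serial version needs seven dependent additions.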

Reduce
We can fuse the map and reduce patterns.

Reduce
Precision can become a problem with reductions on floating-point data, because floating-point addition is not exactly associative. Different orderings of the same floating-point data can change the reduction value.
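A small illustration of the ordering effect, assuming IEEE 754 double precision (which Python floats use on common platforms). The three values are made up for this example.

```python
from functools import reduce
import math
import operator

values = [1e16, 1.0, -1e16]

# Left to right: 1e16 + 1.0 rounds back to 1e16, so the 1.0 is lost.
in_order = reduce(operator.add, values)                  # 0.0

# Same numbers, different order: the large terms cancel first.
reordered = reduce(operator.add, [1e16, -1e16, 1.0])     # 1.0

# math.fsum tracks the lost low-order bits and returns the exact sum.
exact = math.fsum(values)                                # 1.0
```

A parallel reduction effectively picks its own ordering, so its floating-point result can differ from the serial one even though both are "correct" reductions.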

Reduce Example: Dot Product
Take two vectors of the same length. Map (*) over the paired components to multiply them, then reduce with (+) to get the final answer:
a · b = a_1*b_1 + a_2*b_2 + ... + a_n*b_n
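The dot product can be sketched both as two separate patterns and as a fused map-reduce. The function names here are hypothetical and are not the course's code examples.

```python
from functools import reduce
import operator

def dot_unfused(a, b):
    """Map (*) into a temporary list, then reduce with (+)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    products = list(map(operator.mul, a, b))   # map step
    return reduce(operator.add, products, 0)   # reduce step

def dot_fused(a, b):
    """Fused version: each product is added as soon as it is computed."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))

example = dot_fused([1, 2, 3], [4, 5, 6])   # 1*4 + 2*5 + 3*6 = 32
```

The fused form avoids materializing the intermediate list of products, which is the usual motivation for fusing map with reduce.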

Dot Product Example Uses
The dot product is an essential operation in physics, graphics, and video games.
Gaming analogy: in Mario Kart, there are boost pads on the ground that increase your speed. The red vector is your speed (x and y direction). The blue vector is the orientation of the boost pad (x and y direction); larger numbers mean more power. How much boost will you get?
For the analogy, imagine the pad multiplies your speed. If you come in going 0, you'll get nothing. If you cross the pad perpendicularly, you'll get 0 [just like the banana obliteration, it will give you 0x boost in the perpendicular direction].
Ref: http://betterexplained.com/articles/vector-calculus-understanding-the-dot-product/

Dot Product Code Examples
Dot product code examples are available on csserver in the directory /home/hwang/cs472/dotproduct

Scan
The scan pattern produces all the partial reductions of an input sequence and generates a new sequence from them. It is trickier to parallelize than reduce.
Inclusive scan vs. exclusive scan:
Inclusive scan includes the current element in the partial reduction.
Exclusive scan excludes the current element; the partial reduction covers only the elements before the current one.
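The difference between the two variants can be sketched with the standard-library `itertools.accumulate` (the `initial` keyword requires Python 3.8 or later). The helper names and the explicit `identity` argument are assumptions made for this example.

```python
from itertools import accumulate
import operator

def inclusive_scan(data, combine):
    """Output i combines elements 0..i, including the current one."""
    return list(accumulate(data, combine))

def exclusive_scan(data, combine, identity):
    """Output i combines elements 0..i-1; output 0 is the identity."""
    if not data:
        return []
    # Seed with the identity and drop the last input so lengths match.
    return list(accumulate(data[:-1], combine, initial=identity))

data = [1, 2, 5, 4, 9, 7, 0, 1]
inc = inclusive_scan(data, operator.add)     # [1, 3, 8, 12, 21, 28, 28, 29]
exc = exclusive_scan(data, operator.add, 0)  # [0, 1, 3, 8, 12, 21, 28, 28]
```

The exclusive result is the inclusive result shifted right by one position, with the identity of the combiner in the first slot.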

Scan Example Uses
Lexical comparison of strings, e.g., determining that "strategy" should appear before "stratification" in a dictionary
Adding multi-precision numbers (those that cannot be represented in a single machine word)
Evaluating polynomials
Implementing radix sort or quicksort
Deleting marked elements in an array
Dynamically allocating processors
Lexical analysis (parsing programs into tokens)
Searching for regular expressions
Labeling components in 2-D images
Some tree algorithms, e.g., finding the depth of every vertex in a tree

Scan
[Figure: serial scan compared with parallel scan]

Scan
One algorithm for parallelizing scan performs an up sweep followed by a down sweep.
Up sweep: compute the reduction of the input.
Down sweep: compute the intermediate results.

Scan Maximum Example
Input: 1 4 0 2 7 2 4 3
Up sweep (maximum of adjacent pairs, then of pairs of pairs): 4, 2, 7, 4, then 4, 7, then 7
Output (inclusive scan with maximum): 1 4 4 4 7 7 7 7
[Figure: up-sweep and down-sweep tree for the maximum scan]
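One common way to realize the up-sweep and down-sweep idea is the work-efficient formulation sketched below. This is an assumption about the details: the slide's figure may organize the down sweep differently. The function name, the explicit `identity` argument, and the power-of-two length restriction are all choices made for this example. The inner `for` loops are the parts that could run in parallel at each level.

```python
def updown_sweep_scan(data, combine, identity):
    """Inclusive scan using an up sweep and a down sweep.

    Sketch only: requires len(data) to be a power of two.
    """
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    tree = list(data)

    # Up sweep: build partial reductions; tree[n-1] ends as the full reduction.
    stride = 1
    while stride < n:
        for i in range(2 * stride - 1, n, 2 * stride):
            tree[i] = combine(tree[i - stride], tree[i])
        stride *= 2

    # Down sweep: push prefixes back down, producing an exclusive scan.
    tree[n - 1] = identity
    stride = n // 2
    while stride >= 1:
        for i in range(2 * stride - 1, n, 2 * stride):
            left_sum = tree[i - stride]
            tree[i - stride] = tree[i]              # left child gets parent prefix
            tree[i] = combine(tree[i], left_sum)    # right child adds left subtree
        stride //= 2

    # Convert exclusive to inclusive by combining in each element itself.
    return [combine(prefix, x) for prefix, x in zip(tree, data)]

scanned = updown_sweep_scan([1, 4, 0, 2, 7, 2, 4, 3], max, float("-inf"))
# scanned == [1, 4, 4, 4, 7, 7, 7, 7]
```

For the maximum combiner the identity is negative infinity, since taking the maximum with it leaves any value unchanged.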

Scan
[Figure: three-phase scan with tiling]

Scan
Just like reduce, we can also fuse the map pattern with the scan pattern.
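A three-phase tiled scan is usually described as: reduce each tile, scan the tile totals, then scan each tile starting from its offset. The sketch below follows that description; since the slide shows only a figure, the exact phase boundaries are an assumption, and `three_phase_scan` and `tile_size` are hypothetical names. It expects non-empty input.

```python
from functools import reduce
from itertools import accumulate
import operator

def three_phase_scan(data, combine, tile_size):
    """Inclusive scan of non-empty `data` computed tile by tile."""
    tiles = [data[i:i + tile_size] for i in range(0, len(data), tile_size)]

    # Phase 1: reduce every tile independently.
    tile_totals = [reduce(combine, tile) for tile in tiles]

    # Phase 2: scan the (few) tile totals; entry k-1 is the offset for tile k.
    running_totals = list(accumulate(tile_totals, combine))

    # Phase 3: scan every tile independently, seeded with its offset.
    result = []
    for k, tile in enumerate(tiles):
        if k == 0:
            result.extend(accumulate(tile, combine))
        else:
            seeded = accumulate(tile, combine, initial=running_totals[k - 1])
            result.extend(list(seeded)[1:])   # drop the seed itself
    return result

tiled = three_phase_scan([1, 2, 5, 4, 9, 7, 0, 1], operator.add, tile_size=3)
# tiled == [1, 3, 8, 12, 21, 28, 28, 29]
```

Phases 1 and 3 are parallel across tiles; only the short phase 2 scan over tile totals is serial in this sketch.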

Merge Sort as a reduction
We can sort an array with a map followed by a reduce.
Map each element into a vector containing just that element.
<> is the merge operation: [1,3,5,7] <> [2,6,15] = [1,2,3,5,6,7,15]
[] is the empty list.
How fast is this?
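The `<>` operation and the map-then-reduce sort can be sketched as follows. `merge` and `sort_by_reduce` are hypothetical names; `functools.reduce` folds from the left, which is only one of the possible reduction shapes.

```python
from functools import reduce

def merge(left, right):
    """Merge two sorted lists into one sorted list (the <> operation)."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    # At most one of these still has elements left.
    out.extend(left[i:])
    out.extend(right[j:])
    return out

def sort_by_reduce(data):
    """Map each element to a one-element list, then reduce with merge."""
    singletons = map(lambda x: [x], data)
    return reduce(merge, singletons, [])   # [] is the identity for merge

merged = merge([1, 3, 5, 7], [2, 6, 15])            # [1, 2, 3, 5, 6, 7, 15]
sorted_list = sort_by_reduce([14, 3, 4, 8, 7, 52, 1])
```

Merging with the empty list returns the other operand unchanged, which is why `[]` can serve as the starting value of the reduction.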

Right Biased Sort
Start with [14,3,4,8,7,52,1]
Map to [[14],[3],[4],[8],[7],[52],[1]]
Reduce:
[14] <> ([3] <> ([4] <> ([8] <> ([7] <> ([52] <> [1])))))
= [14] <> ([3] <> ([4] <> ([8] <> ([7] <> [1,52]))))
= [14] <> ([3] <> ([4] <> ([8] <> [1,7,52])))
= [14] <> ([3] <> ([4] <> [1,7,8,52]))
= [14] <> ([3] <> [1,4,7,8,52])
= [14] <> [1,3,4,7,8,52]
= [1,3,4,7,8,14,52]

Right Biased Sort Continued
How long did that take?
We did O(n) merges, but each one took O(n) time, so the total is O(n^2).
We wanted merge sort, but instead we got insertion sort!

Tree Shape Sort
Start with [14,3,4,8,7,52,1]
Map to [[14],[3],[4],[8],[7],[52],[1]]
Reduce:
(([14] <> [3]) <> ([4] <> [8])) <> (([7] <> [52]) <> [1])
= ([3,14] <> [4,8]) <> ([7,52] <> [1])
= [3,4,8,14] <> [1,7,52]
= [1,3,4,7,8,14,52]

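Tree Shaped Sort Performance
Even if we only had a single processor, this is better.
We do O(log n) levels of merges, and each level costs O(n) in total, so the work is O(n*log(n)).
But the opportunity for parallelism is not so great: assuming a sequential merge, the final merge alone takes O(n) time, so the parallel running time is still O(n).
Takeaway: the shape of the reduction matters!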
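To make the takeaway concrete, the sketch below runs both reduction shapes on the slide's input and counts the elements each merge handles as a rough cost measure. The cost model, the function names, and the use of the standard-library `heapq.merge` for the `<>` operation are assumptions made for this illustration.

```python
from heapq import merge as heap_merge

def merge(left, right, cost):
    """Merge two sorted lists; charge one unit per element handled."""
    cost[0] += len(left) + len(right)
    return list(heap_merge(left, right))

def right_biased_sort(data):
    """x0 <> (x1 <> (... <> x_last)): each merge adds a single element."""
    cost = [0]
    if not data:
        return [], 0
    result = [data[-1]]
    for x in reversed(data[:-1]):
        result = merge([x], result, cost)
    return result, cost[0]

def tree_shaped_sort(data):
    """Merge adjacent runs level by level, as in the tree-shaped reduction."""
    cost = [0]
    runs = [[x] for x in data]
    while len(runs) > 1:
        nxt = [merge(runs[i], runs[i + 1], cost)
               for i in range(0, len(runs) - 1, 2)]
        if len(runs) % 2 == 1:
            nxt.append(runs[-1])      # odd run out is carried up unchanged
        runs = nxt
    return (runs[0] if runs else []), cost[0]

data = [14, 3, 4, 8, 7, 52, 1]
right_result, right_cost = right_biased_sort(data)   # cost 27
tree_result, tree_cost = tree_shaped_sort(data)      # cost 20
```

The gap is small for seven elements but grows quickly: under this cost model, 1024 elements cost 524,799 units right-biased and 10,240 units tree-shaped.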