
Sorting

Suppose behind each door (indicated below) there are numbers placed in a random order and I ask you to find the number 41.

Door #1   Door #2   Door #3   Door #4   Door #5   Door #6   Door #7

Is there an optimal strategy for attempting to find a particular number? What is the smallest number of guesses that it will take? What is the worst case scenario, i.e., what is the maximum number of guesses that it will take?

Now suppose I told you that the numbers are ordered from smallest to largest. What should your strategy be? Is it the same as before? Why or why not? Why did our Divide and Conquer routine work for finding a name in the phone book (from the first class)?

"Nowe if the word, which thou art desirous to finde, begin with (a) then looke in the beginning of this Table, but if with (v) looke towards the end. Againe, if thy word beginne with (ca) looke in the beginning of the letter (c) but if with (cu) then looke toward the end of that letter. And so of all the rest."
- Robert Cawdrey, A Table Alphabeticall (1604)

Even at that time, people realized that a sorted list of items was much more useful than a random list. If you are going to search the list only once, then sorting is not necessary, but typically we search lists over and over again. Sorting is at the heart of what many computer algorithms do. Sorting is essential to working with almost any kind of information, whether it's finding the biggest or smallest, the most common or the rarest, flagging duplicates, or just looking for something specific. Lists need to be sorted to be useful.

Goals for Sorting and Searching:

1. We want to look at basic algorithms for sorting a list and compare the maximum amount of work required to sort the list.
2. We want to understand the differences between constant work, linear work and quadratic work.
3. We will look at a Brute Force approach and a Divide & Conquer approach for searching a sorted list.
4. The next part of this topic is searching the World Wide Web. We want to understand that when you enter search words it is impossible for a search engine to search the entire Web and provide your results in a matter of seconds. So our goal is to understand what is really happening when you do a search.

Sorting and the U.S. Census

In the late 19th century the population of the U.S. was growing at about 30% per decade. In the 1870 census, there were 5 subjects of inquiry on the questionnaire: race, nationality, sex, age, and occupation. In the 1880 census there were more than 200 inquiries which needed to be compiled. It took the Census Bureau 8 years to tabulate the results from the 1880 census. With such a large increase in population every decade, the system was about to collapse. As a result, the agency held a competition in 1888 to find a more efficient method to process and tabulate data. Contestants were asked to process the 1880 census data from four areas in St. Louis, MO. Whoever captured and processed the data fastest would win the contract for the 1890 census.

Three inventors met the challenge. The winner was an inventor by the name of Herman Hollerith who devised a system of punched manila cards to store information and a machine, called the Hollerith Machine, to count and sort the cards. He was able to beat the other two contestants by a factor of two in capturing the data and a factor of ten in tabulating the data. Thus he won the contract for tabulating and sorting the 1890 census results.

Basically, his machine read holes on paper punch cards. The census information had to be transferred from the census schedules to paper punch cards using special equipment; a person could do approximately 500 cards per day. Did Hollerith ever make any money from his invention? In 1911 his firm merged with several others to become the Computing-Tabulating-Recording Company, which a few years later was renamed International Business Machines, i.e., IBM!

Actually, in the early days of computers, each line of a computer program was typed on a punch card with a special machine which punched appropriate holes in the card. Then a card reader was used to enter the computer program. We have come a long way since then!

The first program written for a stored-program computer (by John von Neumann, in 1945) was for sorting. Its ability to sort more efficiently than the Hollerith machine is what convinced the government to invest heavily in general purpose computers. After looking at various sorting routines we will look at Search Engines, which return ranked results matching our search criteria. Even though we call them Search Engines, they are really Sort Engines.

Sorting gets more difficult as the size of the problem increases

The problem of sorting items is especially difficult because it gets much more costly as the number of items to be sorted increases. Many problems we encounter in life exhibit the economy of numbers property. For example, cooking for two people is almost as easy as cooking for one, and clearly is far less work than cooking for one person twice. Manufacturing costs typically go down as quantities increase. Many retailers price items cheaper the more you buy. However, compare the following two problems involving sorting 50 books. Which do you think can be done quickest?

1. Sort ten bookshelves with 5 books each
2. Sort one bookshelf with 50 books

Sorting does not exhibit the economy of numbers property. Sorting gets much more difficult as the number of items we want to sort grows. Unfortunately, computers often have to sort millions of items at once.

The Guinness Book of World Records attributes the record for sorting a deck of cards to a Czech magician who sorted a 52-card deck in 36.16 seconds. World records are interested in the best case scenario. To compare sorting algorithms we want to know the worst possible case. This allows us to say that the sort will take no more than this amount of work.

How do we compare the amount of work different algorithms take to sort a list with N items? Let's look at some real life examples (taken from Algorithms to Live By, Christian and Griffiths) to get a handle on how we compare the work.

Suppose you are going to have a dinner party and you know that you have to clean the house before the party. It takes the same amount of work whether you have one guest or ten guests so we say that it takes constant time. This means that it is fixed at some number. It can be a small number or a large number but it does not change based on the number of guests. Another example of constant time is if you are a musician and preparing for your Senior Recital. You prepare as much whether you have 20 people or 100 people in the audience. If we plot the amount of work required to do tasks of varying size when the work is constant time, then we just get a horizontal line.

[Plot: time versus size of problem, a horizontal line]

Now suppose you have made a beef roast for the main course of your dinner party and you want to pass the roast around the table. If you have ten guests, it takes twice as long to pass the roast as if you had five guests. Also it takes twice as long to pass the roast with twenty guests as with ten guests. We call this linear time, or a constant times the number of guests, N. The constant can be big or small; what matters is that the constant is fixed.

When we plot linear time, we get a straight line.

[Plot: time versus size of problem, a rising straight line]

What about an example of quadratic time using the dinner party scenario? Suppose as each guest arrives, they hug you and all other guests that have arrived. Assume you have four guests (and yourself).

The first guest hugs you - 1 hug
The second guest hugs you and the first guest - 2 hugs
The third guest hugs you and the first and second guests - 3 hugs
The fourth guest hugs you and the previous 3 guests - 4 hugs

So with 4 guests (5 people) there are a total of 4 + 3 + 2 + 1 = 10 hugs. Continuing in this manner we see that with 5 guests there are 15 hugs, with 6 guests there are 21 hugs, with 7 guests 28 hugs, and with 8 guests 36 hugs. So when the number of guests doubles from 4 to 8, the number of hugs grows from 10 to 36. At this point we really don't know whether it is linear or quadratic, but if we continue to have more guests and plot the results, we see that it is quadratic.

There is an easy formula for finding the total number of hugs. Recall that for 5 guests we have 5 + 4 + 3 + 2 + 1 = 15 and for 6 guests 6 + 5 + 4 + 3 + 2 + 1 = 21, so all we are doing is finding the sum of the first N integers. Here is a formula for doing this.

$$1 + 2 + 3 + 4 + \cdots + N = \frac{N(N+1)}{2}$$

No. of Guests    No. of Hugs
      2          1(3) = 3
      4          2(5) = 10
      8          4(9) = 36
     16          8(17) = 136
     32          16(33) = 528
     64          32(65) = 2080

This is clearly not linear. Let's plot the points and then join them.
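The hug count can be checked by brute force. The following is a minimal sketch, with illustrative function names that are not part of the lecture, which adds up the hugs one arrival at a time and compares the result with the formula N(N + 1)/2.

```python
def count_hugs(num_guests):
    """Add up the hugs one arrival at a time.

    Guest number k hugs the host and the k - 1 guests who arrived earlier,
    so that arrival contributes k hugs.
    """
    total = 0
    for k in range(1, num_guests + 1):
        total += k
    return total


def hugs_formula(num_guests):
    """Closed form for 1 + 2 + ... + N."""
    return num_guests * (num_guests + 1) // 2


for n in (2, 4, 8, 16, 32, 64):
    # The two columns should agree for every party size.
    print(n, count_hugs(n), hugs_formula(n))
```

Doubling the number of guests roughly quadruples the number of hugs once N is large, which is the signature of quadratic growth.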

[Plot: number of hugs versus number of guests]

We say that the amount of work is quadratic in N. Why do we say this when the actual value is

$$\frac{N(N+1)}{2} = \frac{N^2 + N}{2} = \frac{N^2}{2} + \frac{N}{2}\,?$$

What about the one-half factor and the N/2 term? Remember that we are concerned with sorting large lists. The constant in front of the $N^2$ is not as important as the $N^2$ itself because it is fixed. Clearly a constant like 1/10 is better than 1/2, but not by much. For large N, the term $N^2 + N$ is not much different from $N^2$, as the following examples illustrate.

If N = 100 then $N^2 + N$ = 10,000 + 100 = 10,100
If N = 1000 then $N^2 + N$ = 1,000,000 + 1000 = 1,001,000
If N = 10,000 then $N^2 + N$ = 100,000,000 + 10,000 = 100,010,000
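To see how quickly the extra N stops mattering, here is a small sketch (the helper name is hypothetical) that prints the ratio of N^2 + N to N^2 for the three values used above.

```python
def relative_size(n):
    """Return (N^2 + N) / N^2, which equals 1 + 1/N."""
    return (n * n + n) / (n * n)


for n in (100, 1000, 10_000):
    # The ratio gets closer and closer to 1 as N grows.
    print(n, n * n + n, relative_size(n))
```

The ratio is 1 + 1/N, so for a list of 10,000 items the extra term changes the total by only one part in ten thousand.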

Exercise. Identify each curve as being constant, having linear dependence or quadratic dependence.

[Plot: curves to be identified]

We have already seen that some algorithms require less work than linear but are not constant. For example, our Divide and Conquer algorithm for finding a name in the phone book required less work than linear; in fact, as the phone book doubled we only needed 1 more step. The work for this algorithm is represented by log N.

[Plot of log N]

We will also see that some algorithms require more than linear work but less than quadratic work. These are represented by N log N.

[Plot of N log N]

Here is a table comparing some values of log N, N, N log N and $N^2$ (the logarithm is base 2).

  N    log N    N log N     N^2
  4      2          8        16
  8      3         24        64
 16      4         64       256
 32      5        160      1024
 64      6        384      4096

Example. Compare the work to sort one bookshelf with 50 books with the work to sort 10 bookshelves with 5 books each if the sorting routine takes $N^2$ comparisons to sort a list of N items.

1. Bookshelf with 50 books: $50^2 = 2{,}500$ comparisons.
2. Ten bookshelves with 5 books each: each bookshelf takes $5^2 = 25$ comparisons and there are 10 shelves, so 250 comparisons.
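The table and the bookshelf comparison can be reproduced with a few lines of Python. This is only a sketch: the function names are made up for illustration, and it assumes, as in the example, a sorting routine that costs exactly N^2 comparisons.

```python
import math


def growth_row(n):
    """Return (N, log N, N log N, N^2) for N a power of two."""
    log_n = int(math.log2(n))
    return (n, log_n, n * log_n, n * n)


def quadratic_cost(num_shelves, books_per_shelf):
    """Total comparisons if each shelf is sorted separately at N^2 cost."""
    return num_shelves * books_per_shelf ** 2


for n in (4, 8, 16, 32, 64):
    print(growth_row(n))

print("one shelf of 50 books:", quadratic_cost(1, 50))
print("ten shelves of 5 books:", quadratic_cost(10, 5))
```

Splitting the books across ten shelves cuts the work by a factor of ten here, which is the opposite of the economy of numbers property.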

We are going to look at four basic routines for sorting. To better understand the routines and get an idea of the amount of work involved, we will do some live sorting using student participation as well as viewing some videos.

Sorting routines demonstrated:

1. Selection sort
2. Bubble sort
3. Insertion sort
4. Merge sort

Selection Sort

In the first loop of this algorithm you compare all the items to find the one that comes first (the smallest number, or the first in alphabetical order) and swap that item with the item in the first position. In the second loop you compare the second through last items and find the next item in order; swap this with the item in the second position. Continue doing this until there is only one item left.
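The description above can be sketched in Python as follows. The function name and the choice to return a new list are assumptions made for illustration, not part of the lecture.

```python
def selection_sort(items):
    """Return a sorted copy of items using Selection Sort."""
    items = list(items)  # work on a copy so the caller's list is untouched
    for i in range(len(items) - 1):
        # Find the position of the smallest item in the unsorted part items[i:].
        smallest = i
        for j in range(i + 1, len(items)):
            if items[j] < items[smallest]:
                smallest = j
        # Swap it into position i; items[:i + 1] is now the sorted part.
        items[i], items[smallest] = items[smallest], items[i]
    return items


print(selection_sort([15, 7, 41, 12, 3]))
print(selection_sort(["Huxley", "Mitchell", "Butler", "Bradbury", "Wyndham",
                      "Burgess", "Orwell", "Simmons", "Asimov"]))
```

The same code works for numbers and for author names because Python compares strings alphabetically.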

Example. Sort the books on the bookshelf using Selection Sort. There are nine books, by the authors Huxley, Mitchell, Butler, Bradbury, Wyndham, Burgess, Orwell, Simmons, and Asimov.

We compare the books to find the one which comes first alphabetically. Huxley comes before Mitchell, but Butler comes before Huxley so Butler is the first, so far. The next book is by Bradbury, which is before Butler alphabetically. Comparing Bradbury with Wyndham, Burgess, Orwell and Simmons we see that it is still first, but when we compare with the last book, by Asimov, we see that Asimov comes before Bradbury. Thus we swap the books by Huxley and Asimov.

[Figure: bookshelf divided into Sorted List and Unsorted List]

For the second step we start with Cloud Atlas by Mitchell and compare it to the next book, by Butler; since Butler comes before Mitchell we now compare Butler to the next author, Bradbury. Continuing in this way, we see that Bradbury is next in alphabetic order so we swap Cloud Atlas and The Martian Chronicles.

[Figure: bookshelf divided into Sorted List and Unsorted List]

Notice that we have 2 groups of books. The ones to the left have already been sorted and the ones to the right are unsorted. The next step starts by looking at the first book in the unsorted list which is by Butler. Comparing it to Mitchell, Wyndham and then Burgess we see that Burgess comes before Butler. Continuing down the list of books we see that Burgess is before every other author in the unsorted list. So we swap Dawn (by Butler) with A Clockwork Orange (by Burgess).

[Figure: bookshelf divided into Sorted List and Unsorted List]

For the fourth step we swap Mitchell's book with Butler's book.

[Figure: bookshelf divided into Sorted List and Unsorted List]

For the fifth step we swap Wyndham's book with Huxley's book.

[Figure: bookshelf divided into Sorted List and Unsorted List]

How many more steps does it take to get the books in alphabetical order? How many total steps did we do?

Example. Apply the Selection Sort algorithm to order the following list of numbers.

15 7 41 12 3

Step 1: We compare the first number 15 with the second number 7 so we know 7 is smallest so far. We compare 7 with 41 and 7 is still the smallest. Comparing with 12 we see that 7 is still smallest, but when we compare with 3 we see that 3 is the smallest. We swap the first and fifth numbers to get two lists, one sorted and one unsorted.

Sorted List: 3    Unsorted List: 7 41 12 15

Step 2: We compare the second through fifth numbers and see that the smallest is 7. It is already in the second position so we don't have to swap any numbers, but we move 7 to the sorted list.

Sorted List: 3 7    Unsorted List: 41 12 15

Step 3: We compare the third through fifth numbers and see that the smallest is 12 (position 4) and so we swap the third and fourth items.

Sorted List: 3 7 12    Unsorted List: 41 15

Step 4: We compare the fourth and fifth items and see that the smallest is 15 (position 5) and so we swap the fourth and fifth numbers.

Sorted List: 3 7 12 15    Unsorted List: 41

We only have one item left so we are done! How many comparisons did it take us to sort a list of length 5? In the first step we had to do 4 comparisons, in the second step 3 comparisons, in the third step 2 comparisons and in the fourth step 1 comparison for a total of 10 comparisons.

If we had an unsorted list of 10 numbers then at each step we have to do the following number of comparisons; at the end of the step we have the given sorted and unsorted sublists.

Step 1, 9 comparisons (1 number in sorted list, 9 in unsorted)
Step 2, 8 comparisons (2 numbers in sorted list, 8 in unsorted)
Step 3, 7 comparisons (3 numbers in sorted list, 7 in unsorted)
Step 4, 6 comparisons (4 numbers in sorted list, 6 in unsorted)
Step 5, 5 comparisons (5 numbers in sorted list, 5 in unsorted)
Step 6, 4 comparisons (6 numbers in sorted list, 4 in unsorted)
Step 7, 3 comparisons (7 numbers in sorted list, 3 in unsorted)
Step 8, 2 comparisons (8 numbers in sorted list, 2 in unsorted)
Step 9, 1 comparison (9 numbers in sorted list, 1 in unsorted)

So we did a total of

$$9 + 8 + 7 + \cdots + 2 + 1 = \frac{9(10)}{2} = 45 \text{ comparisons.}$$

In general, for a list of length N we do

$$(N-1) + (N-2) + \cdots + 3 + 2 + 1 = \frac{N(N-1)}{2} \text{ comparisons,}$$

so Selection Sort is quadratic in the amount of work, i.e., a constant times $N^2$. Recall that this means if we have a list of 100 items it takes about 10,000 units of work, but if we have 10,000 items to sort it takes about 100,000,000 units of work. Of the five types of work we considered, quadratic is the most costly.
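The comparison count can be confirmed by instrumenting Selection Sort with a counter. This is a sketch with a hypothetical function name; it counts one comparison each time two items are compared while looking for the smallest.

```python
def selection_sort_comparisons(items):
    """Return (sorted copy, number of comparisons) for Selection Sort."""
    items = list(items)
    comparisons = 0
    for i in range(len(items) - 1):
        smallest = i
        for j in range(i + 1, len(items)):
            comparisons += 1  # one comparison per candidate in the unsorted part
            if items[j] < items[smallest]:
                smallest = j
        items[i], items[smallest] = items[smallest], items[i]
    return items, comparisons


for data in ([15, 7, 41, 12, 3], list(range(10, 0, -1))):
    n = len(data)
    result, count = selection_sort_comparisons(data)
    # The count should match N(N - 1)/2 regardless of the starting order.
    print(n, count, n * (n - 1) // 2)
```

Note that the count depends only on the length of the list, not on how scrambled it is.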

How do we search a sorted list?

Suppose that we are given a target value (like 41) and we want to find its location in the sorted list. The Brute Force approach would be to look at each item in the list and compare it with our target value until we find the one that matches. This approach takes at most N comparisons if our sorted list has N values in it. Thus the amount of work is linear.

Binary Search is a Divide & Conquer approach which is sometimes called Half-interval Search. Binary Search compares the target value to the middle element of the list. If they are unequal, the half of the sorted list which the target can't be in is eliminated. The search continues in this way on the remaining half until the location of the target value is found (assuming it is in the sorted list). As with our phone book example, this Divide & Conquer routine requires about log N comparisons, which is far fewer than the N comparisons that the Brute Force approach takes.

Example. Use Binary Search to find the location of 41 in the sorted list

2, 5, 9, 11, 17, 18, 22, 25, 27, 31, 32, 38, 41, 43, 45, 46, 51, 54, 55

There are 19 items in our list.

Step 1: The middle number is in location 10 and is 31. Since 31 < 41, we discard the left half of the list and keep the right half. We know that 41 is in the sorted list 32, 38, 41, 43, 45, 46, 51, 54, 55.

Step 2: This list has 9 numbers in it so position 5 is the middle, with number 45. We know that 45 > 41 so we discard the right half of the list and know that 41 is in the sorted list 32, 38, 41, 43.

Step 3: This list has 4 numbers in it so we can take either the 2nd or the 3rd position. We take the 2nd, which is the number 38. Since 38 < 41, we discard the left half of the list and keep the list 41, 43.

Step 4: We have two numbers in this list so its middle can be either the 1st or 2nd entry. If we choose the first entry we get the target value. If we choose the 2nd entry we eliminate 43 and know that the target value is in the list 41.

Exercises.

1. If you have a sorted list of numbers, how do you find the largest number?
2. At the end of the first step of Binary Search using a target value of 75, which half of the sorted list do you eliminate?
   4, 7, 8, 10, 22, 25, 26, 33, 46, 55, 75, 88, 89, 95, 100
3. At the end of the first step of Binary Search using a target value of 22, which half of the sorted list do you eliminate?
4. For the sorted list 4, 7, 8, 10, 22, 25, 26, 33, 46, 55, 75, 88, 89, 95, 100 with 15 numbers, what is the maximum number of steps it would take to find a specific target value using the Brute Force approach? What target value requires the most steps? Which target value requires the fewest steps using Brute Force?
5. For the sorted list 4, 7, 8, 10, 22, 25, 26, 33, 46, 55, 75, 88, 89, 95, 100 of length 15, what target value would take the fewest steps using Binary Search?
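The two search strategies described in this section, Brute Force and Binary Search, can be sketched side by side. The function names are illustrative, positions are counted from 0 as is usual in Python, and when a sublist has an even length this sketch takes the left of the two middle entries, which is one of the two choices the worked example allows.

```python
def linear_search(sorted_list, target):
    """Brute Force: return (index, comparisons), or (None, comparisons) if absent."""
    comparisons = 0
    for index, value in enumerate(sorted_list):
        comparisons += 1
        if value == target:
            return index, comparisons
    return None, comparisons


def binary_search(sorted_list, target):
    """Binary Search: return (index, steps), or (None, steps) if absent."""
    low, high = 0, len(sorted_list) - 1
    steps = 0
    while low <= high:
        steps += 1
        middle = (low + high) // 2
        if sorted_list[middle] == target:
            return middle, steps
        if sorted_list[middle] < target:
            low = middle + 1   # target can only be in the right half
        else:
            high = middle - 1  # target can only be in the left half
    return None, steps


numbers = [2, 5, 9, 11, 17, 18, 22, 25, 27, 31,
           32, 38, 41, 43, 45, 46, 51, 54, 55]
print(linear_search(numbers, 41))
print(binary_search(numbers, 41))
```

On the 19-item list from the example, Binary Search inspects 31, 45, 38 and then 41, the same middle entries as in Steps 1 through 4.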

Bubble Sort / Comparison Sort

Bubble/Comparison Sort is another simple type of sorting algorithm. It compares each pair of adjacent items and swaps them if they are not in order. You repeatedly go through the list comparing pairs of items until you get through a whole loop without having to do any swaps.
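As a minimal sketch of the procedure just described (the function name is an assumption for illustration), Bubble Sort can be written as a loop that keeps passing through the list until a pass makes no swaps.

```python
def bubble_sort(items):
    """Return a sorted copy of items using Bubble Sort."""
    items = list(items)  # work on a copy
    swapped = True
    while swapped:
        swapped = False
        # One loop through the list: compare each adjacent pair.
        for i in range(len(items) - 1):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
    return items


print(bubble_sort(["Huxley", "Mitchell", "Butler", "Bradbury", "Wyndham",
                   "Burgess", "Orwell", "Simmons", "Asimov"]))
```

This version always compares every adjacent pair in each loop, matching the description; common refinements that skip the already settled end of the list are left out on purpose.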

Example. Sort the following 9 books using Bubble Sort. Authors: Huxley, Mitchell, Butler, Bradbury, Wyndham, Burgess, Orwell, Simmons, Asimov

1st Loop through Books

Step 1: Compare 1st & 2nd books; in order, so do nothing
Step 2: Compare 2nd & 3rd books; not in order so swap
Step 3: Compare 3rd & 4th books; not in order so swap
Step 4: Compare 4th & 5th books; in order so do nothing

Continuing in this manner, we have

Step 5: Compare 5th (Wyndham) and 6th (Burgess) books; not in order so swap; Wyndham now 6th book
Step 6: Compare 6th (Wyndham) and 7th (Orwell) books; not in order so swap; Wyndham now 7th book
Step 7: Compare 7th (Wyndham) and 8th (Simmons) books; not in order so swap; Wyndham now 8th book
Step 8: Compare 8th (Wyndham) and 9th (Asimov) books; not in order so swap; Wyndham now 9th book

[Figure: configuration at end of first loop through books]

For the second loop through the books we start at the first two books and compare them. Clearly we have to interchange the books because Butler comes before Huxley. Since Huxley is the second book now, we compare Huxley with the next book, by Bradbury. We interchange the second and third books. The first two steps of this loop are illustrated below.

[Figure: first two steps of the second loop]

In the remainder of the steps for this loop, Butler and Bradbury will remain unchanged so we omit them from our schematic. For the next two steps we have

[Figure: next two steps of the second loop]

When we compare Mitchell with Orwell, we see that they are already in order; the same is true for Orwell and Simmons. So to finish this loop we have to compare the last three books - Simmons, Asimov and Wyndham. We have

[Figure: last two steps of the second loop]

[Figure: configuration at end of second loop through books]

[Figure: configuration at end of third loop through books]

[Figure: configuration at end of fourth loop through books]

Every book is now in alphabetical order except for Asimov. How many more loops will it take to fix this?

Why is this approach called Bubble Sort? Notice in our example that Asimov moved one position each loop. In a way we can say that it rises to the top just like bubbles. Some people call this Sinking Sort because, e.g., Wyndham sank to the bottom each loop.

SOCRATIVE QUIZ: Sorting Quiz 1 (CTISC1057)

1. If you want to search a list many times then you should sort it first.
2. One of the first uses of sorting information by the U.S. government was for census data.
3. Sorting a bookshelf with 100 books takes the same amount of work as sorting ten bookshelves with 10 books each.
4. Sorting a list of 100 items is twice as hard as sorting a list of 50 items.
5. Which number is largest? (a) $100^2$ (b) log 100 (c) 100 (d) 100 log 100
6. In general, an algorithm which requires linear work takes more time than an algorithm that takes quadratic work.
7. If we have 50 items to sort using the Selection Sort algorithm then it takes approximately how many units of work? (a) 50 (b) 100 (c) 250 (d) 2,500
8. At the first step of Selection Sort for sorting alphabetically the list of books by MacCormick, Fix, Strang, and Dahlquist we (a) interchange the books by Fix and MacCormick (b) interchange the books by Dahlquist and MacCormick (c) move the book by Fix ahead of the other three books (d) interchange Fix and MacCormick AND Strang and Dahlquist
9. At the first step of Bubble Sort for sorting alphabetically the list of books by MacCormick, Fix, Strang, and Dahlquist we (a) interchange the books by Fix and MacCormick (b) interchange the books by Dahlquist and MacCormick (c) move the book by Fix ahead of the other three books (d) interchange Fix and MacCormick AND Strang and Dahlquist
10. If you have the sorted list of 9 numbers 2, 5, 8, 9, 13, 17, 18, 19, 20 and are searching for the number 18, then the first step of Binary Search throws away the numbers (a) 2, 5, 8, 9, 13 (b) 2, 5, 8, 9, 13, 17 (c) 19, 20 (d) 9, 13, 17

Summary of the last lecture and Goals for this lecture

Last time we introduced two methods for sorting: Selection Sort and Bubble Sort.

We compared the differences among constant time, linear time, logarithmic time and quadratic time for completing an algorithm.

We looked at Selection Sort in detail and showed that the amount of work required is quadratic.

In this lecture we will do an additional example with Bubble Sort and examples with Insertion Sort and demonstrate that they are both quadratic in the amount of work required.

Next we will look at the concept behind Merge Sort and do some examples. We will see that it is a more efficient algorithm than any of the other three.

Example. Apply the Bubble Sort algorithm to order the following list of numbers.

15 7 41 12 3

LOOP 1: Compare all adjacent entries in the list.

Compare the first two numbers and if they are in order, do nothing. If not in order, interchange. Here 15 > 7 so SWAP.

15 7 41 12 3  ->  7 15 41 12 3

Compare the numbers in the second and third positions and if they are in order, do nothing. If not in order, interchange. Here 15 < 41 so do nothing.

7 15 41 12 3

Compare the numbers in the third and fourth positions and if they are in order, do nothing. If not in order, interchange. Here 41 > 12 so swap.

7 15 41 12 3  ->  7 15 12 41 3

Compare the numbers in the fourth and fifth positions and if they are in order, do nothing. If not in order, interchange. Here 41 > 3 so swap.

7 15 12 41 3  ->  7 15 12 3 41

There are no more items in the list so the first loop is done. Now we have to keep going through the list until all the numbers are in the correct order. For a computer to know this, we have to go through the numbers until there are no swaps in a loop.

Loop 2

7 15 12 3 41    compare 7 and 15: do nothing
7 15 12 3 41    compare 15 and 12: swap  ->  7 12 15 3 41
7 12 15 3 41    compare 15 and 3: swap  ->  7 12 3 15 41
7 12 3 15 41    compare 15 and 41: do nothing

At the end of Loop 2 we have 7 12 3 15 41
At the end of Loop 3 we have 7 3 12 15 41
At the end of Loop 4 we have 3 7 12 15 41

We don't know that all the numbers are in order so we have to go through and check a final time. Because there are no interchanges necessary we know that the list is ordered.

How much work did it take us to sort a list of length 5? We did 5 loops. In each loop we have to do 4 comparisons. So we had a total of 5 × 4 = 20 comparisons. If we had a list of 10 items then we would have to do at most 10 loops with 9 comparisons each time to give a total of 10 × 9 = 90 comparisons. In general, for a list of length N it takes us at most $N(N-1) = N^2 - N$ comparisons. Last time we said that for large numbers $N^2$ is much larger than N. For this reason, people just say that it takes at most $N^2$ comparisons for this algorithm. What is the least number of comparisons the algorithm takes?

Exercise. Go through the loops to order the list 101, 45, 33, 91, 22, 24 using the Bubble Sort algorithm. How many comparisons are necessary?
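The loop and comparison counts from the worked example can be reproduced by adding counters to Bubble Sort. The following sketch uses a hypothetical function name and counts every adjacent comparison, including those in the final checking loop.

```python
def bubble_sort_with_counts(items):
    """Return (sorted copy, loops, comparisons) for Bubble Sort."""
    items = list(items)
    loops = 0
    comparisons = 0
    swapped = True
    while swapped:
        swapped = False
        loops += 1
        for i in range(len(items) - 1):
            comparisons += 1  # every adjacent pair is compared in every loop
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
    return items, loops, comparisons


print(bubble_sort_with_counts([15, 7, 41, 12, 3]))
```

Trying the function on other starting orders of the same five numbers is a quick way to explore the best and worst cases.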

Insertion Sort

Insertion Sort combines some of the ideas behind the Bubble Sort and Selection Sort algorithms. It keeps two lists - a sorted and an unsorted list - just like Selection Sort, but it uses the strategy of comparing adjacent items like Bubble Sort.

Insertion sort with German folk dance

Example. Apply the Insertion Sort algorithm to order the following list of numbers.

15 7 41 12 3 36 8

Compare & Swap. Step 1: compare the first two items. 7 < 15, so swap these two. We now have a sorted list of 1 item and an unsorted list of 6 items.

Sorted List: 7
Unsorted List: 15 41 12 3 36 8

Compare. We now compare the next two items in the UNSORTED list. 15 < 41, so we don't swap the items. Now we move 15 to the SORTED list; since 15 > 7, we put 15 in the second position of the SORTED list.

Sorted List: 7 15
Unsorted List: 41 12 3 36 8

Compare & Swap. We now compare the next two items in the UNSORTED list. Since 41 > 12 we swap the items. Now we move 12 to the SORTED list; since 12 < 15, we must INSERT 12 before 15 in the SORTED list. We also have to see how 12 compares with the first item in the sorted list. Since 12 > 7, we don't have to do anything more.

Sorted List: 7 12 15
Unsorted List: 41 3 36 8

Compare & Swap. We now compare the next two items in the UNSORTED list. Since 41 > 3 we swap the items. Now we move 3 to the SORTED list; since 3 < 15, 3 < 12 and 3 < 7, we must INSERT 3 before 7 in the SORTED list.

Sorted List: 3 7 12 15
Unsorted List: 41 36 8

At the next two steps (each a Compare & Swap) we have

Sorted List: 3 7 12 15 36
Unsorted List: 41 8

Sorted List: 3 7 8 12 15 36
Unsorted List: 41

Since there is only one number left, we are done: we just add it to the end of the sorted list, because we know that it is the largest number. Note that the largest number in the UNSORTED list is always the last number to add. Why?

One cost of this algorithm is the insertion itself. Imagine that each number is in a slot. Then inserting a number at the front of the sorted list requires moving all the other numbers to the next slot to make room. In the Selection Sort we interchanged items, so we only had to move two items. This algorithm is still quadratic in the amount of work it takes.
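The slot-moving cost can be made visible with a short Python sketch. Note the assumption: this is the standard textbook formulation of Insertion Sort (take the next unsorted item and shift larger sorted items one slot to the right), which differs slightly from the walk-through above, where the larger of the two adjacent unsorted items is first swapped ahead. The name `insertion_sort` and the `moves` counter are illustrative choices, not from the notes.

```python
def insertion_sort(items):
    """Return (sorted_copy, number_of_slot_moves)."""
    data = list(items)                 # copy; data[:i] is the sorted part
    moves = 0
    for i in range(1, len(data)):
        current = data[i]              # next item taken from the unsorted part
        j = i - 1
        # Shift every larger sorted item one slot right to make room.
        while j >= 0 and data[j] > current:
            data[j + 1] = data[j]
            moves += 1
            j -= 1
        data[j + 1] = current          # insert into the freed slot
    return data, moves


sorted_list, moves = insertion_sort([15, 7, 41, 12, 3, 36, 8])
print(sorted_list)  # [3, 7, 8, 12, 15, 36, 41]
```

In the worst case (a list in reverse order) every insertion shifts the whole sorted part, which is where the quadratic amount of work comes from.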

Exercise. Consider the list 14, 22, 19, 11, 26, 3.

1. What will the configuration be at the end of the first step in Selection Sort?
2. What will the configuration be at the end of the first loop through the numbers in Bubble Sort?
3. What will the configuration be at the end of the first step in Insertion Sort?

How can we improve upon this $N^2$ amount of work? Suppose we want to merge and sort two SORTED lists. It turns out that this is much cheaper than merging and sorting two UNSORTED lists.

Example. Merge and sort the following two UNSORTED lists. How much work does it take?

15, 12, 41, 7 and 21, 8, 3, 36

Merging the two lists gives us the list 15, 12, 41, 7, 21, 8, 3, 36. To sort it we have to use one of our previous algorithms, which takes about $8^2 = 64$ comparisons to get the sorted list 3, 7, 8, 12, 15, 21, 36, 41.

Example. Merge and sort the following two SORTED lists. How many comparisons are needed?

7, 12, 15, 41 and 3, 8, 21, 36

Because the two sublists are sorted, all we have to do at each step is compare the first number of each sublist and pick the smaller.

Step 1. In our case the smaller number is 3, so that is the first number in our merged, sorted list.
Sorted, merged list: 3
Sublists: 7, 12, 15, 41 and 8, 21, 36

Step 2. Comparing the first number in each of the two sublists, we see that the smaller number is 7, so that is the second number in our merged, sorted list and we add it after 3.
Sorted, merged list: 3, 7
Sublists: 12, 15, 41 and 8, 21, 36

Step 3. Comparing the first number in each of the two sublists, we see that the smaller number is 8, so that is the third number in our merged, sorted list.
Sorted, merged list: 3, 7, 8
Sublists: 12, 15, 41 and 21, 36

Step 4. The smaller number is 12, so that is the fourth number in our merged, sorted list.
Sorted, merged list: 3, 7, 8, 12
Sublists: 15, 41 and 21, 36

Step 5. The smaller number is 15, so that is the fifth number in our merged, sorted list.
Sorted, merged list: 3, 7, 8, 12, 15
Sublists: 41 and 21, 36

Step 6. The smaller number is 21, so that is the sixth number in our merged, sorted list.
Sorted, merged list: 3, 7, 8, 12, 15, 21
Sublists: 41 and 36

Step 7. The smaller number is 36, so that is the seventh number in our merged, sorted list.
Sorted, merged list: 3, 7, 8, 12, 15, 21, 36
Sublists: 41 and (empty)

Step 8. There is only one number left, so add it to our merged, sorted list.
Final sorted, merged list: 3, 7, 8, 12, 15, 21, 36, 41

At each of the first seven steps we had to do one comparison (compare the first number of each sorted sublist), so we have a total of 7 comparisons. If we have two sorted lists to merge into one list of length $N$, then it takes no more than $N - 1$ comparisons, which is less than $N$; i.e., it is linear work. Of course, it took some work to create the sorted sublists in the first place.

Merge Sort is based on the idea that it is quick to combine two sorted lists into a single large sorted list. What is the length of a list that is easiest to sort? Any list with only one item is already sorted, so the next easiest is a list with two items, because we only have to do one comparison. We want to combine the idea of merging sorted lists with the idea of breaking the list down into pieces of two items each to begin with, to create a new algorithm called Merge Sort.
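The merging steps above translate almost directly into code. The following Python sketch is illustrative only; the function name `merge` and the returned comparison count are assumptions introduced here. At each step it compares the first remaining number of each sublist and takes the smaller, and once one sublist is empty it appends whatever is left without any further comparisons.

```python
def merge(left, right):
    """Merge two already-sorted lists; return (merged_list, comparisons)."""
    merged = []
    comparisons = 0
    i = j = 0                          # positions of the first remaining items
    while i < len(left) and j < len(right):
        comparisons += 1               # compare the fronts of the two sublists
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])            # one sublist is empty: copy the rest
    merged.extend(right[j:])
    return merged, comparisons


print(merge([7, 12, 15, 41], [3, 8, 21, 36]))
# ([3, 7, 8, 12, 15, 21, 36, 41], 7)
```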

Merge Sort

None of the sorting routines we have encountered so far is feasible for large data sets, because the worst-case scenario in sorting a list of length $N$ is quadratic work, i.e., $N^2$. The Merge Sort algorithm decreases this to $N \log N$ by using a Divide and Conquer strategy. Merge Sort relies on the fact that merging two SORTED lists into a single SORTED list is easier to do than merging two UNSORTED lists into a single SORTED list.

For simplicity of exposition, assume that the list contains an even number of items to sort. The Merge Sort algorithm begins by putting each item in a sublist by itself, with the sublists still in the same order as given. Then it creates SORTED sublists of size 2 by merging two adjacent sublists. Once this is done, it merges the sublists of size 2 to get SORTED sublists of size 4, and so on.

Merge sort with German folk dance

Example. Apply the Merge Sort algorithm to order the following list of numbers.

15, 7, 41, 12, 3, 36, 8, 21

Step 1. At the first step we merge and sort the single items into sublists of size 2. Creating each sorted list of size 2 requires 1 comparison, so for this step we need 4 comparisons since we have 4 lists of size 2.

15 7      Compare & Swap
41 12     Compare & Swap
3 36      Compare & Do Nothing
8 21      Compare & Do Nothing

Step 2. Now we merge and sort the sublists of size 2 (starting at the left) to get sublists of size 4. Remember that it is easy to merge lists that are already sorted: to merge two sorted lists of length 2 into a sorted list of length 4 requires at most 3 comparisons. So this step takes at most $2 \times 3 = 6$ comparisons.

7 15 | 12 41     Merge & Sort lists  ->  7 12 15 41
3 36 | 8 21      Merge & Sort lists  ->  3 8 21 36

Step 3. For our list, we need one more merge and sort to get a list of 8 numbers. To merge two lists of length 4 to get a sorted list of length 8 requires at most 7 comparisons.

3 7 8 12 15 21 36 41
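The three steps above can be sketched as a "bottom-up" Merge Sort in Python. This is a minimal sketch under stated assumptions: the names `merge` and `merge_sort` are illustrative, the comparison counter is added only to check the tallies in the example, and an odd sublist left over at the end of a round is simply carried forward (a case the notes set aside by assuming an even number of items).

```python
def merge(left, right):
    """Merge two sorted lists; return (merged_list, comparisons)."""
    merged, comparisons, i, j = [], 0, 0, 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:], comparisons


def merge_sort(items):
    """Bottom-up Merge Sort; return (sorted_list, total_comparisons)."""
    sublists = [[x] for x in items]    # every item starts in its own sublist
    total = 0
    while len(sublists) > 1:
        next_round = []
        for k in range(0, len(sublists), 2):
            if k + 1 < len(sublists):  # merge two adjacent sublists
                merged, used = merge(sublists[k], sublists[k + 1])
                total += used
            else:                      # odd one out: carry it forward
                merged = sublists[k]
            next_round.append(merged)
        sublists = next_round          # sublists are now twice as long
    return (sublists[0] if sublists else []), total


print(merge_sort([15, 7, 41, 12, 3, 36, 8, 21]))
# ([3, 7, 8, 12, 15, 21, 36, 41], 17)
```

For this particular list every merge happens to need its maximum number of comparisons, so the count matches the 4 + 6 + 7 tally; other lists of length 8 may need fewer.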

Why is this faster than the other algorithms we have considered? To sort a list of length 8, the first step required 4 comparisons, the second step required 6 comparisons and the last step required 7 comparisons, for a total of $4 + 6 + 7 = 17$ comparisons. Recall that the other algorithms took $N^2/2 + N/2$ comparisons, so when $N = 8$ we have $64/2 + 8/2 = 32 + 4 = 36$ comparisons.

Example. Sort the following list of 16 numbers using Merge Sort. Determine the maximum number of comparisons that are needed and compare with a sorting algorithm which takes $N^2/2 + N/2$ comparisons.

7, 23, 10, 4, 36, 44, 19, 50, 9, 11, 8, 13, 28, 48, 5, 18

Step 1. Initially each number is in its own list. Then we merge and sort two adjacent numbers. To create the sorted lists of length 2, we need to do 8 comparisons.

7 23 10 4 36 44 19 50 9 11 8 13 28 48 5 18

7 23 | 4 10 | 36 44 | 19 50 | 9 11 | 8 13 | 28 48 | 5 18

Step 2. Now we have sorted lists of length 2. We now merge and sort adjacent lists. This requires at most 3 comparisons for each list of length 4, and we have 4 such lists, so a total of at most 12 comparisons.

7 23 | 4 10 | 36 44 | 19 50 | 9 11 | 8 13 | 28 48 | 5 18
4 7 10 23 | 19 36 44 50 | 8 9 11 13 | 5 18 28 48

Step 3. Now we have 4 sorted lists of length 4. We now merge and sort adjacent lists to create two sorted lists of length 8. To merge and sort two sorted lists of length 4 requires at most 7 comparisons for each, for a total of at most 14 comparisons.

4 7 10 23 | 19 36 44 50 | 8 9 11 13 | 5 18 28 48
4 7 10 19 23 36 44 50 | 5 8 9 11 13 18 28 48

Step 4. Our final step is to merge and sort the two sorted lists of length 8 to obtain a sorted list of length 16. To combine the lists we need at most 15 comparisons.

4 7 10 19 23 36 44 50 | 5 8 9 11 13 18 28 48
4 5 7 8 9 10 11 13 18 19 23 28 36 44 48 50

The maximum number of comparisons that we must do to sort 16 numbers is $8 + 12 + 14 + 15 = 49$ comparisons. This should be compared with $N^2/2 + N/2$, which when $N = 16$ is $256/2 + 16/2 = 128 + 8 = 136$, considerably more than Merge Sort.

Why do we say at most this number of comparisons? When would it take fewer than 7 comparisons to merge two sorted lists of length 4 each?
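The step-by-step tallies above follow a pattern that is easy to compute for other list lengths. The Python sketch below is illustrative: the function names are made up here, and it assumes the length $N$ is a power of 2 so that every step merges complete pairs. Each step merges pairs of sublists of length `size` into sublists of length `2 * size`, at a cost of at most `2 * size - 1` comparisons per merge.

```python
def merge_sort_max_comparisons(n):
    """Maximum comparisons for Merge Sort on n items (n a power of 2)."""
    total = 0
    size = 1                           # current sublist length
    while size < n:
        merges = n // (2 * size)       # number of merges in this step
        total += merges * (2 * size - 1)
        size *= 2
    return total


def quadratic_comparisons(n):
    """The N^2/2 + N/2 count quoted for the earlier algorithms."""
    return n * (n + 1) // 2


for n in (8, 16):
    print(n, merge_sort_max_comparisons(n), quadratic_comparisons(n))
# 8 17 36
# 16 49 136
```

Trying larger powers of 2 shows the gap between the two counts widening quickly.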

Before we have the quiz for today, we will watch two videos. The first is from Obama's visit to Google and the second is a side-by-side comparison of the speed of sorting algorithms, found at toptal.com/developers/sortingalgorithms.

SOCRATIVE QUIZ 2 CTISC1057

1. To create a sorted list by merging two sorted lists takes the same amount of time as creating a sorted list by merging two unsorted lists.

2. The Merge Sort algorithm takes approximately $100^2$ comparisons to sort a list of length 100.

3. Consider the list 4, 2, 9, 1, 6, 3. At the end of the first step in the first loop through the numbers in Bubble Sort, the configuration will be
   (a) 1, 2, 9, 4, 6, 3
   (b) 2, 4, 9, 1, 6, 3
   (c) 1, 2, 3, 4, 6, 9
   (d) 1, 4, 2, 9, 6, 3

4. Consider the list 4, 2, 9, 1, 6, 3. At the end of the first step in Selection Sort, the configuration will be
   (a) 1, 2, 9, 4, 6, 3
   (b) 2, 4, 9, 1, 6, 3
   (c) 1, 2, 3, 4, 6, 9
   (d) 1, 4, 2, 9, 6, 3

5. Consider the list 4, 2, 9, 1, 6, 3. At the end of the first step of Insertion Sort, the configuration will be
   (a) 1, 2, 9, 4, 6, 3
   (b) 2, 4, 9, 1, 6, 3
   (c) 1, 2, 3, 4, 6, 9
   (d) 1, 4, 2, 9, 6, 3

6. Consider the list 4, 2, 9, 1, 6, 3. At the end of the first step of Merge Sort, how many sorted lists of two numbers each will there be?
   (a) 6
   (b) 4
   (c) 3
   (d) 2

7. If we want to merge two SORTED lists of length 4 each, what is the maximum number of comparisons necessary to form a sorted list of length 8?
   (a) 6
   (b) 7
   (c) 9
   (d) 10

8. Which algorithm typically takes the least amount of work to sort a list?
   (a) Selection Sort
   (b) Bubble Sort
   (c) Insertion Sort
   (d) Merge Sort

9. Which algorithms take the same amount of work to sort a list?
   (a) Selection & Bubble Sort
   (b) Selection & Merge Sort
   (c) Insertion Sort & Merge Sort
   (d) Bubble Sort & Insertion Sort
   (e) both (a) and (d)

10. Which number is largest? Assume $N$ represents a positive integer.
   (a) $N$
   (b) $\log N$
   (c) $N^2$
   (d) $N \log N$