CSE465, Spring 2009 March 16

Bucket sort

Bucket sort has two meanings. One is similar to Counting sort, which is described in the book. We assume that every entry to be sorted is in the set $\{0, 1, \ldots, m-1\}$. We sort the array fragment <A, 0, n> using an array of buckets B[m].

    Bucket_sort(A, n, B, m)
    {
        // distribution
        for (i = 0; i < n; i++)
            place A[i] in bucket B[A[i]];
        // collection
        for (i = j = 0; j < m; j++) {
            while (B[j] is not empty) {
                x = element removed from B[j];
                A[i++] = x;
            }
        }
    }
Example. Sort 3 2 4 0 1 5 2 3 4 3 (n = 10) with m = 6 buckets B[0], ..., B[5].

Distribution: the entries are taken from left to right and each one is appended to its bucket. Afterwards the buckets hold:

    B[0]: 0
    B[1]: 1
    B[2]: 2 2
    B[3]: 3 3 3
    B[4]: 4 4
    B[5]: 5

Collection: the buckets are emptied in the order B[0], B[1], ..., B[5], giving

    0 1 2 2 3 3 3 4 4 5
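The pseudocode leaves the buckets abstract. Below is a minimal C sketch, assuming each bucket is a FIFO linked list threaded through index arrays (head, tail, next); the function name bucket_sort and this representation are illustrative choices, not part of the slides. Appending at the tail and collecting from the head keeps equal keys in their original order.

```c
#include <stdlib.h>

/* Sketch: sort A[0..n-1], all entries in {0, ..., m-1}, using m FIFO buckets.
   Node i of the bucket lists holds key[i]; next[i] is the following node
   in the same bucket, or -1 at the end of the list. */
void bucket_sort(int A[], int n, int m)
{
    int *head = malloc((size_t)m * sizeof *head);  /* first node of bucket j, -1 if empty */
    int *tail = malloc((size_t)m * sizeof *tail);  /* last node of bucket j */
    int *next = malloc((size_t)n * sizeof *next);  /* next node in the same bucket */
    int *key  = malloc((size_t)n * sizeof *key);   /* copy of the keys, since A is overwritten */

    for (int j = 0; j < m; j++)
        head[j] = -1;

    /* distribution: append A[i] at the tail of bucket A[i] */
    for (int i = 0; i < n; i++) {
        int b = A[i];
        key[i] = A[i];
        next[i] = -1;
        if (head[b] < 0)
            head[b] = i;
        else
            next[tail[b]] = i;
        tail[b] = i;
    }

    /* collection: empty buckets 0, 1, ..., m-1 from head to tail back into A */
    int i = 0;
    for (int j = 0; j < m; j++)
        for (int p = head[j]; p >= 0; p = next[p])
            A[i++] = key[p];

    free(head);
    free(tail);
    free(next);
    free(key);
}
```

Called on the example array {3, 2, 4, 0, 1, 5, 2, 3, 4, 3} with n = 10 and m = 6, it leaves A as 0 1 2 2 3 3 3 4 4 5.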
Counting sort

We can use fragments of another array as buckets. If we place them appropriately, we do not need the Collection stage; instead, we need a Census stage that calculates where the buckets are placed.

    void Counting_sort(const int A[], int B[], int n, int m)
    {
        int i, C[m+1];
        // Census
        // Prepare counters
        for (i = 0; i < m; C[i++] = 0);
        // Census of each bucket
        for (i = 0; i < n; i++) C[A[i]]++;
        // bucket i will be <B, C[i], C[i+1]>
        C[m] = n;
        for (i = m-1; i >= 0; i--) C[i] = C[i+1] - C[i];
        // Transfer to buckets
        for (i = 0; i < n; i++) B[C[A[i]]++] = A[i];
    }

In this algorithm we perform a constant amount of work per sorted number (we look at it during the Census of each bucket and move it during the Transfer to buckets) and per bucket (we prepare its counter and compute its limits). Thus the running time is Θ(m + n).

Remark. This is a stable sorting method: numbers that are equal keep their relative order. This will be important later.
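To see the stability remark in action, here is a small C sketch that runs the same three stages on records rather than bare numbers. The record type struct item, its tag field, and the name counting_sort_items are hypothetical, introduced only so that the original order of equal keys is visible in the output.

```c
#include <stdlib.h>

/* Hypothetical record: a key in {0, ..., m-1} and a tag that records
   which input element it came from. */
struct item {
    int key;
    char tag;
};

/* Same stages as Counting_sort, but moving whole records into B. */
void counting_sort_items(const struct item A[], struct item B[], int n, int m)
{
    int *C = malloc((size_t)(m + 1) * sizeof *C);

    for (int i = 0; i < m; i++) C[i] = 0;        /* prepare counters */
    for (int i = 0; i < n; i++) C[A[i].key]++;   /* census of each bucket */

    C[m] = n;                                    /* bucket i will be <B, C[i], C[i+1]> */
    for (int i = m - 1; i >= 0; i--) C[i] = C[i + 1] - C[i];

    /* transfer: scanning A left to right and filling each bucket left to
       right is what keeps records with equal keys in their input order */
    for (int i = 0; i < n; i++) B[C[A[i].key]++] = A[i];

    free(C);
}
```

For the records (3,'a'), (2,'b'), (3,'c'), (2,'d'), (0,'e') with m = 4, the tags come out in the order e, b, d, a, c: among equal keys, 'b' still precedes 'd' and 'a' still precedes 'c'.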
Example. Input A: 3 2 4 0 1 5 2 3 4 3, with m = 6, so C has entries C[0], ..., C[6].

    i:                      0  1  2  3  4  5
    C after preparation:    0  0  0  0  0  0
    C after the census:     1  1  2  3  2  1
Computing the buckets' left ends (first C[6] = n = 10, then i = 5, 4, ..., 0):

    i:                   0  1  2  3  4  5  6
    after C[6] = 10:     1  1  2  3  2  1 10
    after i = 5:         1  1  2  3  2  9 10
    after i = 4:         1  1  2  3  7  9 10
    after i = 3:         1  1  2  4  7  9 10
    after i = 2:         1  1  2  4  7  9 10
    after i = 1:         1  1  2  4  7  9 10
    after i = 0:         0  1  2  4  7  9 10
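The slides compute the left ends from right to left, starting at C[m] = n. An equally common alternative, named here plainly as a swap-in and not the slides' loop, is a left-to-right exclusive prefix sum. This C sketch, with the hypothetical name left_ends_prefix, produces the same array.

```c
/* Alternative to the right-to-left loop: C[0..m-1] holds bucket sizes on
   entry; on return C[i] is the left end of bucket i and C[m] = n. */
void left_ends_prefix(int C[], int m)
{
    int sum = 0;                 /* number of entries in buckets 0..i-1 */
    for (int i = 0; i < m; i++) {
        int size = C[i];
        C[i] = sum;              /* bucket i starts after all smaller buckets */
        sum += size;
    }
    C[m] = sum;                  /* total count, i.e. n */
}
```

On the census 1 1 2 3 2 1 from the example it yields 0 1 2 4 7 9 10.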
Transfer: the entries of A are taken from left to right; entry x is placed in B[C[x]], and then C[x] is incremented.

    x   placed in   C[0..6] after the step
        (start)     0  1  2  4  7  9 10
    3   B[4]        0  1  2  5  7  9 10
    2   B[2]        0  1  3  5  7  9 10
    4   B[7]        0  1  3  5  8  9 10
    0   B[0]        1  1  3  5  8  9 10
    1   B[1]        1  2  3  5  8  9 10
    5   B[9]        1  2  3  5  8 10 10
    2   B[3]        1  2  4  5  8 10 10
    3   B[5]        1  2  4  6  8 10 10
    4   B[8]        1  2  4  6  9 10 10
    3   B[6]        1  2  4  7  9 10 10

    Array B[10]: 0 1 2 2 3 3 3 4 4 5
Radix sort

We will transform Counting sort into a sorting algorithm that is good for sorting numbers in $\{0, 1, \ldots, m^3 - 1\}$. A number k from this range has three digits: $k = \mathrm{digit}(k,0) + m\,\mathrm{digit}(k,1) + m^2\,\mathrm{digit}(k,2)$. The first version performs a single Counting sort pass on digit 0:

    Radix_sort(A, B, n)    // preliminary
    {
        int C[m+1], *S = A, *T = B, *temp;
        // Census
        // Prepare counters
        for (i = 0; i < m; C[i++] = 0);
        // Census of each bucket
        for (i = 0; i < n; i++) C[digit(S[i],0)]++;
        // bucket i will be <T, C[i], C[i+1]>
        C[m] = n;
        for (i = m-1; i >= 0; i--) C[i] = C[i+1] - C[i];
        // Transfer to buckets
        for (i = 0; i < n; i++) T[C[digit(S[i],0)]++] = S[i];
        temp = T, T = S, S = temp;
    }

With m = 10, the array

    A[0..9]: 213 352 144 40 501 205 32 3 154 433

will be transformed into

    B[0..9]: 40 501 352 32 213 3 433 144 154 205
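The slides use digit(k, d) without defining it. Here is a minimal C sketch, assuming base-m digits numbered from the least significant one (consistent with the formula for k above); the name base_m_digit and its explicit m parameter are hypothetical.

```c
/* Hypothetical helper: the d-th base-m digit of k (d = 0 is the least
   significant), i.e. (k / m^d) mod m. Assumes k >= 0 and m >= 2. */
int base_m_digit(int k, int d, int m)
{
    for (int j = 0; j < d; j++)
        k /= m;                  /* drop the d least significant digits */
    return k % m;
}
```

With m = 10, base_m_digit(213, 0, 10), base_m_digit(213, 1, 10) and base_m_digit(213, 2, 10) return 3, 1 and 2, and 3 + 10·1 + 100·2 = 213.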
We can repeat what we did once more, but now we will look at the next digit:

    Radix_sort(A, B, n)    // still preliminary
    {
        int C[m+1], *S = A, *T = B, *temp;
        for (d = 0; d < 2; d++) {
            // Census
            // Prepare counters
            for (i = 0; i < m; C[i++] = 0);
            // Census of each bucket
            for (i = 0; i < n; i++) C[digit(S[i],d)]++;
            // bucket i will be <T, C[i], C[i+1]>
            C[m] = n;
            for (i = m-1; i >= 0; i--) C[i] = C[i+1] - C[i];
            // Transfer to buckets
            for (i = 0; i < n; i++) T[C[digit(S[i],d)]++] = S[i];
            temp = T, T = S, S = temp;
        }
    }

    initial:               213 352 144 40 501 205 32 3 154 433
    after pass on digit 0: 40 501 352 32 213 3 433 144 154 205
    after pass on digit 1: 501 3 205 213 32 433 40 144 352 154
Now we can present the final version:

    Radix_sort(A, B, n)
    {
        int C[m+1], *S = A, *T = B, *temp;
        for (d = 0; d < 3; d++) {
            // Census
            // Prepare counters
            for (i = 0; i < m; C[i++] = 0);
            // Census of each bucket
            for (i = 0; i < n; i++) C[digit(S[i],d)]++;
            // bucket i will be <T, C[i], C[i+1]>
            C[m] = n;
            for (i = m-1; i >= 0; i--) C[i] = C[i+1] - C[i];
            // Transfer to buckets
            for (i = 0; i < n; i++) T[C[digit(S[i],d)]++] = S[i];
            temp = T, T = S, S = temp;
        }
        // after three passes (an odd number) the sorted numbers are in B; S points to B
    }

    initial:               213 352 144 40 501 205 32 3 154 433
    after pass on digit 0: 40 501 352 32 213 3 433 144 154 205
    after pass on digit 1: 501 3 205 213 32 433 40 144 352 154
    after pass on digit 2: 3 32 40 144 154 205 213 352 433 501

Because every pass is stable, after the pass on digit d the numbers are sorted by their last d + 1 digits; after the pass on digit 2 they are fully sorted.
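For experimentation, here is a self-contained C sketch of the final version. It assumes keys in {0, ..., m^passes - 1}, takes m and the number of passes as parameters (the slides fix three passes), and defines its own digit helper; the names radix_sort and radix_digit are hypothetical. It returns whichever of A and B holds the result, since with an odd number of passes that is B.

```c
#include <stdlib.h>

/* Hypothetical helper: d-th base-m digit of k (k >= 0, d = 0 least significant). */
static int radix_digit(int k, int d, int m)
{
    for (int j = 0; j < d; j++)
        k /= m;
    return k % m;
}

/* `passes` stable Counting sort passes, least significant digit first.
   Returns the array (A or B) that holds the sorted keys. */
int *radix_sort(int A[], int B[], int n, int m, int passes)
{
    int *C = malloc((size_t)(m + 1) * sizeof *C);
    int *S = A, *T = B, *temp;

    for (int d = 0; d < passes; d++) {
        for (int i = 0; i < m; i++) C[i] = 0;                        /* prepare counters */
        for (int i = 0; i < n; i++) C[radix_digit(S[i], d, m)]++;    /* census */
        C[m] = n;                                                    /* bucket i is <T, C[i], C[i+1]> */
        for (int i = m - 1; i >= 0; i--) C[i] = C[i + 1] - C[i];
        for (int i = 0; i < n; i++) T[C[radix_digit(S[i], d, m)]++] = S[i];  /* transfer */
        temp = T, T = S, S = temp;                                   /* output becomes next input */
    }
    free(C);
    return S;   /* after the final swap, S points at the last output */
}
```

On the ten-number example with m = 10 and passes = 3, it returns B containing 3 32 40 144 154 205 213 352 433 501.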
Final remarks on Radix sort

We can Radix sort with any number of digits. Because we compute digits very often, it is good to compute them fast. One good way is to look at the sorted keys as strings of unsigned characters. This way we have m equal to 256, and we do not compute the digits, we just read them. This is particularly good if the keys are indeed strings, e.g. names to be sorted alphabetically. Interestingly, we can use this approach to sort positive floating-point numbers: the exponent occupies the most significant bits (right after the sign bit, which is 0 for positive numbers), and when exponents are equal, we compare mantissas. If the set to be sorted has many thousands (or millions) of elements, we prefer fewer passes: we can use pairs of bytes/characters, where characters a, b define the digit a + 256·b (so m = 65536 and the number of passes is halved).
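A minimal C sketch of the byte-digit idea, assuming 32-bit unsigned keys and IEEE 754 single-precision floats (both assumptions, not stated in the slides). Digit d is byte d of the key; here it is extracted with a shift and a mask rather than read from memory, which keeps the sketch independent of byte order. The names radix_sort_u32 and radix_sort_positive_floats are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* LSD radix sort of 32-bit unsigned keys with m = 256 (four byte passes). */
void radix_sort_u32(uint32_t A[], int n)
{
    uint32_t *B = malloc((size_t)n * sizeof *B);
    uint32_t *S = A, *T = B, *temp;
    int C[257];

    for (int d = 0; d < 4; d++) {
        int shift = 8 * d;                                           /* byte d of the key */
        for (int i = 0; i < 256; i++) C[i] = 0;                      /* prepare counters */
        for (int i = 0; i < n; i++) C[(S[i] >> shift) & 0xFF]++;     /* census */
        C[256] = n;
        for (int i = 255; i >= 0; i--) C[i] = C[i + 1] - C[i];       /* left ends */
        for (int i = 0; i < n; i++) T[C[(S[i] >> shift) & 0xFF]++] = S[i];  /* transfer */
        temp = T, T = S, S = temp;
    }
    /* four passes (an even number): the sorted keys are back in A */
    free(B);
}

/* Assumes float is IEEE 754 binary32: the bit patterns of non-negative
   floats, read as unsigned integers, are ordered like the values. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "assumes 32-bit float");

void radix_sort_positive_floats(float X[], int n)
{
    uint32_t *K = malloc((size_t)n * sizeof *K);
    for (int i = 0; i < n; i++) memcpy(&K[i], &X[i], sizeof K[i]);  /* float -> bits */
    radix_sort_u32(K, n);
    for (int i = 0; i < n; i++) memcpy(&X[i], &K[i], sizeof X[i]);  /* bits -> float */
    free(K);
}
```

Switching to two-byte digits would mean C[65537], a shift of 16·d, and two passes instead of four; whether that pays off depends on n being large compared with 65536.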
Lower bound on comparison sorting

Counting sort and Radix sort select the place where a sorted number k should be moved based on a function of its value, e.g. digit(k, d). This assumes some knowledge about the range of the objects that we are sorting, and is not necessarily useful for every possible range. Therefore we are interested in comparison sorting, i.e. in sorting algorithms that compute no functions of the values of the sorted objects other than comparisons, which are Boolean functions on pairs of objects.

It is easy to show that an algorithm that sorts n numbers must perform, in the worst case, at least $\log_2(n!)$ comparisons. We may assume that all the objects in the input are distinct. Then there exists exactly one permutation $\pi$ such that for input $a_0, a_1, \ldots, a_{n-1}$ the valid output is $a_{\pi(0)}, a_{\pi(1)}, \ldots, a_{\pi(n-1)}$. There are $n!$ possible permutations, and each of them is needed for some input. Let $\Pi$ be the set of permutations that are needed for some input consistent with the answers to the comparisons that we have seen so far. Initially, before performing any comparisons, $|\Pi| = n!$, because every input is possible (and thus every permutation).

Suppose now that we are about to perform a comparison, say $a_i < a_j$. Let $\Pi_{yes}$ be the set of permutations from $\Pi$ that are consistent with the positive answer, and let $\Pi_{no}$ be the set of permutations from $\Pi$ that are consistent with the negative answer. Every permutation in $\Pi$ falls into exactly one of these two sets, so for some $val \in \{no, yes\}$ the set $\Pi_{val}$ is at least as large as the other one, i.e. $|\Pi_{val}| \ge \frac{1}{2}|\Pi|$. If $\Pi$ is not empty, then neither is $\Pi_{val}$, so it is possible that $val$ is the answer to the comparison $a_i < a_j$. Thus it is possible that as the result of the comparison $a_i < a_j$ we change the set $\Pi$ into $\Pi_{val}$ with $|\Pi_{val}| \ge \frac{1}{2}|\Pi|$. Consequently, it is possible that after performing k comparisons we have $|\Pi| \ge n!/2^k$. On the other hand, after sorting is completed, we have $\Pi = \{\pi\}$ for a certain permutation $\pi$, so $|\Pi| = 1$.
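Therefore, if k is the largest total number of comparisons performed by our unknown algorithm for some input, then

$$\frac{n!}{2^k} \le 1 \;\Longrightarrow\; 2^k \ge n! \;\Longrightarrow\; k \ge \log_2(n!) = \sum_{i=2}^{n} \log_2 i.$$

On the other hand, we can estimate the latter summation as follows. Since $\log_2$ is increasing, $\log_2 i > \int_{i-1}^{i} \log_2 x\,dx$ for every $i \ge 2$, so for $n \ge 2$

$$\sum_{i=2}^{n} \log_2 i > \int_1^n \log_2 x\,dx = \frac{1}{\ln 2}\int_1^n \ln x\,dx = \frac{1}{\ln 2}\bigl[x \ln x - x\bigr]_1^n = \frac{1}{\ln 2}(n \ln n - n + 1) = n \log_2 n - \frac{n-1}{\ln 2}.$$

Hence $k > n \log_2 n - (n-1)/\ln 2$, so every comparison sort performs $\Omega(n \log n)$ comparisons in the worst case.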
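To make the bound concrete for small n, here is a C sketch that computes the smallest k with $2^k \ge n!$, i.e. $\lceil \log_2(n!) \rceil$, using only integer arithmetic. The name min_worst_case_comparisons is hypothetical, and the sketch assumes n <= 20 so that n! fits in 64 bits.

```c
#include <stdint.h>

/* Smallest k with 2^k >= n!, i.e. ceil(log2(n!)): the worst-case number of
   comparisons that the argument above forces on any comparison sort.
   n! is computed exactly in 64 bits, so this assumes 0 <= n <= 20. */
int min_worst_case_comparisons(int n)
{
    uint64_t fact = 1;
    for (int i = 2; i <= n; i++)
        fact *= (uint64_t)i;          /* fact == n! */

    int k = 0;
    uint64_t pow2 = 1;                /* invariant: pow2 == 2^k */
    while (pow2 < fact) {
        pow2 *= 2;
        k++;
    }
    return k;
}
```

For n = 5 it returns 7, and for n = 10 it returns 22. For comparison, the integral estimate gives about 20.2 for n = 10, below $\log_2(10!) \approx 21.8$, as the derivation requires.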