Greedy Algorithms. Kleinberg and Tardos, Chapter 4


Selecting gas stations. Road trip from Fort Collins to Durango on a given route of length L, with fuel stations at positions b_i. Fuel capacity = C miles. Goal: make as few refueling stops as possible. [Figure: the route from Fort Collins to Durango, with the fuel range C marked.]

Selecting gas stations. Road trip from Fort Collins to Durango on a given route of length L, with fuel stations at positions b_i. Fuel capacity = C. Goal: make as few refueling stops as possible. Greedy algorithm: go as far as you can before refueling. In general: determine a global optimum via a number of locally optimal choices. [Figure: the same route, with greedy refueling stops at intervals of at most C.]

Selecting gas stations: Greedy Algorithm. The road trip algorithm:

Sort stations so that 0 = b_0 < b_1 < b_2 < ... < b_n = L
S ← {0}    (stations selected; we fuel up at home)
x ← 0      (current position)
while (x ≠ b_n)
    let p be the largest integer such that b_p ≤ x + C
    if (b_p = x) return "no solution"
    x ← b_p
    S ← S ∪ {p}
return S
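A minimal runnable Python sketch of this pseudocode (the function name and the linear scan for p are my own choices; a binary search over the sorted b would give the asymptotically faster variant):

def select_stations(b, C):
    """Greedy refueling: b[0..n] are sorted positions with b[0] = 0 (home)
    and b[-1] = L (destination); C is the fuel range.
    Returns the indices of the chosen stops, or None if no solution exists."""
    S = [0]          # we fuel up at home
    x = 0            # current position
    n = len(b) - 1
    while x != b[n]:
        # p = largest index whose station is reachable from x
        p = max(i for i in range(n + 1) if b[i] <= x + C)
        if b[p] == x:            # cannot make progress: stations too far apart
            return None
        x = b[p]
        S.append(p)
    return S

# Example: trip of length 10, range 6, stations at 3, 5, 9.
print(select_stations([0, 3, 5, 9, 10], 6))   # [0, 2, 4]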

Interval Scheduling. Also called activity selection, or job scheduling... Job j starts at s_j and finishes at f_j. Two jobs are compatible if they don't overlap. Goal: find a maximum-size subset of compatible jobs. [Figure: jobs a through h laid out on a time line from 0 to 11.]

Interval Scheduling: Greedy Algorithms. Greedy template: consider jobs in some natural order; take each job provided it's compatible with the ones already taken. Possible orders: [Earliest start time] Consider jobs in ascending order of s_j. [Earliest finish time] Consider jobs in ascending order of f_j. [Shortest interval] Consider jobs in ascending order of f_j − s_j. [Fewest conflicts] For each job j, count the number of conflicting jobs c_j; schedule in ascending order of c_j. Which of these surely don't work? (Hint: find a counterexample.)

Interval Scheduling: Greedy Algorithms. Greedy template: consider jobs in some natural order; take each job provided it's compatible with the ones already taken. [Figure: counterexamples for earliest start time, shortest interval, and fewest conflicts.]

Interval Scheduling: Greedy Algorithm. Greedy algorithm: consider jobs in increasing order of finish time; take each job provided it's compatible with the ones already taken.

Sort jobs by finish times so that f_1 ≤ f_2 ≤ ... ≤ f_n.
A ← ∅    (set of jobs selected)
for j = 1 to n {
    if (job j compatible with A)
        A ← A ∪ {j}
}
return A

Implementation. When is job j compatible with A?

Interval Scheduling: Greedy Algorithm. Greedy algorithm: consider jobs in increasing order of finish time; take each job provided it's compatible with the ones already taken. It suffices to compare s_i with the finish time f_j of the last job added.

Sort jobs by finish times so that f_1 ≤ f_2 ≤ ... ≤ f_n.
A ← {1}
j ← 1
for i = 2 to n {
    if (s_i ≥ f_j)
        A ← A ∪ {i}
        j ← i
}
return A

Implementation. O(n log n).
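A runnable Python version of this O(n log n) implementation (the names are illustrative; the example data is the table on the next slide):

def interval_schedule(jobs):
    """Earliest-finish-time greedy. jobs is a list of (s, f) pairs;
    returns 1-based indices of a maximum compatible subset."""
    order = sorted(range(len(jobs)), key=lambda i: jobs[i][1])  # by finish time
    A = []
    last_finish = float("-inf")
    for i in order:
        s, f = jobs[i]
        if s >= last_finish:      # compatible with everything taken so far
            A.append(i + 1)       # 1-based, as in the slides
            last_finish = f
    return A

# The example from the next slide: jobs 1..11.
S = [1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12]
F = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
print(interval_schedule(list(zip(S, F))))   # [1, 4, 8, 11]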

Example:

i    1  2  3  4  5  6  7  8  9  10  11
s_i  1  3  0  5  3  5  6  8  8   2  12
f_i  4  5  6  7  8  9 10 11 12  13  14

Example:

i    1  2  3  4  5  6  7  8  9  10  11
s_i  1  3  0  5  3  5  6  8  8   2  12
f_i  4  5  6  7  8  9 10 11 12  13  14

A = {1, 4, 8, 11}

Greedy algorithms determine a globally optimal solution by a series of locally optimal choices. The greedy solution is not the only optimal one: A' = {2, 4, 9, 11} is also optimal.

Greedy works for Activity Selection (= Interval Scheduling). Proof by induction. BASE: some optimal solution contains activity 1 as its first activity. Let A be an optimal solution with activity k ≠ 1 as its first activity. Then we can replace activity k (which has f_k ≥ f_1) by activity 1, and the result is still a compatible set of the same size. So picking the first element in a greedy fashion works. STEP: after the first choice is made, remove all activities that are incompatible with the first chosen activity and recursively define a new problem consisting of the remaining activities. The first choice for this reduced problem can again be made greedily, by the base principle. By induction, greedy is optimal.

What did we do? We assumed there was a non-greedy optimal solution, then we stepwise morphed this solution into the greedy solution without making it any worse, thereby showing that the greedy solution is optimal in the first place. This is called the exchange argument.

Extension 1: Scheduling all intervals. Lecture j starts at s_j and finishes at f_j. Goal: find the minimum number of classrooms to schedule all lectures so that no two occur at the same time in the same room. [Figure: a schedule using 4 classrooms for 10 lectures (a through j) between 9:00 and 4:30.] Can we do better?

Scheduling all intervals. E.g., lecture j starts at s_j and finishes at f_j. Goal: find the minimum number of classrooms to schedule all lectures so that no two occur at the same time in the same room. [Figure: the same 10 lectures scheduled in only 3 classrooms.]

Interval Scheduling: Lower Bound. Key observation: number of classrooms needed ≥ depth (the maximum number of intervals alive at any point in time). Example: depth of the schedule below = 3 ⇒ the schedule is optimal; we cannot do it with 2. Q. Does there always exist a schedule using exactly as many classrooms as the depth of the intervals? (Hint: greedily label the intervals with their resource.) [Figure: the 3-classroom schedule from the previous slide.]

Interval Scheduling: Greedy Algorithm. Greedy algorithm:

allocate d labels (d = depth)
sort the intervals by starting time: I_1, I_2, ..., I_n
for j = 1 to n
    for each interval I_i that precedes and overlaps with I_j
        exclude its label for I_j
    pick a remaining label for I_j
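A direct Python sketch of this labeling algorithm (an O(n²) transcription kept close to the pseudocode; one assumption of mine: the label pool grows lazily and we always pick the smallest remaining label, rather than preallocating the d labels up front):

def label_intervals(intervals):
    """Assign each interval a classroom label, following the pseudocode.
    intervals: list of (s, f) pairs. Returns a dict index -> label."""
    order = sorted(range(len(intervals)), key=lambda i: intervals[i][0])
    labels = {}
    d = 0                                   # labels handed out so far
    for j in order:
        s_j = intervals[j][0]
        # earlier intervals that are still running at s_j exclude their labels
        used = {labels[i] for i in labels if intervals[i][1] > s_j}
        label = next(l for l in range(1, d + 2) if l not in used)
        labels[j] = label
        d = max(d, label)
        # the argument on the next slide guarantees d never exceeds the depth
    return labels

# Hypothetical example: three lectures, at most two alive at once.
print(label_intervals([(9, 11), (10, 12), (11, 13)]))  # {0: 1, 1: 2, 2: 1}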

Greedy works.

allocate d labels (d = depth)
sort the intervals by starting time: I_1, I_2, ..., I_n
for j = 1 to n
    for each interval I_i that precedes and overlaps with I_j
        exclude its label for I_j
    pick a remaining label for I_j

Observations:
- There is always a label for I_j: assume t earlier intervals overlap with I_j; together with I_j they all pass over a common point (the start of I_j), so t + 1 ≤ d. Hence at most d − 1 labels are excluded, and one of the d labels remains available for I_j.
- No two overlapping intervals get the same label, by the nature of the algorithm.

Huffman Code Compression

Huffman codes. Say I have an alphabet consisting of the letters a, b, c, d, e, f with frequencies (×1000) 45, 13, 12, 16, 9, 5. What would a fixed-length binary encoding look like?

a   b   c   d   e   f
000 001 010 011 100 101

What would the total encoding length be? 100,000 characters × 3 bits = 300,000 bits.

Fixed vs. variable encoding.

                   a    b    c    d    e    f
frequency (×1000)  45   13   12   16   9    5
fixed encoding     000  001  010  011  100  101
variable encoding  0    101  100  111  1101 1100

For 100,000 characters, fixed: 300,000 bits. Variable? (1×45 + 3×13 + 3×12 + 3×16 + 4×9 + 4×5) × 1000 = 224,000 bits, a 25% saving.

Variable prefix encoding.

                   a    b    c    d    e    f
frequency (×1000)  45   13   12   16   9    5
fixed encoding     000  001  010  011  100  101
variable encoding  0    101  100  111  1101 1100

What is special about our encoding? No code is a prefix of another. Why does it matter? We can concatenate the codes without ambiguity: 001011101 = aabe.
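A tiny Python illustration of why the prefix property matters: a greedy left-to-right scan decodes unambiguously (the code table is the one from this slide; the function is my own sketch):

# Variable-length prefix code from the slide.
code = {'a': '0', 'b': '101', 'c': '100', 'd': '111', 'e': '1101', 'f': '1100'}

def decode(bits, code):
    """Greedy left-to-right decoding; works because no codeword
    is a prefix of another, so the first match is always right."""
    inverse = {v: k for k, v in code.items()}
    out, buf = [], ''
    for bit in bits:
        buf += bit
        if buf in inverse:        # a complete codeword: emit it and reset
            out.append(inverse[buf])
            buf = ''
    return ''.join(out)

print(decode('001011101', code))   # 'aabe'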

Two characters, frequencies, encodings. Say we have two characters a and b, a with frequency f_a and b with frequency f_b, e.g. a has frequency 70 and b has frequency 30. Say we have two encodings for these, one with length l_1 and one with length l_2, e.g. 101 with l_1 = 3 and 11100 with l_2 = 5. Which encoding would we choose for a and which for b? If we assign a = 101 and b = 11100, what will the total number of bits be? If we assign a = 11100 and b = 101, what will the total number of bits be? Can you relate the difference to frequency and encoding length?

Frequency and encoding length. Two characters a and b with frequencies f_1 and f_2, and two encodings 1 and 2 with lengths l_1 and l_2, where f_1 > f_2 and l_1 > l_2.
I: a gets encoding 1, b gets encoding 2: f_1·l_1 + f_2·l_2 bits.
II: a gets encoding 2, b gets encoding 1: f_1·l_2 + f_2·l_1 bits.
Difference: (f_1·l_1 + f_2·l_2) − (f_1·l_2 + f_2·l_1) = f_1·(l_1 − l_2) + f_2·(l_2 − l_1) = f_1·(l_1 − l_2) − f_2·(l_1 − l_2) = (f_1 − f_2)·(l_1 − l_2) > 0.
With the numbers from the previous slide: (70 − 30)·(5 − 3) = 80 thousand bits, i.e. 440 vs. 360. So, for an optimal encoding: the higher the frequency, the shorter the encoding length.

Cost of encoding a file: ABL. For each character c in C, f(c) is its frequency and d(c) is the number of bits it takes to encode c. So the number of bits to encode the file is

    Σ_{c in C} f(c)·d(c)

The Average Bit Length of an encoding E is

    ABL(E) = (1/n) · Σ_{c in C} f(c)·d(c)

where n is the number of characters in the file.
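In code, both formulas are one-line sums. A small sketch with f and d as dictionaries (the data is the running example; names are my own):

def abl(f, d):
    """Average bit length: (1/n) * sum of f(c)*d(c), n = total characters."""
    n = sum(f.values())
    return sum(f[c] * d[c] for c in f) / n

f = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}   # frequencies (x1000)
d = {'a': 1, 'b': 3, 'c': 3, 'd': 3, 'e': 4, 'f': 4}       # variable code lengths
print(abl(f, d))   # 2.24 bits per character on average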

Huffman code. An optimal encoding of a file has minimal cost, i.e. minimal ABL. Huffman invented a greedy algorithm to construct an optimal prefix code, called the Huffman code. An encoding is represented by a binary prefix tree: internal nodes contain frequencies, namely the sum of the frequencies of their children; leaves are the characters plus their frequencies; paths to the leaves are the codes; the length of the encoding of a character c is the length of the path to the leaf c : f(c).

Prefix tree for the variable encoding:

a : 0
b : 101
c : 100
d : 111
e : 1101
f : 1100

         100
        0/  \1
        /    \
     a:45     55
            0/  \1
            /    \
          25      30
         0/ \1   0/ \1
         /   \   /   \
      c:12  b:13 14  d:16
                0/ \1
                /   \
              f:5   e:9

Optimal prefix trees are full. The frequencies of the internal nodes are the sums of the frequencies of their children. A binary tree is full if all its internal nodes have two children. If the prefix tree is not full, it is not optimal. Why? If a tree is not full, it has an internal node with only one child, whose edge is labeled with a redundant bit. Check the fixed encoding: a:000 b:001 c:010 d:011 e:100 f:101

a: 000
b: 001
c: 010
d: 011
e: 100
f: 101

              100
            0/   \1
            /     \
          86       14
         0/ \1     |0   <- redundant bit
         /   \     |
       58     28   14
      0/ \1  0/ \1 0/ \1
      /   \  /  \  /   \
   a:45 b:13 c:12 d:16 e:9 f:5

Huffman algorithm. Create |C| leaves, one for each character. Perform |C| − 1 merge operations, each creating a new node whose children are the two nodes with the least frequencies, and whose frequency is the sum of these two frequencies. By using a heap for the collection of intermediate trees, this algorithm takes O(n lg n) time.

buildheap
do |C| − 1 times
    t1 = extract-min
    t2 = extract-min
    t3 = merge(t1, t2)
    insert(t3)
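A runnable Python version of this pseudocode, using heapq as the heap (the tuple layout and the tiebreaker counter are implementation choices of mine, needed so that entries with equal frequencies still compare cleanly):

import heapq
from itertools import count

def huffman(freq):
    """Build a Huffman tree and return the code table.
    freq: dict mapping characters to frequencies."""
    tick = count()                          # tiebreaker for equal frequencies
    heap = [(f, next(tick), c) for c, f in freq.items()]
    heapq.heapify(heap)                     # buildheap
    for _ in range(len(freq) - 1):          # |C| - 1 merges
        f1, _, t1 = heapq.heappop(heap)     # extract-min
        f2, _, t2 = heapq.heappop(heap)     # extract-min
        heapq.heappush(heap, (f1 + f2, next(tick), (t1, t2)))  # merge + insert
    _, _, tree = heap[0]

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):         # internal node: 0 left, 1 right
            walk(node[0], prefix + '0')
            walk(node[1], prefix + '1')
        else:                               # leaf: record the character's code
            codes[node] = prefix
    walk(tree, '')
    return codes

print(huffman({'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}))
# {'a': '0', 'c': '100', 'b': '101', 'f': '1100', 'e': '1101', 'd': '111'}
# which matches the codes on the earlier slides.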

1) f:5  e:9  c:12  b:13  d:16  a:45

2) c:12  b:13   14   d:16  a:45
               /  \
             f:5  e:9

3)   14   d:16   25   a:45
    /  \        /  \
  f:5  e:9   c:12  b:13

4)    25        30      a:45
     /  \      /  \
  c:12  b:13  14  d:16
             /  \
           f:5  e:9

5) a:45      55
            /  \
          25    30
         /  \   /  \
      c:12 b:13 14 d:16
               /  \
             f:5  e:9

6)       100
        /   \
     a:45    55
            /  \
          25    30
         /  \   /  \
      c:12 b:13 14 d:16
               /  \
             f:5  e:9

Huffman is optimal. Base step of the inductive approach: let x and y be the two characters with the minimal frequencies; then there is a minimal-cost encoding tree with x and y of equal and greatest depth (see e and f in our example above). How? The proof technique is the same exchange argument we have used before: if the greedy choice is not taken, then we show that by taking the greedy choice we get a solution that is as good or better.

Base of the inductive proof. Let leaves x, y have the lowest frequencies. Assume that two other characters a and b with higher frequencies are siblings at the lowest level of the tree T:

      T
     / \
    O   x
   / \
  y   O
     / \
    a   b

Since the frequencies of x and y are lowest, the cost can only improve if we swap y and a, and x and b:

      T
     / \
    O   b
   / \
  a   O
     / \
    y   x

Why?

Proof of base.

      T                T
     / \              / \
    O   x            O   b
   / \              / \
  y   O            a   O
     / \              / \
    a   b            y   x

Since the frequencies of x and y are lowest, the cost can only improve if we swap y and a, and x and b. We need to prove: cost(left tree) > cost(right tree). Let d_1 be the depth of y in the left tree and d_2 > d_1 the depth of a. The (a, y part of the) cost of the left tree is d_1·f_y + d_2·f_a; of the right tree, d_1·f_a + d_2·f_y. Then

d_1·f_y + d_2·f_a − d_1·f_a − d_2·f_y = d_1·(f_y − f_a) + d_2·(f_a − f_y) = (d_2 − d_1)·(f_a − f_y) > 0.

The same holds for x and b.

Greedy works: base and step. Base: we have shown that putting the two lowest-frequency characters lowest in the tree is a good greedy starting point for our algorithm. Step: we create an alphabet C' = C with x and y replaced by a new character z with frequency f(z) = f(x) + f(y), with the induction hypothesis that the encoding tree T' for C' is optimal; then we must show that the larger encoding tree T for C is optimal. (E.g., T' is the tree created in steps 2 to 6 in the example.)

Proof of step: by contradiction.
1. cost(T) = cost(T') + f(x) + f(y): since d(x) = d(y) = d(z) + 1, we have f(x)·d(x) + f(y)·d(y) = (f(x) + f(y))·(d(z) + 1) = f(z)·d(z) + f(x) + f(y) (because f(z) = f(x) + f(y)).
2. Now suppose T is not an optimal encoding tree; then there is another optimal tree T'' with cost(T'') < cost(T). We have shown (base) that we may take x and y to be siblings at the lowest level of T''. Let T''' be T'' with x and y replaced by z; then cost(T''') = cost(T'') − f(x) − f(y) < cost(T) − f(x) − f(y) = cost(T'). But that yields a contradiction with the induction hypothesis that T' is optimal for C'. Hence greedy Huffman produces an optimal prefix encoding tree.

Conclusion: Greedy Algorithms. At every step, greedy makes the locally optimal choice, "without worrying about the future". Greedy stays ahead: show that after each step of the greedy algorithm, its solution is at least as good as any other. Show greedy works by an exchange / morphing argument: incrementally transform any optimal solution into the greedy one without worsening its quality. Not all problems have a greedy solution: no NP-hard problem (e.g. TSP) is known to admit an optimal greedy solution.