Dynamic Programming. Objective


Dynamic Programming
Richard de Neufville
Professor of Engineering Systems and of Civil and Environmental Engineering, MIT
Massachusetts Institute of Technology — Dynamic Programming, Slide 1 of 43

Objective
- To develop dynamic programming, the method used in lattice valuation of flexibility
  - Its optimization procedure: implicit enumeration
  - Assumptions: separability and monotonicity
- To show its wide applicability
  - Analysis of flexible designs
  - Sequential problems: routing and logistics; inventory plans; replacement policies; reliability
  - Non-sequential problems: investments

Outline
1. Why in this course?
2. Basic concept: implicit enumeration; motivational example
3. Key assumptions: independence (separability) and monotonicity
4. Mathematics: recurrence formulas
5. Example
6. Types of problems DP can solve
7. Summary

Why in this course?
- DP is used to find the optimum exercise of flexibility (options) in a lattice, which generally has a non-convex feasible region
- Why is this? Exponential growth of outcomes, and the flexibility to choose
- This presentation gives the general method, so you understand it at a deeper level
- The DP used in the lattice is a simple version: only 2 states are compared at any time
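The "only 2 states compared at any time" lattice version can be sketched as backward induction over a binomial lattice: at each node, compare the value of exercising the flexibility now against the value of continuing. This is a minimal illustration, not from the slides; all the numbers (up/down factors, probability, strike) are illustrative assumptions.

```python
# Minimal sketch of the lattice DP for valuing flexibility: at every
# node, compare two states -- exercise the option now, or continue.
# All parameters below are illustrative assumptions.

def lattice_option_value(S0=100.0, u=1.2, d=0.8, p=0.5, strike=100.0,
                         periods=3, discount=1.0):
    # Payoffs at the end states of a recombining binomial lattice
    values = [max(S0 * u**(periods - k) * d**k - strike, 0.0)
              for k in range(periods + 1)]
    # Backward induction: choose the better of exercising or waiting
    for t in range(periods - 1, -1, -1):
        for k in range(t + 1):
            cont = discount * (p * values[k] + (1 - p) * values[k + 1])
            exercise = S0 * u**(t - k) * d**k - strike
            values[k] = max(exercise, cont)
    return values[0]

print(round(lattice_option_value(), 2))  # 14.8 with these assumed inputs
```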

Motivational Example
Consider possible investments in 3 projects.
[Figure: return vs. investment curves for Projects 1, 2, and 3; returns on a 0–7 scale]
- What is the best investment of the 1st unit? P3: +3
- Of the 2nd? The 3rd? P1 or P3: +2, +2
- Total = 7

Motivational Example: Best Solution
[Same three return curves]
- The optimum allocation is actually (0, 2, 1), worth 8
- Marginal analysis misses this, because the feasible region is not convex
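The gap between the marginal (greedy) answer of 7 and the true optimum of 8 can be checked directly. The return tables below are reconstructed from the values the slides state (Project 1 gains 2 per unit; g2(2) = 5; g3(1) = 3; optimum (0, 2, 1) = 8); the remaining entries are illustrative assumptions consistent with those facts.

```python
from itertools import product

# Return g[i][x]: value of investing x units in project i.
# Partly reconstructed from the slides, partly assumed (see lead-in).
g = [[0, 2, 4, 6],   # project 1
     [0, 1, 5, 6],   # project 2
     [0, 3, 5, 6]]   # project 3
BUDGET = 3

def greedy():
    """Marginal analysis: give each unit to the best current slope."""
    alloc, total = [0, 0, 0], 0
    for _ in range(BUDGET):
        gains = [g[i][alloc[i] + 1] - g[i][alloc[i]] for i in range(3)]
        best = gains.index(max(gains))
        alloc[best] += 1
        total += gains[best]
    return total

def brute_force():
    """Enumerate every feasible allocation of the budget."""
    return max(sum(g[i][x[i]] for i in range(3))
               for x in product(range(BUDGET + 1), repeat=3)
               if sum(x) <= BUDGET)

print(greedy(), brute_force())  # 7 8 -- the greedy path misses (0, 2, 1)
```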

Point of Example
- A non-convex feasible region hides the optimum
- Marginal analysis and hill-climbing methods (such as linear programming) that search for the optimum are not appropriate in these cases
  - Not appropriate for lattice models in particular
- We need to search the entire space of possibilities
- This is what dynamic programming does to define the optimum solution

Semantic Note
Dynamic programming is so named because:
- It was originally associated with movements through time and space (e.g., an aircraft gaining altitude), thus "dynamic"
- "Programming" is by analogy to linear programming and other forms of optimization
The approach is useful in many cases that are not dynamic, such as the motivational example. The lattice model, however, is dynamic: it models evolution across time.

Basic Solution Strategy
- Enumeration is the basic concept: evaluating all the possibilities
- Checking all possibilities, we must find the best
- No assumptions are made about the regularity of the objective function
- This means DP can optimize over
  - non-convex feasible regions
  - discontinuous, integer functions
  which other optimization techniques cannot do
- HOWEVER...

Curse of Dimensionality
- The number of possible designs is very large
- Example: a simple development of 2 sites, with 4 sizes of operations, over 3 periods
  - Combinations in 1 period = 4^2 = 16
  - Possibilities over 3 periods = 16^3 = 4096
- In general, the size of the design space is exponential: [ (sizes)^locations ]^periods
- Actual enumeration is impractical in the lattice model (see next slide)
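The state-versus-path counts for a recombining lattice can be verified in a few lines. This sketch assumes a standard binomial lattice with 6 steps (7 end states), matching the lattice on the next slide.

```python
from math import comb

def lattice_counts(steps):
    """States vs. paths in a recombining binomial lattice."""
    states = (steps + 1) * (steps + 2) // 2       # ~ N^2 / 2: quadratic
    paths = 2 ** steps                            # exponential
    per_end_state = [comb(steps, k) for k in range(steps + 1)]
    return states, paths, per_end_state

states, paths, per_end = lattice_counts(6)
print(states, paths, per_end)  # 28 64 [1, 6, 15, 20, 15, 6, 1]
```

The quadratic state count is exactly what makes backward induction over the lattice practical while path-by-path enumeration is not.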

The Curse — in the Lattice Model
- End states = N
OUTCOME LATTICE (each step multiplies the value by 1.2 up or 0.8 down):

  100.00 120.00 144.00 172.80 207.36 248.83 298.60
          80.00  96.00 115.20 138.24 165.89 199.07
                 64.00  76.80  92.16 110.59 132.71
                        51.20  61.44  73.73  88.47
                               40.96  49.15  58.98
                                      32.77  39.32
                                             26.21

- Total states ~ order of only N^2 / 2
- Number of paths ~ order of 2^N
- Paths reaching the states at the last stage: 1 + 6 + 15 + 20 + 15 + 6 + 1 = 64 paths (the binomial coefficients)

Concept of Implicit Enumeration
- Complete enumeration is impractical, so we use implicit enumeration (IE)
- IE considers all possibilities in principle without actually doing so (thus "implicit")
- It exploits features of the problem to
  - identify groups of possibilities that are dominated: sets that are all demonstrably inferior
  - reject these groups, not just single possibilities
  - vastly reduce the dimensionality of the enumeration

Effect of Implicit Enumeration
- Because IE can reject groups of inferior (dominated) possibilities, it does not have to examine them all, and so reduces the size of the problem
- Specifically, the size of the enumeration for DP is of order (sizes)(locations)(periods)
  - multiplicative size, not exponential
- This makes the analysis computationally practical
- Examples illustrate what this means

Demonstration of IE
- Select a dynamic problem: logistic movement from Seattle to Washington, DC
- Suppose there are 4 days to make the trip, passing through several cities
- There is a cost for the movement between any city and each possible city in the next stage
- What is the minimum-cost route?
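The stage-by-stage pruning can be sketched as follows. Only two link costs are clearly legible in the slide figure (Seattle–Boise = 100, Boise–Omaha = 250); the other costs below are illustrative assumptions chosen so that the slides' stated values hold: best cost to Omaha = 350 via Boise, with the Salt Lake (400) and Phoenix (450) routes dominated.

```python
def staged_shortest_path(links, source):
    """Implicit enumeration: keep only the best cost into each state,
    so every route with a dominated prefix is pruned automatically."""
    best = {source: 0}
    for stage in links:                  # one dict of link costs per stage
        nxt = {}
        for (a, b), cost in stage.items():
            if a in best:
                cand = best[a] + cost
                if b not in nxt or cand < nxt[b]:
                    nxt[b] = cand        # worse routes into b are dropped
        best = nxt
    return best

# First two stages of the Seattle-to-DC trip (costs partly assumed).
links = [
    {("Seattle", "Boise"): 100, ("Seattle", "Salt Lake"): 150,
     ("Seattle", "Phoenix"): 200},
    {("Boise", "Omaha"): 250, ("Salt Lake", "Omaha"): 250,
     ("Phoenix", "Omaha"): 250},
]
print(staged_shortest_path(links, "Seattle"))  # {'Omaha': 350}
```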

Possible Routes Through a Node
- Many routes, with link costs as in the diagram
- Consider Omaha: 3 routes to get there, and 3 routes from there => 9 routes via Omaha
[Route diagram: Seattle → {Boise, Salt Lake, Phoenix} → {Fargo, Omaha, Houston} → {Detroit, Memphis, Atlanta} → DC, with a cost on each link]

Notice That the Problem Is a Decision Tree
- In the first stage, 3 choices; another 3 in the second, another 3 in the third
- In all, 27 different paths: the same as a complicated decision tree rooted at Seattle
- (Not all branches drawn for the 3rd stage)

Instead of Costing All Routes: IE
- We find the best cost to Omaha (350, via Boise)
- The Salt Lake (400) and Phoenix (450) routes are dominated and pruned: we drop every route containing those segments
- Thus we don't have to look at all Seattle-to-DC routes
[Route diagram with link costs, as before]

Logic of Pruning (Dropping of Routes)
- There are many Seattle–DC routes that go through Omaha (9)
- A set of them (3) lie between Seattle and Omaha
- Only one minimum-cost level exists among them (usually only 1 route, though more might be equal)
- The Seattle–Omaha routes (2) that are not minimum cost are dominated
- The routes that contain dominated sections (2 × 3 = 6) can be dropped from consideration

Result: Fewer Combinations
- Total routes examined: 3 to Omaha + 3 after = 6 = 3 × 2, not 9 = 3^2
- The savings are not dramatic here; the point is to illustrate the idea
[Route diagram with link costs, as before]

Dynamic Programming Definitions (1)
- The objective function is G(X), where X = (X_1, ..., X_N) is a vector: the set of states at each of N stages
- You can think of X as a path through the problem
- To find the optimum policy or design, we have to find the X_j that optimize G(X)
- The object is to find the set of links that constitute the optimum overall route through the problem

Concept of Stages
- A stage is a transition through the problem
  - From Seattle to the first stop on the trip, for example
- Stages may have a physical meaning, as in the example, or be conceptual (as the investments in the later example), where a stage represents the next project, element, or "knob" of the system that we address

Dynamic Programming Definitions (2)
- In parallel with G(X), which gives the overall value, the g_i(X_j) are the return functions
- They define the effect of being in state X_j at the i-th stage
  - g_i(X_j) denotes the functional form, such as the cost of each link in a stage
  - X_j are the different states at the i-th stage, such as being in Fargo, Omaha, or Houston
- So that G(X) = [g_1(X_1), ..., g_i(X_j), ..., g_N(X_N)]

Concept of State
- A state is one of the possible locations, levels, or outcomes for a stage
  - As a location: Fargo, Omaha, or Houston
  - As a level: the amount invested (see the later example)
- Each g_i(X_i) is associated with a stage
  - Example: the 1st stage is from Seattle to Boise, etc.; thus g_1(X_j) are the costs from Seattle to Boise, etc.
- ...and with a state for each stage
  - It is the schedule of costs for stage 1, stage 2, etc.

Examples of States
- For the cross-country shipment, there are 3 states (of the system, not states of the USA) for the 1st stage: Boise, Salt Lake, and Phoenix
- For a plane accelerating to altitude, a state might be defined by a (speed, altitude) vector
- For investments, states might be the $ invested
- If a stage is a "knob" we manipulate on the system, the state is the setting of that knob

Stages and States
- Stages are associated with each move along the trip
  - Stage 1 consists of the set of endpoints Boise, Salt Lake, and Phoenix; Stage 2 the set Fargo, Omaha, and Houston; etc.
- States are the possibilities in each stage: Boise, Salt Lake, etc.
[Route diagram with link costs, as before]

Solution Depends on Decomposition
- We must be able to decompose the objective function G(X) into functions of the individual stages X_i: G(X) = [g_1(X_1), ..., g_N(X_N)]
- Example: the cost of the Seattle–DC trip can be decomposed into the costs of its 4 segments, of which Seattle to Boise, Salt Lake, or Phoenix is the first
- This is the feature that permits us to consider the stages one by one, and thus to prune many logical possibilities

Assumptions Needed
- Necessary conditions for decomposition:
  - Separability
  - Monotonicity
- Another condition needed for DP: no cyclic movement (always "forward")

Separability
- The objective function is separable if all g_i(X_i) are independent of g_j(X_j) for all j not equal to i
- In the example, it is reasonable to assume that the cost of driving between any pair of cities is not affected by the cost between another pair
- However, this is not always so

Monotonicity
- The objective function is monotonic if improvements in each g_i(X_i) lead to improvements in the objective function
- That is, given G(X) = [g_i(X_i), G'(X')], where X = [X_i, X']:
  for all g_i(X_i) > g_i(X_i'), where X_i and X_i' are different values of the i-th state,
  it is true that [g_i(X_i), G'(X')] > [g_i(X_i'), G'(X')]
- For example...

When Are Functions Monotonic?
- Additive functions are always monotonic
- Multiplicative functions are monotonic only if the g_i(X_i) are non-negative and real
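A quick numerical check of these two conditions (not from the slides; the stage returns are made-up numbers): an additive objective always improves when one stage's return improves, but a multiplicative objective can get worse when another factor is negative.

```python
# Monotonicity check: additive vs. multiplicative objectives.
def additive(returns):
    return sum(returns)

def multiplicative(returns):
    out = 1.0
    for r in returns:
        out *= r
    return out

base = [2.0, -3.0]     # second stage's return is negative (illustrative)
better = [3.0, -3.0]   # improve the first stage's return: 2 -> 3

print(additive(better) > additive(base))              # True: always monotonic
print(multiplicative(better) > multiplicative(base))  # False: -9 < -6,
# the negative factor flips the improvement into a loss
```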

Solution Strategy: Two Steps
- Partial optimization at each stage
- Repetition of the process for all stages
- This is the process used to value flexibility (options) through the lattice
- At each stage (period), for each state (possible outcome for the system), the process chooses the better of using the flexibility (exercising the option) or not using it

Cumulative Return Function
- The result of the optimization at each stage and state is the cumulative return function f_S(K)
- f_S(K) denotes the best value for being in state K, having passed through the previous S stages
  - Example: f_2(Omaha) = 350
- It is defined in terms of the best over the previous stages and the return functions for this stage, g_i(X_j):
  f_S(K) = max or min of [g_i(X_j), f_{S-1}(K)]
  (K is understood to be a variable)

Mathematics: Recurrence Formulas
- The transition from one stage to the next is via a recurrence formula, or an equivalent analysis (see the lattice valuation)
- Formally, we seek the best we can obtain at any specified level K, by finding the best combination of the possible g_i(X_j) and f_{S-1}(K)

Application of Recurrence Formulas
- Example: consider the maximization of investments in independent projects
  - Each project is a stage; the amount of investment in each is its state
  - The objective function is additive: Value = Σ (value of each project)
- Recurrence formula: f_i(K) = max over X_j of [ g_i(X_j) + f_{i-1}(K − X_j) ]
- That is, the optimum for investing K over i stages is the maximum over all combinations of investing level X_j in stage i and (K − X_j) in the previous stages
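The recurrence can be sketched directly on the 3-project example. The return tables are reconstructed from values the slides state (Project 1 gains 2 per unit; g2(2) = 5; g3(1) = 3; optimum (0, 2, 1) = 8); the entries marked "assumed" are illustrative fill-ins consistent with those facts.

```python
# f_i(K) = max_x [ g_i(x) + f_{i-1}(K - x) ] for the investment example.
g = [[0, 2, 4, 6],   # project 1
     [0, 1, 5, 6],   # project 2 (g2(3) assumed)
     [0, 3, 5, 6]]   # project 3 (g3(2), g3(3) assumed)

def cumulative_returns(g, budget):
    """f[i][K]: best value from investing K units in the first i projects."""
    f = [[0] * (budget + 1)]             # f_0(K) = 0: nothing invested yet
    for gi in g:
        prev = f[-1]
        f.append([max(gi[x] + prev[K - x] for x in range(K + 1))
                  for K in range(budget + 1)])
    return f

f = cumulative_returns(g, 3)
print(f[1], f[2], f[3][3])  # [0, 2, 4, 6] [0, 2, 5, 7] 8
```

The printed values match the slides' cumulative return tables: f_1 = (0, 2, 4, 6), f_2 = (0, 2, 5, 7), and f_3(3) = 8 at the allocation (0, 2, 1).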

Application to the Investment Example
- 3 projects, 4 investment levels (0, 1, 2, 3)
- Objective: the maximum for investing 3 units
- Stages = projects; states = investment levels
[Figure: return functions g_i(X_i) for i = 1, 2, 3, each plotted on a 0–7 scale]

Dynamic Programming Analysis (1)
- At the 1st stage, the cumulative return function identically equals the return for X_1
- That is, f_1(X_1), the best way to allocate the resource over only one stage, is g_1(X_1): there is no other choice
- So f_1(0) = 0; f_1(1) = 2; f_1(2) = 4; f_1(3) = 6

Dynamic Programming Analysis (2)
- At the 2nd stage, the best way to spend:
  - 0: 0 on both the 1st and 2nd stages (= 0) = f_2(0)
  - 1: either 0 on the 1st and 1 on the 2nd (= 1), or 1 on the 1st and 0 on the 2nd (= 2, BEST) = f_2(1)
  - 2: 2 on the 1st and 0 on the 2nd (= 4); 1 on the 1st and 1 on the 2nd (= 3); 0 on the 1st and 2 on the 2nd (= 5, BEST) = f_2(2)
  - 3: 4 choices; the best allocation is (1, 2), worth 7 = f_2(3)
- These results, and the corresponding allocations, are shown in the next figures

Dynamic Programming Analysis (3)
- Left-hand column: 0 invested in no project, f_0(0) = 0
- 2nd column: 0–3 in the 1st project, e.g. f_1(2) = 4

  K   f_0(K)   f_1(K)   f_2(K)   f_3(K)
  0     0        0        0
  1              2        2
  2              4        5
  3              6        7        8

Dynamic Programming Analysis (4)
- For the 3rd stage (all 3 projects), we want the optimum allocation of all 3 units: (0, 2, 1), with f_3(3) = 8

  K   f_0(K)   f_1(K)   f_2(K)   f_3(K)
  0     0        0        0
  1              2        2
  2              4        5
  3              6        7        8

Contrast: DP and Marginal Analysis
- Marginal analysis reduces the calculation burden by looking only at the best slopes toward the goal, discarding the others
  - It misses opportunities to take losses for later gains: this approach found only 7
- Dynamic programming looks at all possible positions, but cuts out combinations that are dominated
  - It uses the independence of the return functions (the value from a state does not depend on what happened before)

Classes of Problems Suitable for DP
- Sequential, dynamic problems:
  - aircraft flight paths to maximize speed and altitude
  - movement across territory (the example used)
  - scheduling, inventory (management over time)
  - reliability (a multiplicative example; see the text)
  - flexibility (options) analysis!
- Non-sequential: investment maximizations
  - Nothing dynamic here; the key is the separability of the projects

Formulation Issues
- There is no standard ("canonical") form; careful formulations are required (see the text)
- DP assumes discrete states
  - It thus easily handles integers and discontinuity
  - In practice, it does not handle continuous variables
- DP handles constraints in the formulation: certain paths are simply not defined or not allowed
- Sensitivity analysis is not automatic

Dynamic Programming Summary
- The method used to deal with lattices
- Solution by implicit enumeration
- The approach requires separability, monotonicity, and no cycles
- Careful formulation is needed
- Useful for a wide range of issues, particularly flexibility and options analyses!