LECTURE 9 - LAGRANGE MULTIPLIERS

CHRIS JOHNSON

Date: March 3, 2013.

Abstract. In this lecture we'll describe a way of solving certain optimization problems subject to constraints. This method, known as Lagrange multipliers, gives us a way to algebraically solve such optimization problems and, unlike the previously described simplex algorithm, doesn't require that our function or constraints be linear.

1. Geometric Motivation

Suppose that we want to solve the following problem:

\[ \text{minimize } xy \quad \text{subject to } x - y^3 + y = 2. \]

That is, we want to find the $(x, y)$ that makes $xy$ as small as possible, but we are only concerned with the points that satisfy $x - y^3 + y = 2$. To get an idea for how to do this, suppose that we first graph the points satisfying $x - y^3 + y = 2$. See Figure 1.

[Figure 1. The constraint $x - y^3 + y = 2$.]
Our goal is to find the point on this curve, $x - y^3 + y = 2$, which makes $xy$ as small as possible. To do this we'll consider curves of the form $xy = c$, where $c$ is some constant, and try to make this value $c$ (which is what we're trying to minimize) as small as possible while still intersecting the curve $x - y^3 + y = 2$. Let's begin by setting $c = -1$, so we plot the curve $xy = -1$. This gives us the red curve in Figure 2.

[Figure 2. The constraint curve together with the curve $xy = -1$.]

Notice that the red curve touches the blue curve in exactly two spots. The coordinates of these intersection points tell us which $(x, y)$ we can plug in for $xy$ to get the value $-1$ and still satisfy the constraint $x - y^3 + y = 2$.

Since we want to make $c$ as small as possible and still intersect the blue curve, let's repeat the above process with the value $c = -1.2$. So we plot $xy = -1.2$. To compare with the previous plot, for $c = -1$, both curves are shown together in Figure 3.

Continuing this process, let's plot the curve $xy = c$ for lots of values of $c$, and then try to find which of these $c$'s is as small as possible with the curve $xy = c$ still intersecting our constraint curve $x - y^3 + y = 2$. This is shown in Figure 4. From the picture it appears that $c = -2$ is the smallest we can make $c$ and still have $xy = c$ intersect the curve $x - y^3 + y = 2$. To find the $(x, y)$-coordinates of the point where this happens we'd want to solve the system $xy = -2$ and $x - y^3 + y = 2$.

Let's make a few simple observations about the pictures we've constructed above. The minimal value occurred when the curves $xy = c$ and $x - y^3 + y = 2$ were tangent. This isn't a coincidence: if our value
of $c$ were almost optimal, then we'd have two very close points of intersection. Pushing $c$ a little bit closer to the optimal value makes the points move closer together, and when we get to the optimal value the two points collide and we have a point of tangency. Notice that this is not the same as saying there is exactly one point where the optimal value occurs, nor does it guarantee that a point of tangency is a global max/min.

[Figure 3. The constraint curve together with the curves $xy = -1$ and $xy = -1.2$.]

[Figure 4. The curves $xy = c$ plotted for several values of $c$ (curve labels: $c = -1.0, -1.2, -1.4, \ldots, -2.2$).]
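The graphical search above can be imitated numerically. The following is only a rough sketch, not part of the original lecture: it assumes the constraint is rearranged as $x = 2 + y^3 - y$, scans $y$ over the range $[-2, 0.5]$ read off the figures' axes, and records the smallest value of $xy$ it meets. The function names and the grid size are hypothetical choices made for illustration.

```python
def x_on_constraint(y):
    """Solve the constraint x - y**3 + y = 2 for x."""
    return 2 + y**3 - y


def objective_on_constraint(y):
    """Value of xy at the point of the constraint curve with this y."""
    return x_on_constraint(y) * y


def scan_minimum(y_lo=-2.0, y_hi=0.5, steps=2500):
    """Brute-force scan for the smallest xy along the constraint curve."""
    best_y = y_lo
    best_c = objective_on_constraint(y_lo)
    for i in range(1, steps + 1):
        y = y_lo + (y_hi - y_lo) * i / steps
        c = objective_on_constraint(y)
        if c < best_c:
            best_y, best_c = y, c
    return x_on_constraint(best_y), best_y, best_c


x, y, c = scan_minimum()
print(f"smallest xy found: {c:.4f} at (x, y) = ({x:.4f}, {y:.4f})")
```

A scan like this only suggests where the minimum lies on the sampled range; it does not explain why the curves are tangent there, which is what the rest of the lecture addresses.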
Even though we were solving a minimization problem above, we could try to solve the corresponding maximization problem in exactly the same way: keep increasing $c$ until we find the largest $c$ so that the curves $x - y^3 + y = 2$ and $xy = c$ are tangent.

2. Gradients

In order to solve these problems in a precise way, we need to make use of the gradient of a multivariable function. Given a differentiable function $f(x, y)$, the gradient of $f(x, y)$ is the (multivariable) vector-valued function whose components are the partial derivatives of $f(x, y)$. This function is denoted $\nabla f(x, y)$ and is sometimes pronounced "del of $f(x, y)$":

\[ \nabla f(x, y) = \langle f_x(x, y), f_y(x, y) \rangle. \]

Example 1. Calculate the gradient of $f(x, y) = 3xy^2 - \sin(x) + y^x$.

\[ \nabla f(x, y) = \left\langle \frac{\partial}{\partial x}\left(3xy^2 - \sin(x) + y^x\right),\; \frac{\partial}{\partial y}\left(3xy^2 - \sin(x) + y^x\right) \right\rangle = \left\langle 3y^2 - \cos(x) + y^x \ln(y),\; 6xy + x y^{x-1} \right\rangle. \]

We'll see the gradient several times through the semester, particularly in the next lecture when we talk about directional derivatives. Right now we care about the gradient because of the following theorem:

Theorem 1. If $f(x, y)$ is differentiable, then any vector tangent to the curve $f(x, y) = c$ at $(x_0, y_0)$ is orthogonal to $\nabla f(x_0, y_0)$.

Proof. By implicit differentiation, the slope $\frac{dy}{dx}$ of the curve $f(x, y) = c$ at $(x_0, y_0)$ is $-\frac{f_x(x_0, y_0)}{f_y(x_0, y_0)}$ (assuming $f_y(x_0, y_0) \neq 0$). One vector which obviously has this slope is $\langle f_y(x_0, y_0), -f_x(x_0, y_0) \rangle$. Any other tangent vector is thus a scalar multiple of this one. To check orthogonality we just take the dot product:

\[ \langle f_y(x_0, y_0), -f_x(x_0, y_0) \rangle \cdot \nabla f(x_0, y_0) = \langle f_y(x_0, y_0), -f_x(x_0, y_0) \rangle \cdot \langle f_x(x_0, y_0), f_y(x_0, y_0) \rangle = f_y(x_0, y_0) f_x(x_0, y_0) - f_x(x_0, y_0) f_y(x_0, y_0) = 0. \]

3. Lagrange Multipliers

Now we need some way of taking the intuitive ideas above and putting them in a more formal framework. So let's suppose we have a more general optimization problem:
\[ \text{maximize } f(x, y) \quad \text{subject to } g(x, y) = k. \]

So of all the $(x, y)$ pairs that satisfy $g(x, y) = k$, we want to pick the pair that makes $f(x, y)$ as big as possible. Following the motivating example above, we consider curves $f(x, y) = c$ and try to make $c$ as large as possible while still intersecting $g(x, y) = k$. When this happens, the curves $g(x, y) = k$ and $f(x, y) = c$ are tangent. So in general what we want to do is find the $(x, y)$ so that the tangent vectors of $g(x, y) = k$ and $f(x, y) = c$ are parallel.

By the theorem above, each gradient is orthogonal to the tangent vectors of its own curve. Thus if $f(x, y)$ and $g(x, y)$ are both differentiable, then at any point where the curves $f(x, y) = c$ and $g(x, y) = k$ are tangent, the gradient vectors $\nabla f(x, y)$ and $\nabla g(x, y)$ have to be parallel. To find these points of tangency, and hence to find the maximum (or minimum) of $f(x, y)$ subject to $g(x, y) = k$, we need to find the $(x, y)$ pairs that simultaneously solve $\nabla f(x, y) = \lambda \nabla g(x, y)$ (because these vectors are parallel, they are scalar multiples of one another) and $g(x, y) = k$ (because the points have to satisfy our constraint). This turns an optimization problem into a problem of algebra: solving a system of equations. This is known as the method of Lagrange multipliers. (The Lagrange multiplier is the value $\lambda$ above.)

To maximize $f(x, y)$ subject to $g(x, y) = k$, solve the system:

\[ \nabla f(x, y) = \lambda \nabla g(x, y), \qquad g(x, y) = k. \]

Notice that $\nabla f(x, y) = \lambda \nabla g(x, y)$, when written out in components, really is just an algebraic system of equations.

To maximize $f(x, y)$ subject to $g(x, y) = k$, solve the system:

\[ f_x(x, y) = \lambda g_x(x, y), \qquad f_y(x, y) = \lambda g_y(x, y), \qquad g(x, y) = k. \]

Example 2. Use the method of Lagrange multipliers to solve the following optimization problem:

\[ \text{maximize } 5x - 3y \quad \text{subject to } x^2 + y^2 = 136. \]

Here $f(x, y) = 5x - 3y$, $g(x, y) = x^2 + y^2$, and $k = 136$. First we find our gradients:

\[ \nabla f(x, y) = \langle 5, -3 \rangle, \qquad \nabla g(x, y) = \langle 2x, 2y \rangle. \]
And so we want to solve the following system of equations:

\[ 5 = 2\lambda x, \qquad -3 = 2\lambda y, \qquad x^2 + y^2 = 136. \]

Solving the first two equations for $x$ and $y$, we have

\[ x = \frac{5}{2\lambda}, \qquad y = \frac{-3}{2\lambda}. \]

Notice that there's no possible way for $\lambda$ to equal zero in our problem because of the equation $5 = 2\lambda x$; if we did have $\lambda = 0$, this equation would give us $5 = 0$, which is certainly not true. Plugging these into the constraint $x^2 + y^2 = 136$, we have

\[ \frac{25}{4\lambda^2} + \frac{9}{4\lambda^2} = 136 \implies \frac{34}{4\lambda^2} = 136 \implies \lambda^2 = \frac{34}{544} = \frac{1}{16}. \]

Thus $\lambda = \pm 1/4$. If $\lambda = 1/4$, the first two equations become $5 = \frac{x}{2}$ and $-3 = \frac{y}{2}$, or $x = 10$ and $y = -6$. If $\lambda = -1/4$, then we'd have $x = -10$ and $y = 6$. So there are two solutions to our system of equations: $(10, -6)$ and $(-10, 6)$. One of these is the maximum, and one is the minimum. To see which is which we have to plug them into our original function. Plugging in $(10, -6)$ gives $50 + 18 = 68$; plugging in $(-10, 6)$ gives $-50 - 18 = -68$. Thus $(10, -6)$ is the max, and $(-10, 6)$ is the min.

(The symmetry in the solutions here is not typical of these sorts of problems, but arises from the geometry of our optimization problem. Our constraint is a circle centered at the origin, while the objective function we're trying to optimize is linear. This means the max and min occur at opposite points on the circle.)

As usual, there's nothing really special about the fact that we're using functions of two variables above. We could just as easily use Lagrange multipliers to solve optimization problems with several variables.

To maximize $f(x_1, x_2, \ldots, x_n)$ subject to $g(x_1, x_2, \ldots, x_n) = k$, solve the system:

\[ \nabla f(x_1, x_2, \ldots, x_n) = \lambda \nabla g(x_1, x_2, \ldots, x_n), \qquad g(x_1, x_2, \ldots, x_n) = k. \]
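As a sanity check on Example 2, the sketch below redoes the computation in Python for the objective $5x - 3y$ on the circle $x^2 + y^2 = 136$. It is an illustration rather than part of the lecture: the helper name `lagrange_candidates` is hypothetical, and it handles only the special case of a linear objective $ax + by$ on a circle centered at the origin, where the system reduces to $\lambda^2 = (a^2 + b^2)/(4k)$. A brute-force walk around the circle is included as an independent comparison.

```python
import math


def lagrange_candidates(a, b, k):
    """Candidate points for optimizing a*x + b*y on the circle x^2 + y^2 = k.

    From a = 2*lam*x and b = 2*lam*y we get x = a/(2*lam), y = b/(2*lam);
    substituting into the constraint gives lam^2 = (a^2 + b^2) / (4*k).
    """
    lam = math.sqrt((a * a + b * b) / (4 * k))
    return [(a / (2 * m), b / (2 * m), m) for m in (lam, -lam)]


def f(x, y):
    """Objective function of Example 2."""
    return 5 * x - 3 * y


candidates = lagrange_candidates(5, -3, 136)
for x, y, lam in candidates:
    print(f"lambda = {lam:+.2f}: (x, y) = ({x:.1f}, {y:.1f}), f = {f(x, y):.1f}")

# Independent check: sample the circle by angle and keep the largest value of f.
radius = math.sqrt(136)
samples = 100_000
brute_max = max(
    f(radius * math.cos(t), radius * math.sin(t))
    for t in (2 * math.pi * i / samples for i in range(samples))
)
print(f"largest sampled value of f on the circle: {brute_max:.4f}")
```

The two candidates come straight from the Lagrange system, and evaluating $f$ at each one decides which is the maximum and which is the minimum, just as in the worked example.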
Example 3. Find the dimensions of the largest box that can be made with 64 square feet of cardboard.

Suppose the dimensions of our box are $x$, $y$, and $z$. Then we want to maximize $V(x, y, z) = xyz$ subject to the constraint that $g(x, y, z) = 2xy + 2yz + 2xz = 64$. The gradient of our objective function is

\[ \nabla V(x, y, z) = \langle yz, xz, xy \rangle. \]

The gradient of our constraint function is

\[ \nabla g(x, y, z) = \langle 2y + 2z, 2x + 2z, 2y + 2x \rangle. \]

Thus our system of equations is

\[ yz = \lambda(2y + 2z), \qquad xz = \lambda(2x + 2z), \qquad xy = \lambda(2y + 2x), \qquad 2xy + 2yz + 2xz = 64. \]

To solve this system, notice that if we multiply the first equation by $x$, the second equation by $y$, and the third equation by $z$ we have:

\[ xyz = \lambda x(2y + 2z), \qquad xyz = \lambda y(2x + 2z), \qquad xyz = \lambda z(2y + 2x). \]

Each of the right-hand sides equals $xyz$, so the right-hand sides are all equal:

\[ \lambda x(2y + 2z) = \lambda y(2x + 2z) = \lambda z(2y + 2x). \]

Now notice that for our problem $\lambda$ can never be zero. If $\lambda$ were zero, then $\nabla V(x, y, z) = \langle 0, 0, 0 \rangle$ at the maximum point. However, the only way $\nabla V(x, y, z) = \langle yz, xz, xy \rangle$ is zero is if at least two of our coordinates are zero. Since we're talking about the volume of a box, our coordinates are all strictly positive and this can't happen.

Since $\lambda \neq 0$ we can divide out the $\lambda$'s, and also divide out the 2's, to get

\[ xy + xz = xy + yz = yz + xz. \]

For the first equation, $xy + xz = xy + yz$, subtracting $xy$ from each side gives $xz = yz$. Now dividing by $z$ (again, $z \neq 0$ because the coordinates are all positive) gives $x = y$. Similarly, the second equation, $xy + yz = yz + xz$, gives $y = z$. So $x = y = z$.
Plugging $y = x$ and $z = x$ into our constraint equation gives

\[ 2x^2 + 2x^2 + 2x^2 = 64 \implies 6x^2 = 64 \implies x^2 = \frac{64}{6} = \frac{32}{3} \implies x = \pm\sqrt{\frac{32}{3}}. \]

Again, all of our coordinates are positive because of the physical interpretation of our problem, so $x = y = z = \sqrt{32/3} \approx 3.27$ feet are the dimensions that maximize the volume of a box that can be made with 64 square feet of material. In particular, the optimal box is a cube.

Example 4. Write down the system of equations, but do not solve (it's difficult!), for the following optimization problem:

\[ \text{maximize } 4xy - x^4 - y^4 \quad \text{subject to } x^2 + y^3 = 2. \]

Here $f(x, y) = 4xy - x^4 - y^4$, $g(x, y) = x^2 + y^3$, and $k = 2$. Thus our gradient vectors are

\[ \nabla f(x, y) = \langle 4y - 4x^3, 4x - 4y^3 \rangle, \qquad \nabla g(x, y) = \langle 2x, 3y^2 \rangle. \]

So the system of equations we wish to solve is

\[ 4y - 4x^3 = 2\lambda x, \qquad 4x - 4y^3 = 3\lambda y^2, \qquad x^2 + y^3 = 2. \]
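Returning to Example 3, the cube found there can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the lecture: the function names are hypothetical, the value of $\lambda$ is recovered from the first equation with $x = y = z$ (so $x^2 = 4\lambda x$, giving $\lambda = x/4$), and the comparison boxes are random samples with $x, y \in [0.1, 5.6]$, with $z$ chosen so the surface area is exactly 64.

```python
import math
import random


def volume(x, y, z):
    return x * y * z


def surface_area(x, y, z):
    return 2 * x * y + 2 * y * z + 2 * x * z


def lagrange_residuals(x, y, z, lam):
    """Left side minus right side of each equation in the Example 3 system."""
    return (
        y * z - lam * (2 * y + 2 * z),
        x * z - lam * (2 * x + 2 * z),
        x * y - lam * (2 * y + 2 * x),
        surface_area(x, y, z) - 64,
    )


# Candidate from the worked example: x = y = z = sqrt(32/3).
side = math.sqrt(32 / 3)
lam = side / 4  # from side**2 = lam * 4 * side
residuals = lagrange_residuals(side, side, side, lam)
cube_volume = volume(side, side, side)

# Compare against random boxes that use exactly 64 square feet.
random.seed(0)  # fixed seed so the experiment is repeatable
best_random = 0.0
for _ in range(10_000):
    x = random.uniform(0.1, 5.6)
    y = random.uniform(0.1, 5.6)
    z = (32 - x * y) / (x + y)  # solves 2xy + 2yz + 2xz = 64 for z
    best_random = max(best_random, volume(x, y, z))

print("residuals of the Lagrange system:", residuals)
print(f"cube volume: {cube_volume:.4f}, best random box: {best_random:.4f}")
```

Random sampling cannot prove that the cube is optimal; it only fails to find a better box, which is consistent with the conclusion of the example.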