# Lagrange multipliers, using tangency to solve constrained optimization

- [Instructor] In the last video I introduced
a constrained optimization problem
where we were trying to maximize this function,
f of x, y equals x squared times y,
but subject to a constraint that your values of x and y
have to satisfy x squared plus y squared equals one.
And the way we were visualizing this
was to look at the x, y plane where this circle here
represents our constraint.
All of the points that make up this set,
x squared plus y squared equals one,
and then this curvy line here is one of the contours of f,
meaning, we're setting f of x, y equal to some constant.
And then I was varying around that constant c.
So for high values of c, the contour would look
something like this, this is where the value
of x squared times y is big.
And then for small values of c,
the contours would look like this.
So all the points on this line would be f of x, y
equals like 0.01 in this case, something like that.
Then the way to think about maximizing this function
is to try to increase that value of c
as much as you can without it falling off the circle.
And the key observation is
that happens when they're tangent.
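To make the "increase c until it falls off the circle" picture concrete, here is a minimal sketch that walks around the unit circle and records the largest value of f it finds. The function names `f` and `scan_circle` and the sample count are illustrative assumptions, not something from the video; this is a brute-force check, not the method being developed here.

```python
import math

def f(x, y):
    """Objective from the transcript: f(x, y) = x^2 * y."""
    return x * x * y

def scan_circle(n=100_000):
    """Brute-force scan of the unit circle x^2 + y^2 = 1.

    Returns (best_value, x, y) for the sampled point where f is largest.
    The sample count n is an arbitrary choice for illustration.
    """
    best = (-math.inf, 0.0, 0.0)
    for k in range(n):
        t = 2 * math.pi * k / n
        x, y = math.cos(t), math.sin(t)  # every sampled point satisfies the constraint
        value = f(x, y)
        if value > best[0]:
            best = (value, x, y)
    return best

best_value, best_x, best_y = scan_circle()
print(best_value, best_x, best_y)
```

The largest value found is the biggest c whose contour still touches the circle; any larger c gives a contour that misses the circle entirely.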
So, you know, you might kind of draw this out
in a little sketch and say there's some curve
representing your constraint, which in this case
would be, you know, where our circle is.
And then the curve representing the contour
would just kiss that curve,
just barely touch it in some way.
Now, that's pretty, but in terms of solving the problem,
we still have some work to do.
And the main tool we're gonna use here is the gradient.
So let me go ahead and draw a lot more contour lines
than there already are for x squared times y.
So this is many of the contour lines,
and what matters here is the relationship
between the gradient and contour lines.
And the upshot of it is that these gradient vectors,
every time they pass through a contour line,
they're perpendicular to it.
And the basic reason for that
is if you walk along the contour line,
the function isn't changing value,
so if you want it to change most rapidly,
you know, it kind of makes sense you should walk
in the perpendicular direction,
so that no component of the walk that you're taking
is, you know, useless, is along the line
where the function doesn't change.
But again, there's a whole video on that
that's worth checking out if this feels unfamiliar.
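If you want to check the perpendicularity claim numerically, here is a small sketch. It picks one contour of f, estimates the tangent direction of that contour with a finite difference, and compares it with the gradient. The contour value `c`, the point `x0`, and the helper names are arbitrary choices for illustration.

```python
import math

def grad_f(x, y):
    """Gradient of f(x, y) = x^2 * y, namely (2xy, x^2)."""
    return (2 * x * y, x * x)

def contour_point(x, c):
    """Point on the contour x^2 * y = c at a given nonzero x."""
    return (x, c / (x * x))

c = 0.3    # which contour to look at (arbitrary)
x0 = 0.9   # where on that contour to stand (arbitrary, nonzero)
h = 1e-6   # finite-difference step

# Approximate the contour's tangent direction by stepping a little along it.
p_minus = contour_point(x0 - h, c)
p_plus = contour_point(x0 + h, c)
tangent = (p_plus[0] - p_minus[0], p_plus[1] - p_minus[1])

gx, gy = grad_f(*contour_point(x0, c))

# Cosine of the angle between gradient and tangent; near 0 means perpendicular.
dot = gx * tangent[0] + gy * tangent[1]
cos_angle = dot / (math.hypot(gx, gy) * math.hypot(*tangent))
print(cos_angle)
```

A cosine close to zero means the gradient has essentially no component along the contour, which is the property used in the rest of the argument.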
For our purposes, what it means is that
when we're considering this point of tangency,
the gradient of f at that point is gonna be some vector
perpendicular to both the curves at that point.
So that little vector represents the gradient of f
at this point on the plane.
And we can do something very similar
to understand the other curve.
Right now I've just written it as a constraint,
x squared plus y squared equals one.
But you know, to give that function a name,
let's say we've defined g of x, y to be
x squared plus y squared.
In that case, this constraint is pretty much
just one of the contour lines for the function g,
and we can take a look at that.
If we go over here and we look at
all of the other contour lines for this function g,
and it should make sense that they're circles,
because this function is x squared plus y squared.
And if we took a look at the gradient of g,
it has that same property,
when it passes through a contour line,
it's perpendicular to it.
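The same property is easy to sanity-check for g, since its contours are circles centered at the origin. The sketch below samples a few points on one such circle and takes the dot product of the gradient of g with the circle's tangent direction. The radius and the number of sample points are arbitrary assumptions.

```python
import math

def grad_g(x, y):
    """Gradient of g(x, y) = x^2 + y^2, namely (2x, 2y)."""
    return (2 * x, 2 * y)

def circle_tangent(t):
    """Unit tangent to a circle centered at the origin, at angle t."""
    return (-math.sin(t), math.cos(t))

radius = 1.5  # any contour of g works; this one is g = 2.25
dots = []
for k in range(12):
    t = 2 * math.pi * k / 12
    x, y = radius * math.cos(t), radius * math.sin(t)
    gx, gy = grad_g(x, y)
    tx, ty = circle_tangent(t)
    dots.append(gx * tx + gy * ty)

# Largest deviation from perpendicularity over the sampled points.
max_abs_dot = max(abs(d) for d in dots)
print(max_abs_dot)
```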
So over on our drawing here,
the gradient vector of g would also be perpendicular
to both these curves.
And you know, maybe in this case,
it's not as long as the gradient of f, or maybe it's longer.
There's no reason that it would be the same length,
but the important fact is that it's proportional.
And the way that we're gonna write this in formulas
is to say that the gradient of f evaluated,
let's see, evaluated at whatever the maximizing values
of x and y are, so we should give that a name probably.
Maybe x sub m, y sub m, the specific values
of x and y that are gonna be at this point
that maximizes the function subject to our constraint.
So that's gonna be related to the gradient of g,
it's not gonna be quite equal,
so I'll leave some room here.
Related to the gradient of g,
evaluated at that same point, xm, ym.
And like I said, they're not equal, they're proportional.
So we need to have some kind
of proportionality constant in there.
You almost always use the variable lambda,
and this guy has a fancy name,
it's called a Lagrange multiplier.
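Written out as a formula, with the maximizing point named as above, the proportionality reads:

```latex
\nabla f(x_m, y_m) = \lambda \, \nabla g(x_m, y_m)
```

Here lambda is a single number, so the equation says the two gradient vectors point along the same line but may differ in length.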
Lagrange, Lagrange was one
of those famous French mathematicians.
I always get him confused with some of the other
French mathematicians at the time like Legendre or Laplace,
there's a whole bunch of things.
Let's see, multiplier, distracting myself talking here.
So Lagrange multiplier.
So there's a number of things in multivariable calculus
named after Lagrange, and this is one of the big ones.
This is a technique that he kind of developed
or at the very least popularized.
And the core idea is to just set
one gradient equal to a multiple of the other,
'cause that represents when the contour line
for one function is tangent to the contour line of another.
So this, this is something that we can actually work with.
And let's start working it out, right,
let's see what this translates to in formulas.
So I already have g written here,
so let's go ahead and just evaluate
what the gradient of g should be.
And that's the gradient of x squared plus y squared.
And the way that we take our gradient is
it's gonna be a vector
whose components are all partial derivatives.
So the first component is the partial derivative
with respect to x.
So we treat x as a variable, y looks like a constant.
The derivative is two x.
The second component the partial derivative
with respect to y, so now we're treating y as the variable,
x is the constant, so the derivative looks like two y.
So that's the gradient of g.
And the gradient of f, it's gonna look like the gradient of,
let's see, what is f?
It's x squared times y.
So x squared times y.
We do the same thing.
First component partial derivative with respect to x,
x looks like a variable, so its derivative is two times x
and then that y looks like a constant when we're up here.
But then partial derivative with respect to y,
that y looks like a variable, that x squared just looks
like a constant sitting in front of it.
So that's what we get.
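The two gradients worked out by hand above are (2x, 2y) for g and (2xy, x squared) for f. As a quick check on the partial derivatives, this sketch compares those formulas against a central-difference estimate at one test point. The helper `numerical_gradient`, the step size, and the test point are assumptions chosen for illustration.

```python
def f(x, y):
    return x * x * y

def g(x, y):
    return x * x + y * y

def grad_f(x, y):
    """Hand-computed gradient of f: (2xy, x^2)."""
    return (2 * x * y, x * x)

def grad_g(x, y):
    """Hand-computed gradient of g: (2x, 2y)."""
    return (2 * x, 2 * y)

def numerical_gradient(func, x, y, h=1e-6):
    """Central-difference estimate of (d/dx, d/dy) of func at (x, y)."""
    d_dx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    d_dy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return (d_dx, d_dy)

x0, y0 = 0.7, -1.3  # arbitrary test point
print(grad_f(x0, y0), numerical_gradient(f, x0, y0))
print(grad_g(x0, y0), numerical_gradient(g, x0, y0))
```

In each printed pair, the hand-computed and numerical vectors should agree to several decimal places.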
And now if we kind of work out
this Lagrange multiplier expression
using these two vectors, what we have written,
what we're gonna have written is that this vector,
two xy, x squared,
is proportional, with the proportionality constant lambda,
to the gradient vector for g, so two x, two y.
And you can think of this
as two separate equations.
I mean right now it's one equation with vectors,
but really what this is saying is
you've got two separate equations,
two times xy is equal to lambda,
ah, gotta change colors a lot here.
Lambda times two x.
Hm, gonna be a stickler for color.
Keep red all of the things associated with g.
And then,
this second equation is that x squared
is equal to lambda times two y.
And this might seem like a problem,
because we have three unknowns, x, y,
and this new lambda that we introduced.
Kind of shot ourselves in the foot
by giving ourselves a new variable to deal with.
But we only have two equations.
So in order to solve this, we're gonna need three equations.
And the third equation is something
that we've known the whole time.
It's been part of the original problem.
It's the constraint itself,
x squared plus y squared equals one.
So that, that third equation,
x squared plus y squared is equal to one.
So these are the three equations
that characterize our constrained optimization problem.
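Collected in one place, the two component equations from the gradients plus the constraint are:

```latex
\begin{aligned}
2xy &= \lambda \cdot 2x \\
x^2 &= \lambda \cdot 2y \\
x^2 + y^2 &= 1
\end{aligned}
```

That is three equations in the three unknowns x, y, and lambda.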
The bottom one just tells you
that we have to be on this unit circle here.
Allow me to just highlight it.
We have to be on this unit circle.
And then these top two tell us what's necessary
in order for our contour lines,
the contour of f and the contour of g
to be perfectly tangent with each other.
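As a numerical preview of the tangency condition (not the algebraic solution, which is left for the next video), the sketch below finds the maximizing point on the circle by brute force and then checks how close the two gradients are to parallel there. The helper names `tangency_gap` and `estimate_lambda`, the sample count, and the comparison angle 0.3 are all illustrative assumptions.

```python
import math

def grad_f(x, y):
    return (2 * x * y, x * x)

def grad_g(x, y):
    return (2 * x, 2 * y)

def tangency_gap(x, y):
    """2D cross product of grad f and grad g; zero exactly when they are parallel."""
    fx, fy = grad_f(x, y)
    gx, gy = grad_g(x, y)
    return fx * gy - fy * gx

def estimate_lambda(x, y):
    """Least-squares multiple of grad g that best matches grad f."""
    fx, fy = grad_f(x, y)
    gx, gy = grad_g(x, y)
    return (fx * gx + fy * gy) / (gx * gx + gy * gy)

# Brute-force search for the point on the unit circle that maximizes x^2 * y.
n = 200_000
angles = [2 * math.pi * k / n for k in range(n)]
best_t = max(angles, key=lambda t: math.cos(t) ** 2 * math.sin(t))
x_m, y_m = math.cos(best_t), math.sin(best_t)

gap_at_max = tangency_gap(x_m, y_m)                          # should be near zero
gap_elsewhere = tangency_gap(math.cos(0.3), math.sin(0.3))   # a non-optimal point
lam = estimate_lambda(x_m, y_m)
print(gap_at_max, gap_elsewhere, lam)
```

The cross product is used instead of dividing components so the check never divides by zero. At the maximizing point the gap should be close to zero, while at the other point it is clearly not.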
So, in the next video, I'll go ahead and solve this.
At this point, it's pretty much just algebra to deal with,
but it's worth going through.
And then in the next couple ones,
I'll talk about a way that you can encapsulate
all three of these equations into one expression,
and also a little bit about the interpretation
of this lambda that we introduced.
'Cause it's not actually just a dummy variable,
it has a pretty interesting meaning in physical contexts
once you're actually dealing
with a constrained optimization problem in practice.
So I'll see you in the next video.