- [Voiceover] So in the last video, I introduced this
multi-variable chain rule and here I want to
explain a loose intuition for why it's true,
why you would expect something like this to happen.
So the way you think about an expression like this,
you have this multi-variable function
f of x, y and you're plugging things in,
but just that function itself, you'll be
thinking of taking a two dimensional space
you know here's our xy plane,
and then mapping it to, you know, just a real number line
and I'll think of this as f, as the output.
So somehow our whole function takes things from
this two dimensional space and plugs it onto this output.
For t, you're thinking of just another
number line up here, so t, and then
you've got separate functions here,
you know x of t and y of t.
X of t and y of t.
Each of which take that same value for
a specific input, you know it's not that
they're acting on different inputs,
x of some other input t and y of some other input,
it's the same one and then they move that
somewhere to this output space
which itself gets moved over.
And in this way you're thinking of it
as just a single variable function
that goes from t and ultimately outputs f
it's just that there's multi-dimensional stuff
happening in between and now if we start
thinking about the derivative of it -
what does that mean, what does that mean for the
conception of the picture that we have going on here?
Well, that bottom part, that dt
you're thinking of as a tiny change to t, right?
So you're thinking of it as kind of a nudge,
I'll draw it as a sizable line here
for like moving from some original input over,
but you might in principle think of it as a
very, very tiny nudge in t.
And over here you'd say well, that's gonna move
your intermediary output in the xy plane
to, you know, maybe it'll move it by some amount,
again imagine this is a very small nudge,
I'm going to give it some size here
just so I can write into it and
then whatever that nudge in the output space
right, it's a nudge in some direction
that's going to correspond to some change in f.
Some change based on the differential properties
of the multi-variable function itself.
And if we think about this, this change
you might break it into components
and say this shift here has some kind of dx,
some kind of shift in the x direction
and some kind of dy, some shift in the y direction.
But you can actually reason about what these should be
because it's not just an arbitrary change in x
or an arbitrary change in y,
it's the one that was caused by dt.
So if I go over here, I might say that dx
is caused by that dt and the whole meaning
of the derivative, the whole meaning
of the single variable derivative
would be that when we take dx dt,
this is the factor that tells us, you know,
a tiny nudge in t, how much does that change
the x component and if you want you could
think of this as kind of cancelling out the dts
and you're just left with dx, but really you're saying
there's a tiny nudge in t and that results in a
change in x and this derivative is what
tells you the ratio between those sizes.
And similarly, that change in y here,
that change in y is gonna be somehow
proportional to the change in t
and that proportion is given by the
derivative of y with respect to t
that's the whole point of the derivative,
no no, with respect to t and again
you can kind of think of it as if
you're cancelling out the dts and
this is why the fractional writing,
this Leibniz notation is actually pretty helpful.
You know, people will say, oh mathematicians would
like, shake their heads at the idea of
treating these like fractions, but not only is it
a useful thing to do because it is a
helpful mnemonic, it's reflective of what you're
gonna do when you make a very formal argument.
And I think I'll do that in one of the following videos,
I'll describe this in a very, a much more formal way
that's a little bit more airtight than the
kind of hand-waving nudging around.
But the intuition you get from just writing
this as a fraction is basically the scaffolding
for that formal argument, so it's a
fine thing to do, I don't think mathematicians
are shaking their heads every time that a
student or a teacher does this.
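Written as equations, the two nudges described so far look like the following. This is only a sketch of the intuition, treating dt as a very small but nonzero number, so read each equals sign as "approximately, for tiny nudges."

```latex
dx = \frac{dx}{dt}\,dt, \qquad dy = \frac{dy}{dt}\,dt
```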
But anyway, so this kind of gives you
what that dx is, what that dy is
and then over here if you're saying
how much does that change the ultimate output of f?
You could say, well, your nudge of size dx over here,
you're wondering how much that changes the output of f,
that's the meaning of the partial derivative, right.
If we say we have the partial derivative
with respect to x, what that means,
is that if you take a tiny nudge of size dx
this is giving you the ratio between that
and the ultimate change to the output that you want.
You could think of it like this partial x
is cancelling out with that dx if you wanted
or you could just say, this is a tiny nudge in x,
this is going to result in some change in f -
I'm not sure what - but the meaning of
the derivative is the ratio between those two
and that's what lets you figure it out.
And similarly, you might call this the change in f
caused by x, like, due to x.
Due to, I should say to dx.
But that's not the only thing changing the value of f right?
That's not the only change happening
in the input space, you also have another change in f
and this one I might say is due to dy.
Due to that tiny shift in y and what that's gonna be
we know it's going to be proportional to that
tiny shift in y and the proportionality constant -
this is the meaning of the partial derivative,
that when you nudge y in some way it
results in some kind of nudge in f and the ratio
between those two is what the derivative gives.
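In symbols, those two contributions to the change in f would be written roughly like this. The subscript labels "due to dx" and "due to dy" are just names introduced here to match the wording above, not standard notation.

```latex
(df)_{\text{due to } dx} = \frac{\partial f}{\partial x}\,dx,
\qquad
(df)_{\text{due to } dy} = \frac{\partial f}{\partial y}\,dy
```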
So ultimately, if you put this all together
what you'd say is there's two different things
causing an ultimate change to f.
So if you put these together, and you
want to know what the total change in f is -
so I might go over here and say
the total change in f, one of them is caused
by partial f, partial x - and I can multiply it
by dx here, but really, we know that dx,
the change there was in turn caused by dt
so that in turn is caused by the change
in the x component that was due to dt.
That was of course of size dt.
And then for similar reasons, the other way
that this changes in the y direction
is a partial of f with respect to y
but what caused that initial shift in y,
you'd say that was a shift in y that was due to t,
and that size is dy dt times dt, you could think of it.
So slight nudge in t causes a change in y,
that change in y causes the change in f
and when you add those two together that's
everything that's going on, that's everything
that influences the ultimate change in f.
So then if you take this whole expression
and you divide everything out by dt
so you know, kind of erase it from this side
and put it over here, dt,
this is your multi-variable chain rule,
and of course I've just written the same thing again
but hopefully this gives a little bit of intuition
for how you're composing different nudges
and why you wanna think about it that way.
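Putting the pieces into one chain of equations, the argument above goes roughly as follows. It's the hand-waving version, with df, dx, dy, and dt all treated as tiny nudges, not a formal proof.

```latex
\begin{aligned}
df &= \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy \\
   &= \frac{\partial f}{\partial x}\,\frac{dx}{dt}\,dt
    + \frac{\partial f}{\partial y}\,\frac{dy}{dt}\,dt \\
\frac{d}{dt}\,f\bigl(x(t),\,y(t)\bigr)
   &= \frac{\partial f}{\partial x}\,\frac{dx}{dt}
    + \frac{\partial f}{\partial y}\,\frac{dy}{dt}
\end{aligned}
```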
Of course, you can see this, and you see
the partial x kind of cancels out with that dx
and this partial y kind of cancels out with that dy
and you're left with the two different things
that constitute a change in f,
you know this one is only partially the change in f,
this is also partially the change in f,
but together they give the ultimate change in f
and I think that gives a very strong reason,
if you break it down like that, why this should be true.
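If you want to see the nudging argument with actual numbers, here's a minimal Python sketch. The specific functions f(x, y) = x^2 * y, x(t) = cos(t), and y(t) = sin(t) are just chosen for illustration, as are the helper names, and the nudge is a small finite number rather than a true infinitesimal. It compares the chain rule formula against literally nudging t and measuring the change in f.

```python
import math

# Illustrative functions only (an assumption for this sketch).
def f(x, y):
    return x**2 * y

def x_of_t(t):
    return math.cos(t)

def y_of_t(t):
    return math.sin(t)

def chain_rule_derivative(t):
    """df/dt from the formula: (df/dx)(dx/dt) + (df/dy)(dy/dt)."""
    x, y = x_of_t(t), y_of_t(t)
    df_dx = 2 * x * y      # partial of x^2 * y with respect to x
    df_dy = x**2           # partial of x^2 * y with respect to y
    dx_dt = -math.sin(t)   # derivative of cos(t)
    dy_dt = math.cos(t)    # derivative of sin(t)
    return df_dx * dx_dt + df_dy * dy_dt

def nudge_derivative(t, dt=1e-6):
    """df/dt estimated by nudging t a little in both directions
    and dividing the resulting change in f by the total nudge."""
    f_plus = f(x_of_t(t + dt), y_of_t(t + dt))
    f_minus = f(x_of_t(t - dt), y_of_t(t - dt))
    return (f_plus - f_minus) / (2 * dt)

t0 = 0.7  # an arbitrary input value
print("chain rule:", chain_rule_derivative(t0))
print("nudging t: ", nudge_derivative(t0))
```

The two printed numbers should agree to several decimal places, which is the claim of the chain rule: adding up the change in f due to dx and the change due to dy accounts for the whole change in f.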