Visualizing the chain rule and product rule | Essence of calculus, chapter 4

In the last videos I talked about the derivatives of simple functions, things like powers of
x, sin(x), and exponentials, the goal being to have a clear picture or intuition to hold
in your mind that explains where these formulas come from.
Most functions you use to model the world involve mixing, combining and tweaking these
these simple functions in some way; so our goal now is to understand how to take derivatives
of more complicated combinations; where again, I want you to have a clear picture in mind
for each rule.
This really boils down into three basic ways to combine functions together: Adding them,
multiplying them, and putting one inside the other; also known as composing them.
Sure, you could say subtracting them, but that’s really just multiplying the second
by -1, then adding.
Likewise, dividing functions is really just the same as plugging one into the function
1/x, then multiplying.
Most functions you come across just involve layering on these three types of combinations,
with no bound on how monstrous things can become.
But as long as you know how derivatives play with those three types of combinations, you
can always just take it step by step and peal through the layers.
So, the question is, if you know the derivatives of two functions, what is the derivative of
their sum, of their product, and of the function compositions between them?
The sum rule is the easiest, if somewhat tounge-twisting to say out loud: The derivative of a sum of
two functions is the sum of their derivatives.
But it’s worth warming up with this example by really thinking through what it means to
take a derivative of a sum of two functions, since the derivative patterns for products
and function composition won’t be so straight forward, and will require this kind of deeper
thinking.
The function f(x) = sin(x) + x2 is a function where, for every input, you add together the
values of sin(x) and x2 at that point.
For example, at x = 0.5, the height of the sine graph is given by this bar, the height
of the x2 parabola is given by this bar, and their sum is the length you get by stacking
them together.
For the derivative, you ask what happens as you nudge the input slightly, maybe increasing
it to 0.5+dx.
The difference in the value of f between these two values is what we call df.
Well, pictured like this, I think you’ll agree that the total change in height is whatever
the change to the sine graph is, what we might call d(sin(x)), plus whatever the change to
x2 is, d(x2).
We know the derivative of sine is cosine, and what that means is that this little change
d(sin(x)) would be about cos(x)dx.
It’s proportional to the size of dx, with a proportionality constant equal to cosine
of whatever input we started at.
Similarly, because the derivative of x2 is 2x, the change in the height of the x2 graph
So, df/dx, the ratio of the tiny change to the sum function to the tiny change in x that
caused it, is indeed cos(x)+2x, the sum of the derivatives of its parts.
But like I said, things are a bit different for products.
Let’s think through why, in terms of tiny nudges.
In this case, I don’t think graphs are our best bet for visualizing things.
Pretty commonly in math, all levels of math really, if you’re dealing with a product
of two things, it helps to try to understand it as some form of area.
In this case, you might try to configure some mental setup of a box whose side-lengths are
sin(x) and x2.
What would that mean?
Well, since these are functions, you might think of these sides as adjustable; dependent
on the value of x, which you might think of as a number that you can freely adjust.
So, just getting the feel for this, focus on that top side, whose changes as the function
sin(x).
As you change the value of x up from 0, it increases in up to a length of 1 as sin(x)
moves towards its peak.
After that, it starts decreasing as sin(x) comes down from 1.
And likewise, that height changes as x2.
So f(x), defined as this product, is the area of this box.
For the derivative, think about how a tiny change to x by dx influences this area; that
resulting change in area is df.
That nudge to x causes the width to change by some small d(sin(x)), and the height to
change by some d(x2).
This gives us three little snippets of new area: A thin rectangle on the bottom, whose
area is its width, sin(x), times its thin height, d(x2); there’s a thin rectangle
on the right, whose area is its height, x2, times its thin width, d(sin(x)).
And there’s also bit in the corner.
But we can ignore it, since its area will ultimately be proportional to dx2, which becomes
negligible as dx goes to 0.
This is very similar to what I showed last video, with the x2 diagram.
Just like then, keep in mind that I’m using somewhat beefy changes to draw things, so
we can see them, but in principle think of dx as very very small, meaning d(x2) and d(sin(x))
are also very very small.
Applying what we know about the derivative of sine and x2, that tiny change d(x2) is
2x*dx, and that tiny change d(sin(x)) is cos(x)dx.
Dividing out by that dx, the derivative df/dx is sin(x) by the derivative of x2, plus x2
by the derivative of sine.
This line of reasoning works for any two functions.
A common mnemonic for the product rule is to say in your head “left d right, right
d left”.
In this example, sin(x)*x2, “left d right” means you take the left function, in this
case sin(x), times the derivative of the right, x2, which gives 2x.
Then you add “right d left”: the right function, x2, times the derivative of the
left, cos(x).
Out of context, this feels like kind of a strange rule, but when you think of this adjustable
box you can actually see how those terms represent slivers of area.
“Left d right” is the area of this bottom rectangle, and “right d left” is the area
of this rectangle on the right.
By the way, I should mention that if you multiply by a constant, say 2*sin(x), things end up
much simpler.
The derivative is just that same constant times the derivative of the function, in this
case 2*cos(x).
I’ll leave it to you to pause and ponder to verify that this makes sense.
Aside from addition and multiplication, the other common way to combine functions that
comes up all the time is function composition.
For example, let’s say we take the function x2, and shove it on inside sin(x) to get a
new function, sin(x2).
What’s the derivative of this new function?
Here I’ll choose yet another way to visualize things, just to emphasize that in creative
math, we have lots of options.
I’ll put up three number lines.
The top one will hold the value of x, the second one will represent the value of x2,
and that third line will hold the value of sin(x2).
That is, the function x2 gets you from line 1 to line 2, and the function sine gets you
from line 2 to line 3.
As I shift that value of x, maybe up to the value 3, then value on the second shifts to
whatever x2 is, in this case 9.
And that bottom value, being the sin(x2), will go over to whatever the sin(9) is.
So for the derivative, let’s again think of nudging that x-value by some little dx,
and I always think it’s helpful to think of x starting as some actual number, maybe
1.5.
The resulting nudge to this second value, the change to x2 caused by such a dx, is what
we might call d(x2).
You can expand this as 2x*dx, which for our specific input that length would be 2*(1.5)*dx,
but it helps to keep it written as d(x2) for now.
In fact let me go one step further and give a new name to x2, maybe h, so this nudge d(x2)
is just dh.
Now think of that third value, which is pegged at sin(h).
It’s change d(sin(h)); the tiny change caused by the nudge dh.
By the way, the fact that it’s moving left while the dh bump is to the right just means
that this change d(sin(h)) is some negative number.
Because we know the derivative of sine, we can expand d(sin(h)) as cos(h)*dh; that’s
what it means for the derivative of sine to be cosine.
Unfolding things, replacing h with x2 again, that bottom nudge is cos(x2)d(x2).
And we could unfold further, noting that d(x2) is 2x*dx.
And it’s always good to remind yourself of what this all actually means.
In this case where we started at x = 1.5 up top, this means that the size of that nudge
on the third line is about cos(1.52)*2(1.5)*(the size of dx); proportional to the size of dx,
where the derivative here gives us that proportionality constant.
Notice what we have here, we have the derivative of the outside function, still taking in the
unaltered inside function, and we multiply it by the derivative of the inside function.
Again, there’s nothing special about sin(x) and x2.
If you have two functions g(x) and h(x), the derivative of their composition function g(h(x))
is the derivative of g, evaluated at h(x), times the derivative of h.
This is what we call the “chain rule”.
Notice, for the derivative of g, I’m writing it as dg/dh instead of dg/dx.
On the symbolic level, this serves as a reminder that you still plug in the inner function
to this derivative.
But it’s also an important reflection of what this derivative of the outer function
actually represents.
Remember, in our three-lines setup, when we took the derivative of sine on the bottom,
we expanded the size of the nudge d(sin) as cos(h)*dh.
This was because we didn’t immediately know how the size of that bottom nudge depended
on x, that’s kind of the whole thing we’re trying to figure out, but we could take the
derivative with respect to the intermediate variable h.
That is, figure out how to express the size of that nudge as multiple of dh.
Then it unfolded by figuring out what dh was.
So in this chain rule expression, we’re saying look at the ratio between a tiny change
in g, the final output, and a tiny change in h that caused it, h being the value that
we’re plugging into g.
Then multiply that by the tiny change in h divided by the tiny change in x that caused
it.
The dh’s cancel to give the ratio between a tiny change in the final output, and the
tiny change to the input that, through a certain chain of events, brought it about.
That cancellation of dh is more than just a notational trick, it’s a genuine reflection
of the tiny nudges that underpin calculus.
So those are the three basic tools in your belt to handle derivatives of functions that
combine many smaller things: The sum rule, the product rule and the chain rule.
I should say, there’s a big difference between knowing what the chain rule and product rules
are, and being fluent with applying them in even the most hairy of situations.
I said this at the start of the series, but it’s worth repeating: Watching videos, any
videos, about these mechanics of calculus will never substitute for practicing them
yourself, and building the muscles to do these computations yourself.
I wish I could offer to do that for you, but I’m afraid the ball is in your court, my
friend, to seek out practice.
What I can offer, and what I hope I have offered, is to show you where these rules come from,
to show that they’re not just something to be memorized and hammered away; but instead
are natural patterns that you too could have discovered by just patiently thinking through
what a derivative means.
Thank you to everyone who supported this series, and once more I’d like to say a special
thanks to Brilliant.org.
For those of you who want to go flex those problem solving muscles, Brilliant offers
a platform aimed at training you to think like a mathematician.
I don’t know about you, but I’ve always found it all too easy to fall into the habit
of just reading math or watching lectures without taking the time to do some real problem-solving
in between, even though that’s always the part where I learn the most.
Brilliant is a great place to get that practice, and if you visit brilliant.org/3b1b, or more
simply follow the link on the screen and in the description, it lets them know you came
from this channel.
Their calculus material is a nice complement to this series, but some of my other favorites
are their probability and complex algebra sequences.