# The paradox of the derivative | Essence of calculus, chapter 2

The goal here is simple: Explain what a derivative is.
Thing is, though, there’s some subtlety to this topic, and some potential for paradoxes
if you’re not careful, so the secondary goal is that you have some appreciation for
what those paradoxes are and how to avoid them.
You see, it’s common for people to say that the derivative measures “instantaneous rate
of change”, but if you think about it, that phrase is actually an oxymoron: Change is
something that happens between separate points in time, and when you blind yourself to all
but a single instant, there is no more room for change.
You’ll see what I mean as we get into it, and when you appreciate that a phrase like
“instantaneous rate of change” is nonsensical, it makes you appreciate how clever the fathers
of calculus were in capturing the idea this phrase is meant to evoke with a perfectly
sensible piece of math: The derivative.
As our central example, imagine a car that starts at some point A, speeds up, then slows
to a stop at some point B 100 meters away, all over the course of 10 seconds.
This is the setup I want you to keep in mind while I lay out what exactly a derivative
is.
We could graph this motion, letting a vertical axis represent the distance traveled, and
a horizontal axis represent time.
At each time t, represented with a point on the horizontal axis, the height of the graph
tells us how far the car has traveled after that amount of time.
It’s common to name a distance function like this s(t).
I’d use the letter d for distance, except that it already has another full time job
in calculus.
Initially this curve is quite shallow, since the car is slow at the start.
During the first second, the distance traveled by the car hardly changes at all.
For the next few seconds, as the car speeds up, the distance traveled in a given second
gets larger, corresponding to a steeper slope in the graph.
And as it slows towards the end, the curve shallows out again.
If we were to plot the car’s velocity in meters per second as a function of time, it
might look like this bump.
At time t=0, the velocity is 0.
Up to the middle of the journey, the car builds up to some maximum velocity, covering a relatively
large distance in each second.
Then it slows back down to a speed of 0 meters per second.
These two curves are highly related to each other; if you change the specific distance
vs. time function, you’ll have some different velocity vs. time function.
We want to understand the specifics of this relationship.
Exactly how does velocity depend on this distance vs. time function?
It’s worth taking a moment to think critically about what velocity actually means here.
Intuitively, we all know what velocity at a given moment means: it’s whatever the
car’s speedometer shows in that moment.
And intuitively, it might make sense that velocity should be higher at times when the
distance function is steeper; when the car traverses more distance per unit time.
But the funny thing is, velocity at a single moment makes no sense.
If I show you a picture of a car, a snapshot in an instant, and ask you how fast it’s
going, you’d have no way of telling me.
What you need are two points in time to compare, perhaps comparing the distance traveled after
4 seconds to the distance traveled after 5 seconds.
That way, you can take the change in distance over the change in time.
Right?
That’s what velocity is, the distance traveled over a given amount of time.
So how is it that we’re looking at a function for velocity that only takes in a single value
for t, a single snapshot in time?
It’s weird, isn’t it?
We want to associate each individual point in time with a velocity, but computing velocity
requires comparing two points in time.
If that feels strange and paradoxical, good!
You’re grappling with the same conflict that the fathers of calculus did, and if you
want a deep understanding of rates of change, not just for a moving car, but for all sorts
of scenarios in science, you’ll need a resolution to this apparent paradox.
First let’s talk about the real world, then we’ll go into a purely mathematical one.
Think about what an actual car’s speedometer might be doing.
At some point, say 3 seconds into the journey, the speedometer might measure how far the
car goes in a very small amount of time, perhaps the distance traveled between 3 seconds and
3.01 seconds.
Then it would compute the speed in meters per second as that tiny distance, in meters,
divided by that tiny time, 0.01 seconds.
That is, a physical car can sidestep the paradox by not actually computing speed at a single
point in time, and instead computing speed during very small amounts of time.
Let’s call that difference in time “dt”, which you might think of as 0.01 seconds,
and call the resulting difference in distance traveled “ds”.
So the velocity at that point in time is ds over dt, the tiny change in distance over
the tiny change in time.
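As a rough sketch of that speedometer idea in Python: the transcript never gives a formula for the car’s distance function, so the `s(t)` below is a made-up stand-in (an assumption) that starts at 0 meters, ends at 100 meters after 10 seconds, and is at rest at both ends. The name `speedometer` is also just illustrative.

```python
def s(t):
    """Hypothetical distance function in meters (an assumption, not from the source).

    It gives 0 m at t=0 and 100 m at t=10, with the car at rest at both ends.
    """
    x = t / 10.0
    return 100.0 * (3 * x**2 - 2 * x**3)


def speedometer(t, dt=0.01):
    """Estimate speed at time t from the distance covered between t and t + dt."""
    ds = s(t + dt) - s(t)  # tiny change in distance, in meters
    return ds / dt         # meters per second


reading = speedometer(3.0)
print(f"speed near t=3: {reading:.3f} m/s")
```

Notice that nothing here is computed at a single instant; the reading always compares two nearby times.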
Graphically, imagine zooming in on the point of the distance vs. time graph above t=3.
That dt is a small step to the right, since time is on the horizontal axis, and that ds
is the resulting change in the height of the graph, since the vertical axis represents
distance traveled.
So ds/dt is the rise-over-run slope between two very close points on the graph.
Of course, there’s nothing special about the value t=3, we could apply this to any
other point in time, so we consider this expression ds/dt to be a function of t, something where
I can give you some time t, and you can give back to me the value of this ratio at that
time; the velocity as a function of time.
So for example, when I had the computer draw this bump curve here representing the velocity
function, the one you can think of as the slope of this distance vs. time function at
each point, here’s what I had the computer do: First, I chose some small value for dt, like
0.01.
Then, I had the computer look at many times t between 0 and 10, and compute the distance
function s at (t + dt), minus the value of this function at t.
That is, the difference in the distance traveled between the given time t, and the time 0.01
seconds after that.
Then divide that difference by the change in time dt, and this gives the velocity in
meters per second around each point in time.
With this formula, you can give the computer any curve representing the distance function
s(t), and it can figure out the curve representing the velocity v(t).
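Here is a minimal sketch of that procedure in Python. It is not the actual program used for the animation; the distance function `s` is a hypothetical stand-in chosen to match the 100 meters in 10 seconds setup, and the names `velocity_curve` and `steps` are made up for illustration.

```python
def s(t):
    """Hypothetical distance function in meters (an assumption, not from the source)."""
    x = t / 10.0
    return 100.0 * (3 * x**2 - 2 * x**3)


def velocity_curve(distance, t_start=0.0, t_end=10.0, steps=100, dt=0.01):
    """Return (t, v) pairs, where v = (distance(t + dt) - distance(t)) / dt."""
    times = [t_start + i * (t_end - t_start) / steps for i in range(steps + 1)]
    return [(t, (distance(t + dt) - distance(t)) / dt) for t in times]


curve = velocity_curve(s)

# Print every tenth sample to see the bump: slow, then fast, then slow again.
for t, v in curve[::10]:
    print(f"t = {t:4.1f} s   v = {v:6.2f} m/s")
```

Any other distance function can be passed in place of `s`, and the same difference-over-dt recipe produces the matching velocity samples.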
So now would be a good time to pause, reflect, make sure this idea of relating distance to
velocity by looking at tiny changes in time dt makes sense, because now we’re going
to tackle the paradox of the derivative head-on.
This idea of ds/dt, a tiny change in the value of the function s divided by a tiny change
in the input t, is almost what the derivative is.
Even though our car’s speedometer will look at an actual change in time like 0.01 seconds
to compute speed, and even though my program here for finding a velocity function given
a position function also uses a concrete value of dt, in pure math, the derivative is not
this ratio ds/dt for any specific choice of dt.
It is whatever value that ratio approaches as the choice for dt approaches 0.
Visually, asking what this ratio approaches has a really nice meaning: For any specific
choice of dt, this ratio ds/dt is the slope of a line passing through two points on the
graph, right?
Well, as dt approaches 0, and those two points approach each other, the slope of that line
approaches the slope of a line tangent to the graph at whatever point t we’re looking
at.
So the true, honest to goodness derivative, is not the rise-over-run slope between two
nearby points on the graph; it’s equal to the slope of a line tangent to the graph at
a single point.
Notice what I’m not saying: I’m not saying that the derivative is whatever happens when
dt is infinitely small, nor am I saying that you plug in 0 for dt.
This dt is always a finitely small, nonzero value; it’s just approaching 0, is all.
So even though change in an instant makes no sense, this idea of letting dt approach
0 is a really clever backdoor way to talk reasonably about the rate of change at a single
point in time.
Isn’t that neat?
It’s flirting with the paradox of change in an instant without ever needing to touch
it.
And it comes with such a nice visual intuition as the slope of a tangent line at a single
point on this graph.
Since change in an instant still makes no sense, I think it’s healthiest for you to
think of this slope not as some “instantaneous rate of change”, but as the best constant
approximation for rate of change around a point.
It’s worth saying a few words on notation here.
Throughout this video I’ve been using “dt” to refer to a tiny change in t with some actual
size, and “ds” to refer to the resulting tiny change in s, which again has an actual
size.
This is because that’s how I want you to think about them.
But the convention in calculus is that whenever you’re using the letter “d” like this,
you’re announcing that the intention is to eventually see what happens as dt approaches
0.
For example, the honest-to-goodness derivative of the function s(t) is written as ds/dt,
even though the derivative is not a fraction, per se, but whatever that fraction approaches
for smaller and smaller nudges in t.
A specific example should help here.
You might think that asking about what this ratio approaches for smaller and smaller values
of dt would make it much more difficult to compute, but strangely it actually makes things
easier.
Let’s say a given distance vs. time function was exactly t^3.
So after 1 second, the car has traveled 1^3=1 meter, after 2 seconds, it’s traveled 2^3=8
meters, and so on.
What I’m about to do might seem somewhat complicated, but once the dust settles it
really is simpler, and it’s the kind of thing you only ever have to do once in calculus.
Let’s say you want the velocity, ds/dt, at a specific time, like t=2.
And for now, think of dt having an actual size; we’ll let it go to 0 in just a bit.
The tiny change in distance between 2 seconds and 2+dt seconds is s(2+dt)-s(2), and we divide
by dt.
Since s(t) = t^3, that numerator is (2+dt)^3 - 2^3.
Now this, we can work out algebraically.
And again bear with me, there’s a reason I’m showing you the details.
Expanding the top gives 2^3 + 3*2^2*dt + 3*2*(dt)^2 + (dt)^3 - 2^3.
There are several terms here, and I want you to remember that it looks like a mess, but
it simplifies.
Those 2^3 terms cancel out.
Everything remaining has a dt, so we can divide that out.
So the ratio ds/dt has boiled down to 3*2^2 + two different terms that each have a dt
in them.
So as dt approaches 0, representing the idea of looking at smaller and smaller changes
in time, we can ignore those!
By eliminating the need to think of a specific dt, we’ve eliminated much of the complication
in this expression!
So what we’re left with is a nice clean 3*2^2.
This means the slope of a line tangent to the point at t=2 on the graph of t^3 is exactly
3*2^2, or 12.
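If you’d like to watch that ratio settle toward 12 numerically, here is a small Python sketch. The helper name `difference_quotient` is just illustrative. For the cubic distance function, the leftover terms containing dt work out to 3*2*dt + (dt)^2, so the code prints those next to the ratio.

```python
def s(t):
    """Distance function from the example: t cubed."""
    return t**3


def difference_quotient(f, t, dt):
    """The ratio ds/dt for a specific, nonzero dt."""
    return (f(t + dt) - f(t)) / dt


for dt in (0.1, 0.01, 0.001, 0.0001):
    ratio = difference_quotient(s, 2.0, dt)
    leftover = 3 * 2 * dt + dt**2  # the two terms that still contain dt
    print(f"dt = {dt:<7} ratio = {ratio:.6f}   leftover terms = {leftover:.6f}")
```

For every choice of dt the ratio is 12 plus the leftover terms, and those leftovers shrink as dt does, which is exactly why they can be ignored in the limit.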
Of course, there was nothing special about choosing t=2; more generally we’d say the
derivative of t^3, as a function of t, is 3*t^2.
That’s beautiful.
This derivative is a crazy complicated idea: We’ve got tiny changes in distance over
tiny changes in time, but instead of looking at any specific tiny change in time we start
talking about what this thing approaches.
I mean, it’s a lot to think about.
Yet we’ve come out with such a simple expression: 3t^2.
In practice, you would not go through all that algebra each time.
Knowing that the derivative of t^3 is 3t^2 is one of those things all calculus students learn
to do immediately without rederiving each time.
And in the next video, I’ll show ways to think about this and many other derivative
formulas in nice geometric ways.
The point I want to make by showing you the guts here is that when you consider the change
in distance over a change in time for any specific value of dt, you’d have a whole mess of
algebra riding along.
But by considering what this ratio approaches as dt approaches 0, it lets you ignore much
of that mess, and actually simplifies the problem.
Another reason I wanted to show you a concrete derivative like this is that it gives a good
example of the kind of paradox that comes about when you believe in the illusion of
an instantaneous rate of change.
Think about this car traveling according to this t^3 distance function, and consider its
motion at moment t=0.
Now ask yourself whether or not the car is moving at that time.
On the one hand, we can compute its speed at that point using the derivative of this
function, 3t^2, which is 0 at time t=0.
Visually, this means the tangent line to the graph at that point is perfectly flat, so
the car’s quote unquote “instantaneous velocity” is 0, which suggests it’s not
moving.
But on the other hand, if it doesn’t start moving at time 0, when does it start moving?
Really, pause and ponder this for a moment, is that car moving at t=0?
Do you see the paradox?
The issue is that the question makes no sense; it references the idea of change in a moment,
which doesn’t exist.
And that’s just not what the derivative measures.
What it means for the derivative of the distance function to be 0 at this point is that the
best constant approximation for the car’s velocity around that point is 0 meters per
second.
For example, between t=0 and t=0.1 seconds, the car does move... it moves 0.001 meters.
That’s very small, and importantly it’s very small compared to the change in time,
an average speed of only 0.01 meters per second.
What it means for the derivative of this motion to be 0 is that for smaller and smaller nudges
in time, this ratio of change in distance over change in time approaches 0, though in
this case it never actually hits it.
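A quick Python check of those numbers, as a sketch using the same t^3 distance function (the helper name `average_speed_from_zero` is made up for illustration):

```python
def s(t):
    """Distance function from the example: t cubed."""
    return t**3


def average_speed_from_zero(dt):
    """Change in distance over change in time between t=0 and t=dt."""
    return (s(0.0 + dt) - s(0.0)) / dt


for dt in (0.1, 0.01, 0.001):
    moved = s(dt) - s(0.0)
    speed = average_speed_from_zero(dt)
    print(f"dt = {dt:<6} moved = {moved:.9f} m   average speed = {speed:.6f} m/s")
```

Each average speed is positive, so the car really does move over any nonzero stretch of time, yet the values head toward 0 as dt shrinks.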
But that’s not to say the car is static.
Approximating its movement with a constant velocity of 0 is, after all, just an approximation.
So if you ever hear someone refer to the derivative as an “instantaneous rate of change”,
a phrase which is intrinsically oxymoronic, think of it as a conceptual shorthand for
“the best constant approximation for the rate of change”.
In the following videos I’ll talk more about the derivative; what does it look like in
different contexts, how do you actually compute it, what’s it useful for, things like that,
focussing on visual intuition as always.
As I mentioned last video, this channel is largely supported by the community through
Patreon, where you can get early access to future series like this as I work on them.
One other supporter of the series, who I’m incredibly proud to feature here, is the Art
of Problem Solving.
Interestingly enough, I was first introduced to the Art of Problem Solving by my high school
calculus teacher.
It was the kind of relationship where I’d frequently stick around a bit after school
to just chat with him about math.
He was thoughtful and encouraging, and he once gave me a book that really had an influence
on me back then.
It showed a beauty in math that you don’t see in school.
The name of that book?
The Art of Problem Solving.
Fast-forward to today, where the Art of Problem Solving website offers many, many phenomenal
resources for curious students looking to get into math, most notably their full courses.
This ranges from their newest inspiring offering to get very young students engaged with genuine
problem solving, called Beast Academy, up to higher level offerings that cover the kind
of topics that all math curious students should engage with, like combinatorics, but which
very few schools include in their curriculum.
Put simply, they’re one of the best math education companies I know, and I’m proud
to have them support this series.
You can see what they have to offer by following the link on the screen, also copied in the
video description.