Here, we look at the math behind an animation like this, what’s known as a “complex
Fourier series”. Each little vector is rotating at some constant integer frequency, and when
you add them all together, tip to tail, they draw out some shape over time. By tweaking
the initial size and angle of each vector, we can make it draw anything we want, and
here you’ll see how.
Before diving in, take a moment to linger on just how striking this is. This particular
animation has 300 rotating arrows in total. Go full screen for this if you can; the intricacy
is worth it. Think about this: the action of each individual arrow is perhaps the simplest
thing you could imagine: Rotation at a steady rate. Yet the collection of all added together
is anything but simple. The mind-boggling complexity is put into even sharper focus
the farther we zoom in, revealing the contributions of the littlest, quickest arrows.
Considering the chaotic frenzy you’re looking at, and the clockwork rigidity of the underlying
motions, it’s bizarre how the swarm acts with a kind of coordination to trace out some
very specific shape. Unlike much of the emergent complexity you find elsewhere in nature, though,
this is something we have the math to describe and to control completely. Just by tuning
the starting conditions, nothing more, you can make this swarm conspire in all the right
ways to draw anything you want, provided you have enough little arrows. What’s even crazier,
as you’ll see, is the ultimate formula for all this is incredibly short.
Often, Fourier series are described in terms of functions of real numbers being broken
down as a sum of sine waves. That turns out to be a special case of this more general
rotating vector phenomenon that we’ll build up to, but it’s where Fourier himself started,
and there’s good reason for us to start the story there as well.
Technically, this is the third video in a sequence about the heat equation, what Fourier
was working on when he developed his big idea. I’d like to teach you about Fourier series
in a way that doesn’t depend on you coming from those chapters, but if you have at least
a high-level idea of the problem from physics which originally motivated this piece of math,
it gives some indication for how unexpectedly far-reaching Fourier series are.
All you need to know is that we had this equation, describing how the temperature on a rod will
evolve over time (which incidentally also describes many other phenomena unrelated to
heat), and while it’s hard to directly use it to figure out what will happen to an arbitrary
heat distribution, there’s a simple solution if that initial function looks like a cosine
wave with a frequency tuned to make it flat at each endpoint. Specifically, as you graph
what happens over time, these waves simply get scaled down exponentially, with higher
frequency waves decaying faster.
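
For reference, here is one way to write those simple solutions down. Treat it as a sketch under assumptions carried over from the earlier chapters rather than something stated here: the heat equation is taken in the form dT/dt = alpha * d^2T/dx^2, the rod runs from x = 0 to x = L, alpha is the diffusivity constant, and n is any whole number.

```latex
T_n(x, t) = \cos\!\left(\frac{n \pi x}{L}\right) \, e^{-\alpha \left(\frac{n \pi}{L}\right)^{2} t}
```

The cosine factor is flat at both endpoints, and the decay rate in the exponent grows like n^2, which is the precise sense in which higher frequency waves decay faster.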
The heat equation happens to be what’s known in the business as a “linear” equation,
meaning if you know two solutions and you add them up, that sum is also a new solution.
You can even scale them each by some constant, which gives you some dials to turn to construct
a custom function solving the equation.
This is a fairly straightforward property that you can verify for yourself, but it’s
incredibly important. It means we can take our infinite family of solutions, these exponentially
decaying cosine waves, scale a few of them by some custom constants of our choosing,
and combine them to get a solution for a new tailor-made initial condition which is some
combination of cosine waves.
Something important I want you to notice about combining the waves like this is that because
higher frequency ones decay faster, this sum which you construct will smooth out over time
as the high-frequency terms quickly go to zero, leaving only the low-frequency terms
dominating. So in some sense, all the complexity in the evolution that the heat equation implies
is captured by this difference in decay rates for the different frequency components.
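
If you'd like to see that smoothing in numbers, here is a minimal Python sketch. The constants ALPHA and LENGTH, the function names, and the particular choice of coefficients are all assumptions made up for illustration; the only thing taken from the discussion above is the form of the decaying cosine solutions.

```python
import math

ALPHA = 1.0   # assumed diffusivity constant
LENGTH = 1.0  # assumed rod length

def mode(n, x, t):
    """One decaying cosine solution with whole-number frequency n."""
    k = n * math.pi / LENGTH
    return math.cos(k * x) * math.exp(-ALPHA * k * k * t)

def temperature(coeffs, x, t):
    """Linear combination of modes; coeffs maps frequency n -> scaling constant."""
    return sum(c * mode(n, x, t) for n, c in coeffs.items())

# Hypothetical dial settings: one low-frequency and one high-frequency wave.
coeffs = {1: 1.0, 5: 0.5}

for t in (0.0, 0.01, 0.1):
    low = abs(coeffs[1] * mode(1, 0.0, t))
    high = abs(coeffs[5] * mode(5, 0.0, t))
    print(f"t={t}: low-frequency size {low:.4f}, high-frequency size {high:.4f}")
```

Even a short time in, the frequency-5 term has all but vanished while the frequency-1 term is still clearly visible, which is the smoothing described above.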
It’s at this point that Fourier gains immortality. I think most normal people at this stage would
say “well, I can solve the heat equation when the initial temperature distribution
happens to look like a wave, or a sum of waves, but what a shame that most real-world distributions
don’t at all look like this!”
For example, let’s say you brought together two rods, each at some uniform temperature,
and you wanted to know what happens immediately after they come into contact. To make the
numbers simple, let’s say the temperature of the left rod is 1 degree, and the right
rod is -1 degree, and that the total length L of the combined rod is 1. Our initial temperature
distribution is a step function, which is so obviously different from sine waves and
sums of sine waves, don’t you think? I mean, it’s almost entirely flat, not wavy, and
for god’s sake, it’s even discontinuous!
And yet, Fourier thought to ask a question which seems absurd: How do you express this
as a sum of sine waves? Even more boldly, how do you express any initial temperature
distribution as a sum of sine waves?
And it’s more constrained than just that! You have to restrict yourself to adding waves
which satisfy a certain boundary condition, which as we saw last video means working only
with these cosine functions whose frequencies are all some whole number multiple of a given
base frequency.
(And by the way, if you were working with a different boundary condition, say that the
endpoints must stay fixed, you’d have a different set of waves at your disposal to
piece together, in this case simply replacing the cosine functions with sines)
It’s strange how often progress in math looks like asking a new question, rather than
simply answering an old one.
Fourier really does have a kind of immortality, with his name essentially synonymous with
the idea of breaking down functions and patterns as combinations of simple oscillations. It’s
really hard to overstate just how important and far-reaching that idea turned out to be,
well beyond anything Fourier could have imagined. And yet, the origin of all this is in a piece
of physics which upon first glance has nothing to do with frequencies and oscillations. If
nothing else this should give a hint at how generally applicable Fourier series are.
“Now hang on,” I hear some of you saying, “none of these sums of sine waves being
shown are actually the step function.” It’s true, any finite sum of sine waves will never
be perfectly flat (except for a constant function), nor discontinuous. But Fourier thought more
broadly, considering infinite sums. In the case of our step function, it turns out to
be equal to this infinite sum, where the coefficients are 1, -⅓, +⅕, -1/7 and so on for all
the odd frequencies, all rescaled by 4/pi. I’ll explain where these numbers come from
in a moment.
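
Written out, the sum being described looks like the following. This assumes the rod runs from x = 0 to x = 1 and that the allowed waves have the form cos(n * pi * x), which matches the boundary condition discussed above but is my reading of it rather than a formula quoted from the screen.

```latex
f(x) = \frac{4}{\pi}\left(\cos(\pi x) - \frac{1}{3}\cos(3\pi x) + \frac{1}{5}\cos(5\pi x) - \frac{1}{7}\cos(7\pi x) + \cdots\right)
     = \frac{4}{\pi}\sum_{k=0}^{\infty} \frac{(-1)^{k}}{2k+1}\cos\big((2k+1)\pi x\big)
```

As a sanity check, plugging in x = 0 gives 4/pi times 1 - 1/3 + 1/5 - 1/7 + ..., and that alternating sum is pi / 4, so the total is 1.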
Before that, I want to be clear about what we mean with a phrase like “infinite sum”,
which runs the risk of being a little vague. Consider the simpler context of numbers, where
you could say, for example, this infinite sum of fractions equals pi / 4. As you keep
adding terms one-by-one, at all times what you have is rational; it never actually equals
the irrational pi / 4. But this sequence of partial sums approaches pi / 4. That is to
say, the numbers you see, while never equal to pi / 4, get arbitrarily close to that value,
and stay arbitrarily close to that value. That’s a mouthful, so instead we abbreviate
and say the infinite sum “equals” pi / 4.
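
Here is a small sketch of what "approaches" means in practice. I'm assuming the sum of fractions in question is the alternating series 1 - 1/3 + 1/5 - 1/7 + ..., which is a standard one that converges to pi / 4; the function name is just a label for this example.

```python
import math

def leibniz_partial_sum(num_terms):
    """Sum of the first num_terms terms of 1 - 1/3 + 1/5 - 1/7 + ..."""
    return sum((-1) ** k / (2 * k + 1) for k in range(num_terms))

for num_terms in (1, 10, 100, 10_000):
    s = leibniz_partial_sum(num_terms)
    # Each partial sum is a rational number; only the gap to pi / 4 shrinks.
    print(num_terms, s, abs(s - math.pi / 4))
```

No partial sum ever lands on pi / 4, but the printed gap keeps shrinking as terms are added.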
With functions, you’re doing the same thing but with many different values in parallel.
Consider a specific input, and the value of all these scaled cosine functions for that
input. If that input is less than 0.5, as you add more and more terms, the sum will
approach 1. If that input is greater than 0.5, as you add more and more terms it would
approach -1. At the input 0.5 itself, all the cosines are 0, so the limit of the partial
sums is 0. Somewhat awkwardly, then, for this infinite sum to be strictly true, we do have
to prescribe the value of the step function at the point of discontinuity to be 0.
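
To watch this happen at individual inputs, here is a rough Python sketch. It assumes the series has coefficients 1, -1/3, 1/5, -1/7 and so on, rescaled by 4/pi, attached to cosines of the form cos(n * pi * x) on a rod from 0 to 1; the function name is hypothetical.

```python
import math

def step_partial_sum(x, num_terms):
    """Partial sum of (4/pi) * sum over odd n of (+/- 1/n) * cos(n * pi * x)."""
    total = 0.0
    for k in range(num_terms):
        n = 2 * k + 1                      # odd frequencies only
        total += (-1) ** k / n * math.cos(n * math.pi * x)
    return 4 / math.pi * total

for x in (0.25, 0.5, 0.75):
    values = [round(step_partial_sum(x, m), 4) for m in (1, 10, 100, 2000)]
    print(x, values)
```

The input below 0.5 drifts toward 1, the input above 0.5 drifts toward -1, and the input 0.5 itself sits at 0 the whole way, up to floating point noise.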
Analogous to an infinite sum of rational numbers being irrational, the infinite sum of wavy
continuous functions can equal a discontinuous flat function. Limits allow for qualitative
changes which finite sums alone never could.
There are multiple technical nuances I’m sweeping under the rug here. Does the fact
that we’re forced into a certain value for the step function at its point of discontinuity
make any difference for the heat flow problem? For that matter what does it really mean to
solve a PDE with a discontinuous initial condition? Can we be sure the limit of solutions to the
heat equation is also a solution? Do all functions have a Fourier series like this? These are
exactly the kind of question real analysis is built to answer, but it falls a bit deeper
in the weeds than I think we should go here, so I’ll relegate that to links in the video’s description.
The upshot is that when you take the heat equation solutions associated with these cosine
waves and add them all up, all infinitely many of them, you do get an exact solution
describing how the step function will evolve over time.
The key challenge, of course, is how to find these coefficients. So far, we’ve been thinking
about functions with real number outputs, but for the computations I’d like to show
you something more general than what Fourier originally did, applying to functions whose
output can be any complex number, which is where those rotating vectors from the opening
come back into play.
Why the added complexity? Aside from being more general, in my view the computations
become cleaner and it’s easier to see why they work. More importantly, it sets a good
foundation for ideas that will come up again later in the series, like the Laplace transform
and the importance of exponential functions.

The relation between cosine decomposition and rotating vector decomposition

We’ll still think of functions whose input
is some real number on a finite interval, say the one from 0 to 1 for simplicity. But
whereas something like a temperature function will have an output confined to the real number
line, we’ll broaden our view to outputs anywhere in the two-dimensional complex plane.
You might think of such a function as a drawing, with a pencil tip tracing along different
points in the complex plane as the input ranges from 0 to 1. Instead of sine waves being the
fundamental building block, as you saw at the start, we’ll focus on breaking these
functions down as a sum of little vectors, all rotating at some constant integer frequency.
Functions with real number outputs are essentially really boring drawings; a 1-dimensional pencil
sketch. You might not be used to thinking of them like this, since usually we visualize
such a function with a graph, but right now the path being drawn is only in the output space.
When we do the decomposition into rotating vectors for these boring 1d drawings, what
will happen is that the vectors with frequency 1 and -1 will have the same length, and they’ll
be horizontal reflections of each other. When you just look at the sum of these two as they
rotate, that sum stays fixed on the real number line, and oscillates like a sine wave. This
might be a weird way to think about a sine wave, since we’re used to looking at its
graph rather than the output alone wandering on the real number line. But in the broader
context of functions with complex number outputs, this is what sine waves look like. Similarly,
the pair of rotating vectors with frequency 2, -2 will add another sine wave component,
and so on, with the sine waves we were looking at earlier now corresponding to pairs of vectors
rotating in opposite directions.
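
In symbols, the pairing works out as follows. This is a sketch using the complex exponential notation for a rotating vector, with c_n standing for the starting position of the vector with frequency n; "horizontal reflections of each other" is read here as the two starting positions being complex conjugates.

```latex
e^{2\pi i t} + e^{-2\pi i t} = 2\cos(2\pi t)

c_n e^{n \cdot 2\pi i t} + \overline{c_n}\, e^{-n \cdot 2\pi i t} = 2\,|c_n| \cos\big(2\pi n t + \arg c_n\big)
```

The imaginary parts cancel at every moment, which is why the sum never leaves the real number line.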
So the context Fourier originally studied, breaking down real-valued functions into sine
wave components, is a special case of the more general idea with 2d-drawings and rotating vectors.
At this point, maybe you don’t trust me that widening our view to complex functions
makes things easier to understand, but bear with me. It really is worth the added effort
to see the fuller picture, and I think you’ll be pleased by how clean the actual computation
is in this broader context.
You may also wonder why, if we’re going to bump things up to 2 dimensions, we don’t
just talk about 2d vectors. What’s the square root of -1 got to do with anything?
Well, the heart and soul of Fourier series is the complex exponential, e^{i * t}. As
the value of t ticks forward with time, this value walks around the unit circle at a rate
of 1 unit per second.
In the next video, you’ll see a quick intuition for why exponentiating imaginary numbers walks
in circles like this from the perspective of differential equations, and beyond that,
as the series progresses I hope to give you some sense for why complex exponentials are important.
You see, in theory, you could describe all of this Fourier series stuff purely in terms
of vectors and never breathe a word of i. The formulas would become more convoluted,
but beyond that, leaving out the function e^x would somehow no longer authentically
reflect why this idea turns out to be so useful for solving differential equations. For right
now you can think of this e^{i * t} as a notational shorthand to describe a rotating vector, but
just keep in the back of your mind that it’s more significant than a mere shorthand.
I’ll be loose with language and use the words “vector” and “complex number”
somewhat interchangeably, in large part because thinking of complex numbers as little arrows
makes the idea of adding many together clearer.
Alright, armed with the function e^{i * t}, let’s write down a formula for each of these
rotating vectors we’re working with. For now, think of each of them as starting pointed
one unit to the right, at the number 1.
The easiest vector to describe is the constant one, which just stays at the number 1, never
moving. Or, if you prefer, it’s “rotating” at a frequency of 0. Then there will be a
vector rotating 1 cycle every second which we write as e^{2pi * i * t}. The 2pi is there
because as t goes from 0 to 1, it needs to cover a distance of 2pi along the circle.
In what’s being shown, it’s actually 1 cycle every 10 seconds so that things aren’t
too dizzying, but just think of it as slowed down by a factor of 10.
We also have a vector rotating at 1 cycle per second in the other direction, e^{-2pi
* i * t}. Similarly, the one going 2 rotations per second is e^{2 * 2pi * i * t}, where that
2 * 2pi in the exponent describes how much distance is covered in 1 second. And we go
on like this over all integers, both positive and negative, with a general formula of e^{n
* 2pi * i * t} for each rotating vector.
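
If it helps to poke at these numerically, here is a minimal sketch using Python's built-in complex numbers. The function name rotating_vector is just a label chosen for this example.

```python
import cmath

def rotating_vector(n, t):
    """Position at time t of the unit vector rotating n cycles per second."""
    return cmath.exp(n * 2j * cmath.pi * t)

# Every vector starts at the number 1 and keeps length 1 as it turns.
for n in (-2, -1, 0, 1, 2):
    quarter = rotating_vector(n, 0.25)
    print(n, rotating_vector(n, 0.0), quarter, abs(quarter))
```

A quarter second in, the frequency 1 vector points straight up at i, the frequency -1 vector points straight down at -i, and the frequency 0 vector has not moved.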
Notice, this makes it more consistent to write the constant vector as e^{0 * 2pi
* i * t}, which feels like an awfully complicated way to write the number 1, but at least then it
fits the pattern.
The control we have, the set of knobs and dials we get to turn, is the initial size
and direction of each of these numbers. The way we control that is by multiplying each
one by some complex number, which I’ll call c_n.
For example, if we wanted that constant vector not to be at the number 1, but to have a length
of 0.5, we’d scale it by 0.5. If we wanted the vector rotating at one cycle per second
to start off at an angle of 45°, we’d multiply it by a complex number which has the effect
of rotating it by that much, which you might write as e^{pi/4 * i}. If its initial length
needed to be 0.3, the coefficient would be 0.3 times that amount.
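
As a quick check on that example, here is a short sketch that builds the coefficient for a vector of length 0.3 starting at 45 degrees and reads its size and angle back off. The variable names are hypothetical.

```python
import cmath
import math

# Coefficient for a vector of length 0.3 that starts at an angle of 45 degrees.
c_1 = 0.3 * cmath.exp(1j * math.pi / 4)

length = abs(c_1)                             # initial size of the vector
angle_degrees = math.degrees(cmath.phase(c_1))  # initial direction of the vector

# Multiplying the rotating vector by c_1 sets where it sits at t = 0.
start = c_1 * cmath.exp(1 * 2j * math.pi * 0.0)
print(length, angle_degrees, start)
```

So a single complex number carries both dials at once: its magnitude is the starting length and its angle is the starting direction.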
Likewise, everyone in our infinite family of rotating vectors has some complex constant
being multiplied into it which determines its initial angle and magnitude. Our goal
is to express any arbitrary function f(t), say this one drawing an eighth note, as a
sum of terms like this, so we need some way to pick out these constants one-by-one given
data of the function.
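
Spelled out, the goal is to write the function in the following form, where each c_n is one of the complex constants just described and t ranges from 0 to 1.

```latex
f(t) = \cdots + c_{-2}\, e^{-2 \cdot 2\pi i t} + c_{-1}\, e^{-1 \cdot 2\pi i t} + c_{0}\, e^{0 \cdot 2\pi i t} + c_{1}\, e^{1 \cdot 2\pi i t} + c_{2}\, e^{2 \cdot 2\pi i t} + \cdots
     = \sum_{n=-\infty}^{\infty} c_n\, e^{n \cdot 2\pi i t}
```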
The easiest one is the constant term. This term represents a sort of center of mass for
the full drawing; if you were to sample a bunch of evenly spaced values for the input
t as it ranges from 0 to 1, the average of all the outputs of the function for those
samples will be the constant term c_0. Or more accurately, as you consider finer and
finer samples, their average approaches c_0 in the limit. What I’m describing, finer
and finer sums of f(t) for samples of t from the input range, is an integral of f(t) from
0 to 1. Normally, since I’m framing this in terms of averages, you’d divide this
integral by the length of the interval. But that length is 1, so it amounts to the same thing.
There’s a very nice way to think about why this integral would pull out c_0. Since we
want to think of the function as a sum of these rotating vectors, consider this integral
(this continuous average) as being applied to that sum. This average of a sum is the
same as a sum over the averages of each part; you can read this move as a subtle shift in
perspective. Rather than looking at the sum of all the vectors at each point in time,
and taking the average value of the points they trace out, look at the average value
for each individual vector as t goes from 0 to 1, and add up all these averages.
But each of these vectors makes a whole number of rotations around 0, so its average value
as t goes from 0 to 1 will be 0. The only exception is that constant term; since it
stays static and doesn’t rotate, its average value is just whatever number it started
on, which is c_0. So doing this average over the whole function is sort of a way to kill
all terms that aren’t c_0.
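
The same argument in symbols, as a sketch: it assumes f(t) really is a sum of rotating vectors with constants c_n, and it quietly assumes that swapping the integral with the infinite sum is allowed.

```latex
\int_0^1 f(t)\, dt
  = \int_0^1 \sum_{n=-\infty}^{\infty} c_n\, e^{n \cdot 2\pi i t}\, dt
  = \sum_{n=-\infty}^{\infty} c_n \int_0^1 e^{n \cdot 2\pi i t}\, dt
  = c_0,
\qquad \text{since} \quad
\int_0^1 e^{n \cdot 2\pi i t}\, dt =
\begin{cases} 1 & n = 0 \\ 0 & n \neq 0 \end{cases}
```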
But now let’s say you wanted to compute a different term, like c_2 in front of the
vector rotating 2 cycles per second. The trick is to first multiply f(t) by something which
makes that vector hold still (sort of the mathematical equivalent of giving a smartphone
to an overactive child). Specifically, if you multiply the whole function by e^{-2
* 2pi * i * t}, think about what happens to each term. Since multiplying exponentials
results in adding what’s in the exponent, the frequency term in each of the exponents
gets shifted down by 2.
So now, that c_{-1} vector spins around -3 times, with an average of 0. The c_0 vector,
previously constant, now rotates twice as t ranges from 0 to 1, so its average is 0.
And likewise, all vectors other than the c_2 term make some whole number of rotations,
meaning they average out to 0. So taking the average of this modified function, all terms
other than the second one get killed, and we’re left with c_2.
Of course, there’s nothing special about 2 here. If we replace it with any other n,
you have a formula for any other term c_n. Again, you can read this expression as modifying
our function, our 2d drawing, so as to make the n-th little vector hold still, and then
performing an average so that all other vectors get canceled out. Isn’t that crazy? All
the complexity of this decomposition as a sum of many rotations is entirely captured
in this expression.
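
For anyone reading without the picture, the expression in question can be written as follows. The middle step is the shift in frequencies described above, with m used as a stand-in index for the terms of the sum.

```latex
f(t)\, e^{-n \cdot 2\pi i t} = \sum_{m=-\infty}^{\infty} c_m\, e^{(m - n) \cdot 2\pi i t}
\qquad \Longrightarrow \qquad
c_n = \int_0^1 f(t)\, e^{-n \cdot 2\pi i t}\, dt
```

Only the m = n term holds still after the shift, so it is the only one that survives the average.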
So when I’m rendering these animations, that’s exactly what I’m having the computer
do. It treats this path like a complex function, and for a certain range of values for n, it
computes this integral to find each coefficient c_n. For those of you curious about where
the data for the path itself comes from, I’m going the easy route having the program read
in an svg, which is a file format that defines the image in terms of mathematical curves
rather than with pixel values, so the mapping f(t) from a time parameter to points in space
basically comes predefined.
In what’s shown right now, I’m using 101 rotating vectors, computing values of n from
-50 up to 50. In practice, the integral is computed numerically, basically meaning it
chops up the unit interval into many small pieces of size delta-t and adds up this value
f(t) e^{-n * 2pi * i * t} * delta-t for each one of them. There are fancier methods for
more efficient numerical integration, but that gives the basic idea.
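
Here is a minimal sketch of that basic idea in Python. It is not the code behind the animations; the function names, the sample count, and the little test path are all assumptions made for illustration. The test path is built from three known rotating vectors so you can check that the integral recovers their coefficients.

```python
import cmath

def fourier_coefficient(f, n, num_samples=10_000):
    """Approximate the integral of f(t) * e^{-n * 2pi * i * t} over [0, 1]."""
    dt = 1.0 / num_samples
    total = 0j
    for k in range(num_samples):
        t = k * dt
        # Multiply by the counter-rotating vector, weight by the slice width.
        total += f(t) * cmath.exp(-n * 2j * cmath.pi * t) * dt
    return total

def example_path(t):
    """Hypothetical drawing with known coefficients c_0, c_1 and c_{-2}."""
    return (0.5
            + 0.3 * cmath.exp(1j * cmath.pi / 4) * cmath.exp(1 * 2j * cmath.pi * t)
            + 0.2 * cmath.exp(-2 * 2j * cmath.pi * t))

for n in range(-3, 4):
    print(n, fourier_coefficient(example_path, n))
```

The printed values for n = 0, 1 and -2 match the constants baked into the path, and every other frequency comes out as essentially zero.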
After computing these 101 values, each one determines an initial position for the little
vectors, and then you set them all rotating, adding them all tip to tail, and the path
drawn out by the final tip is some approximation of the original path. As the number of vectors
used approaches infinity, it gets more and more accurate.
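
To see the approximation improve, here is a rough self-contained sketch. The square path stands in for the svg data, and every name in it is hypothetical; it computes coefficients numerically, adds the rotating vectors tip to tail, and reports the worst gap between the drawn point and the true path.

```python
import cmath

def square_path(t):
    """Hypothetical drawing: the boundary of a square, traced once as t goes 0 to 1."""
    corners = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]
    s = (t % 1.0) * 4
    i = int(s)
    a, b = corners[i % 4], corners[(i + 1) % 4]
    return a + (b - a) * (s - i)

def fourier_coefficient(f, n, num_samples=2000):
    """Numerical version of the integral of f(t) * e^{-n * 2pi * i * t} on [0, 1]."""
    dt = 1.0 / num_samples
    return sum(f(k * dt) * cmath.exp(-n * 2j * cmath.pi * k * dt) * dt
               for k in range(num_samples))

def partial_sum(coeffs, t):
    """Add the rotating vectors tip to tail at time t."""
    return sum(c * cmath.exp(n * 2j * cmath.pi * t) for n, c in coeffs.items())

def max_error(f, max_freq, num_checks=200):
    """Worst distance between the approximation and the path, over sampled times."""
    coeffs = {n: fourier_coefficient(f, n) for n in range(-max_freq, max_freq + 1)}
    return max(abs(partial_sum(coeffs, k / num_checks) - f(k / num_checks))
               for k in range(num_checks))

for max_freq in (1, 5, 25):
    print(2 * max_freq + 1, "vectors, worst error:", max_error(square_path, max_freq))
```

With 3 vectors you get little more than a circle, and the worst error shrinks as more vectors join in, with the corners of the square being the hardest part to capture.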
Relation to step function

To bring this all back down to earth, consider
the example we were looking at earlier of a step function, which was useful for modeling
the heat dissipation between two rods of different temperatures after coming into contact.
Like any real-valued function, a step function is like a boring drawing confined to one dimension.
But this one is an especially dull drawing, since for inputs between 0 and 0.5, the output
just stays static at the number 1, and then it discontinuously jumps to -1 for inputs
between 0.5 and 1. So in the Fourier series approximation, the vector sum stays really
close to 1 for the first half of the cycle, then really quickly jumps to -1 for the second
half. Remember, each pair of vectors rotating in opposite directions corresponds to one of
the cosine waves we were looking at earlier.
To find the coefficients, you’d need to compute this integral. For the ambitious viewers
among you itching to work out some integrals by hand, this is one where you can do the
calculus to get an exact answer, rather than just having a computer do it numerically for
you. I’ll leave it as an exercise to work this out, and to relate it back to the idea
of cosine waves by pairing off the vectors rotating in opposite directions.
For the even more ambitious, I’ll also leave another exercise up on screen on how to relate
this more general computation with what you might see in a textbook describing Fourier
series only in terms of real-valued functions with sines and cosines.
By the way, if you’re looking for more Fourier series content, I highly recommend the videos
by Mathologer and The Coding Train on the topic, and the blog post by Jezzamoon.
So on the one hand, this concludes our discussion of the heat equation, which was a little window
into the study of partial differential equations.
But on the other hand, this foray into Fourier series is a first glimpse at a deeper idea.
Exponential functions, including their generalization into complex numbers and even matrices, play
a very important role for differential equations, especially when it comes to linear equations.
What you just saw, breaking down a function as a combination of these exponentials, comes
up again in different shapes and forms.