Today I want to share with you a neat way to solve the towers of Hanoi puzzle

just by counting in a different number system,

and surprisingly this stuff relates to finding a curve that fills Sierpinski triangle.

I learned about this from a former CS lecturer of mine, his name is Keith Schwarz.

And I've got to say, this man is one of the best educators that I've ever met.

I actually recorded a bit of the conversation where he showed me this stuff,

so you guys can hear some of what he described directly.

It's weird, I'm not normally the sort of person who likes little puzzles and games,

but I just love looking at the analysis of puzzles and games,

and I love just looking at mathematical patterns and (ask): where does that come from?

In case you aren't unfamiliar,

let's just lay down what the towers of Hanoi puzzle actually is.

So you have a collection of three pegs, and you have these discs of descending size.

You think of these disks as having a hole in the middle, so that you can fit them onto a peg.

The set I pictured here has five discs, which I'll label 0, 1, 2, 3, 4,

but in principle you could have as many discs as you want.

So they all start up stacked up from biggest to smallest on one spindle,

and the goal is to move the entire tower from one spindle to another.

The rule is you can only move one disc at a time,

and you can't move a bigger disc on top of a smaller disk.

For example, your first movemust involve moving disk 0,

since any other disk has stuff on top of it

that needs to get out of the way before it can move.

After that, you can move disc 1,

but it has to go on whatever peg doesn't currently have disk 0,

since otherwise you'll be putting a bigger disk on a smaller one, which isn't allowed.

If you've never seen this before,

I highly encourage you to pause and pull out some books of varying sizes

and try it out for yourself, just kind of get a feel for what the puzzle is:

if it's hard, why it's hard, if it's not, why it's not, that kind of stuff.

Now, Keith showed me something truly surprising about this puzzle,

which is that you can solve it just by counting up in binary

and associating the rhythm of that counting with a certain rhythm of disc movements.

For anyone unfamiliar with binary,

I'm going to take a moment to do a quick overview here first.

Actually, even if you are familiar with binary,

I want to explain it with a focus on the rhythm of counting,

which you may or may not have thought about before.

Any description of binary typically starts off

with an introspection about our usual way to represent numbers -

what we call base-10 - since we use ten separate digits, 0123456789.

The rhythm of counting begins by walking through all ten of these digits,

Then, having run out of new digits, you express the next number, ten, with two digits: 10.

You say that one is in the tens place,

since it's meant to encapsulate the group of 10 that you've already counted up to so far,

while freeing the ones place to reset to zero.

The rhythm of counting repeats like this:

counting up nine, rolling over to the tens place,

counting up nine more, rolling over to the tens place, etc.

Until after repeating that process nine times,

you roll over to a hundreds place,

a digital that keeps track of how many groups of 100 you've hit,

freeing up the other two digits to reset to zero.

In this way, the rhythm of counting is kind of self-similar:

even if you zoom out to a larger scale, the process looks like

doing something, rolling over, doing that same thing, rolling over,

and repeat nine times before an even larger roll over.

In binary, also known as base-2, you limit yourself to two digits, 0 and 1,

commonly called 'bits', which is short for binary digits.

The result is that when you're counting, you have to roll over all the time.

After counting 01 you've already run out of bits, so you need to roll over to a two's place,

writing '10' and resisting every urge in your base-10-trained brain to read this as ten,

and instead understand it to mean one group of 2 plus 0.

Then, increment up to 11, which represents three,

and already you have to roll over again,

and since there's a one in that two's place, that has to roll over as well,

giving you 100, which represents one group of four plus 0 groups of two plus 0,

in the same way that digits in base-10 represent powers of 10,

bits in base-two represent different powers of 2.

So instead of talking about a ten's place, a hundred's place, a thousand's place, things like that,

you talk about a two's place, a four's place and an eight's place.

The rhythm of counting is now a lot faster,

but that almost makes it more noticeable:

Flip the last, roll over once.

Flip the last, roll over twice.

Flip the last, roll over once.

Flip the last, roll over three times.

Again, there's a certain self-similarity to this pattern:

at every scale the process is to do something, roll over, then do that same thing again.

At the small-scale, say counting up to three, which is 11 in binary,

this means flip the last bit, roll over to the two's,

then flip the last bit.

At a larger scale, like counting up to fifteen, which is 1111 in binary,

the process is to let the last three count up to seven,

roll over to the eight's place,

then let the last three bits count up again.

Counting up to 255, which is eight successive ones,

this looks like letting the last seven bits count up till they're full,

rolling over to the 128's place,

then letting the last seven bits count up again.

Alright, so with that mini introduction,

the surprising fact that Keith showed me

is that we can use this rhythm to solve the towers of Hanoi.

You start by counting from zero.

Whenever you're only flipping that last bit from a 0 to a 1,

move disc 0 one peg to the right.

If it was already on the rightmost peg, you just loop it back to the first peg.

If in your binary counting,

you roll over once to the two's place, meaning you flip the last two bits,

you move disc number 1.

"Where do you move it?" you might ask.

Well, you have no choice.

You can't put it on top of disk 0 and there's only one other peg,

so you move it where you're forced to move it.

So after this, counting up to 11, that involves just flipping the last bit,

so you move disk 0 again.

Then, when your binary counting rolls over twice to the four's place, move disc number 2,

and the pattern continues like this:

flip the last, move disk 0, flip the last 2, move disc 1,

flip the last, move disk 0.

And here we're gonna have to roll over three times to the eight's place,

and that corresponds to moving disc number 3.

There's something magical about it,

like when I first saw this, like this can't work.

I don't know how this works, I don't know why this works.

Now I know, but it's just magical when you see it

and I remember putting together animation for this when I was teaching this,

and just like... you know, I know how this works, I know all the things in it,

it's still fun to just sit and just like you know...

-Watch it play out? -Oh yeah.

I mean, it's not even clear at first that this is always going to give legal moves.

For example, how do you know that every time you're rolling over to the eight's place,

the disc 3 is necessarily going to be freed up to move?

At the same time the solution just immediately raise these questions like:

where does this come from, why does this work,

and is there a better way of doing this then having to do 2^(n-1) steps?

It turns out not only does this solve towers of Hanoi,

but it does it in the most efficient way possible.

Understanding why this works and how it works and what the heck is going on

comes down to a certain perspective on the puzzle -

what CS folk might call a recursive perspective.

Disc 3 is thinking, okay 2 1 and 0, you have to get off of me,

I can't really function under this much weight and pressure.

And so just from disc 3's perspective,

if you want to figure out how is disc 3 going to get over here,

Somehow, I don't care how, disc 2 1 0 have to get to spindle B.

That's the only way they can move,

If any of these are on top of 3, I can't move it,

any of these are at spindle C, it can't move there.

So somehow we have to get 2, 1 and 0 off.

Having done that then we can move disc 3 over there.

And then disc 3 says, I'm set,

you never need to move me again,

everyone else just figure out how to get here.

And in a sense you now have a smaller version of the same problem:

now you've got disc 0, 1 and 2 sitting on spindle B, we gotta get them to C.

So the idea is that if I just focus on one disc

and I think about what I'm going to have to do to get this disc to work,

I can turn my bigger problem into something slightly smaller.

And then how do I solve that?

Well it's exactly the same thing,

disc 2 is going to say, disc 1 and disc 0, you need to, you know,

it's not you, it's me, I just need some space, get off.

They need to move somewhere, then disc 2 can move to where it needs to go,

then disc 1 and 0 can do this.

But the interesting point is that every single disc pretty much has the exact same strategy:

They all say, everybody above me, get off, then i'm going to move,

ok everyone come back on.

When you know that insight you can code up something that will solve towers of Hanoi

in I think five or six lines of code,

which probably has the highest ratio of intellectual investment to lines of code ever.

And if you think about it for a bit,

it becomes clear that this has to be the most efficient solution.

At every step you're only doing what is forced upon you.

You have to get discs 0 through 2 off before you can move disc 3,

and you have to move disc three,

and then you have to move disk 0 through 2 back on to it.

There's just not any room for inefficiency from this perspective.

So why does counting in binary capture this algorithm?

Well what's going on here is that this pattern of solving a subproblem,

moving a big disk, then solving a subproblem again,

is perfectly paralleled by the pattern of binary counting:

count up some amount, roll over, count up to that same amount again.

And this towers of Hanoi algorithm and binary counting are both self similar processes,

in the sense that if you zoom out and count up to a larger power of 2,

or solve towers of Hanoi with more discs,

they both still have that same structure:

Subproblem, do a thing, subproblem.

For example, at a pretty small scale, solving towers of Hanoi for two discs,

move disc 0, move disc 1, move disc 0,

is reflected by counting up to three in binary:

flip the last bit, roll over once, flip the last bit.

At a slightly larger scale, solving towers of Hanoi for three discs looks like:

doing whatever it takes to solve two discs, move disc number 2,

then do whatever it takes to solve two discs again.

Analogously counting up to 111 in binary involves counting up to three,

rolling over all three bits, then counting up three more.

At all scales, both processes have this same breakdown.

So in a sense, the reason that this binary solution works,

at least an explanation, I feel like there's no one explanation, but,

I think the most natural one is that

the pattern you would use to generate these binary numbers

has exactly the same structure as the pattern you would use for towers of Hanoi,

which is why if you look at the bits flipping, you're effectively reversing this process.

You're saying what process generated these,

like if I were trying to understand how these bits were flipped to give me this thing,

you're effectively reverse engineering the recursive algorithm for tower of Hanoi,

which is why it works out.

That's pretty cool, right?

But it actually gets cooler,

I haven't even gotten to how this relates to Sierpinski triangle.

and that's exactly what I'm going to do in the following video, part 2.

Many thanks to everybody who is supporting these videos on Patreon.

I just finished the first chapter of Essence of Calculus,

and I'm working on the second one right now,

and Patreon supporters are getting early access to these videos

before I publish the full series in a few months.

this video and the next one are also supported by Desmos.