- I'm sure that none of you really need an introduction
for our closing plenary speaker.
I know everyone is really tired
but if you could all get up some energy, put it together
for Chandler Carruth, who is going to
scare the heck out of us about the things
that can get into our systems.
- All right everybody.
How are folks doing, folks like energized,
you sticking strong to the end of the conference?
It's been a long week.
I'm here to talk to you about Spectre.
How many folks here do not know anything about Spectre,
have no idea why this is even an interesting talk?
It's okay, you can put your hand up, it's not a problem.
A few people.
Spectre is a big security issue that was,
kind of, uncovered over a year ago.
It seemed really interesting to me
to come and give a talk about this, in part because,
last year, I was up on a stage here giving a talk.
I was really hoping to actually roll the video for you
but I don't think we managed
to get all of the technical issues sorted out here.
But the key thing here is the very last question
in my talk last year, which I didn't give a very good answer to.
Someone asked me,
the whole talk was about speculative execution.
If you haven't seen it, it's a great talk,
not to self-promote but it's a great talk.
At the end of it, someone asked,
what happens to instructions that are speculatively executed
if they would, like crash or do something weird?
Very fortunately for me, that was at the end of the talk,
I said that, I don't know and I kind of blew it off
and I said the session was over
and we wrapped up for the day.
That's not a great response and so,
I'm gonna give you an entire talk instead.
Before we get too far into it,
we gotta set some ground rules.
I'm talking about security issues today
and I'm actually not a security person,
you may not know this, I'm not a security expert.
I'm not gonna know all of the answers
that I'm talking about here okay, and that's okay.
That's part of why a bunch of the Q&A is gonna be in a panel
that we have after the talk
so I can bring some other experts
who've been working on Spectre with me
and with a lot of other people in the industry
up onto the stage and they can help me out
in answering your questions.
But we need some ground rules
'cause security can be tricky to talk about.
A good friend of mine who's also been working on this
was at a conference and he was talking about security issues
and they were having a great hallway conversation
and he ended up tweeting about,
I think I can probably attack
this particular vulnerability this way
and didn't really give a lot of context, it was a tweet,
we've all made tweets without adequate context.
And so, the Las Vegas Police Department
came and talked to him about exactly why he was figuring out
how to attack people at this conference.
He had to have a very long conversation with them,
I don't want any of you to have that conversation,
I don't wanna have that conversation.
So, we're gonna try and use careful words,
I don't really want to talk about exploiting things,
I want to talk about vulnerabilities.
I don't want to talk about attackers,
I wanna talk about threat actors.
Sometimes, these people are actually white hats,
they're actually working for the good people,
they're trying to find vulnerabilities.
I'm not gonna be perfect but I just wanna encourage people
to really think about what words we use
as we're talking about this stuff.
The other thing I gotta tell you is, unfortunately,
with a sensitive topic like security,
I am not gonna be able to say everything that I know.
I couldn't say it last year
and I'm still not gonna be able to say everything I know.
I'm gonna do my very best but please understand,
when I have to cut you off or say like I'm really sorry,
I can't talk about that, please be respectful, understand,
I'm doing everything I can but there are restrictions.
These deal with issues spanning multiple companies,
sometimes intellectual property issues
and also security issues where we can't disclose things
without kind of, responsible time for people to get patched.
And that last point brings me to another thing.
If you're out here and there's some really
brilliant people in the room, I'm sure,
if you think I've got it,
I totally see this way more awesome way
to break through that system, to find a new vulnerability,
I would ask you, don't come up to the microphone
in public with that, right here and now
because none of the security people
really like to have instantaneous disclosure,
they like responsible disclosure.
I'm happy to talk to you offline,
I'm happy to point you at other people
who can talk to you offline and figure out
how to go through that process.
That said, I do want you to ask questions,
especially at the panel,
please come up to the microphone with questions,
just understand, if we have to push back a little bit,
we're doing what we can to try and keep this discussion
at the right level 'cause we're talking about very recent
and very current events.
With that, let's get started.
When I first started working on this,
I actually had a hard time even following the discussions,
I felt like I was a kid, I didn't know what I was doing
and a lot of that was because there's background
and terminology that I simply didn't have.
I can't give you all of that,
I don't have all of it myself, I'm not a security researcher
but I'm gonna try and give you enough for this talk.
First off, we have vulnerabilities, this is a common thing
it's pretty obvious, it's some way you can take a system
and cause it to behave in an unexpected
and unintended manner.
Not too fancy.
But a gadget is a weird thing, in a security context,
we mean something very specific, by the term gadget.
We mean, some pattern of code, some thing in a program
that you can actually leverage to actually,
make a vulnerability work.
These tend to be the little building blocks.
So, whenever you hear security people talking about a gadget
in the code, that's what we mean.
Let's get to slightly more interesting terminology.
An information leak.
This is a kind of vulnerability.
There's a very classic example, Heartbleed.
What does an information leak do?
Well, it takes information that you shouldn't have access to
and it gives you access to it.
But I don't think talking about it
is the easiest way to figure this out,
let's see if we can actually,
show you what an information leak looks like.
Hopefully, my live demo actually works here.
I've written, probably the world's
most simple information leak that you'll ever find.
We have some lovely data here including hello world
and hello all of you, but we also have a secret,
something we don't want to share publicly.
We have a main function that's gonna go through
and process some arguments.
This could be just any old API that takes untrusted input
and it tries to validate it.
We try and make sure that we actually,
if we don't have the argument, we give it a nice default,
if we do, we actually set it from the command line,
we extract this length and then we even bounds check it,
but we wrote our bounds check in a really funny way.
Some of you may be reading this bounds check
and just being like, uh-uh,
this isn't gonna end well for you buddy.
Unfortunately, it's not.
Let's take a look at how this actually works.
So, if I run this program,
it doesn't do anything interesting, it has defaults,
it says, hello world.
But I don't like just talking to the world,
let's talk to all of you, hi everybody.
This is all fine.
And we see we have a length of 13 that's our default.
If I give it a small length, it just truncates it off.
But what happens if I give it too long of a length?
This is because my bounds check isn't very good.
And if I give it a long enough length,
it's actually going to print out all of this secret.
It wasn't intended, I didn't write any code
that would've allowed it naturally,
to go and read that secret.
If I try and just give it a higher index,
it's like no, you can't read it.
But because there's a bug in my code,
I could have an information leak, and this is literally
the core bug behind Heartbleed,
this is how Heartbleed happened.
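The demo source is not part of this transcript, so the following is only a minimal sketch of the same kind of bug. The buffer layout, the secret text, and the names `kMemory`, `read_buggy`, and `read_checked` are assumptions made for illustration. The buggy version clamps the requested length against the size of the whole buffer rather than the public text, so a long request runs on into the secret.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Assumed layout: the public greeting and the secret are adjacent in one
// buffer, so reading past the greeting runs straight into the secret.
constexpr char kMemory[] = "Hello, world!" "It's a secret!!!";
constexpr std::size_t kPublicLen = 13;                  // length of "Hello, world!"
constexpr std::size_t kTotalLen = sizeof(kMemory) - 1;  // excludes the trailing NUL

// Buggy: the length is checked, but against the wrong bound.
inline std::string read_buggy(std::size_t length) {
  if (length > kTotalLen) length = kTotalLen;
  return std::string(kMemory, length);
}

// Fixed: the length is clamped to the public text only.
inline std::string read_checked(std::size_t length) {
  std::string out(kMemory, std::min(length, kPublicLen));
  assert(out.size() <= kPublicLen);
  return out;
}
```

With these definitions, a short length truncates the greeting, the default length of 13 returns it whole, and any larger length leaks secret bytes from `read_buggy` but not from `read_checked`.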
Is everybody happy with information leaks?
Let's talk about side channels.
Side channels is the next core component of this.
A side channel is some way of conveying information
using the natural behavior of the system,
without setting up some kind of explicit
communication channel, can we embed a communication
inside of something that's already taking place
that's routine and common and expected to take place.
You'll see in some discussions, this gets kind of
muddied with the term covert channel.
I don't particularly like using that term
for things like Spectre.
A covert channel, I understand much better
by thinking about old-fashioned spy,
who here likes spy movies?
I've got some people who like spy movies.
Covert channels are like spy movies.
That's like when you say,
when I raise my blinds on the third Wednesday of the month,
we meet, that's a covert channel.
It's not a normal thing,
I'm not always raising and lowering my blinds,
it's just that, it doesn't look like
a communication mechanism
but it is intentionally set up as a communication mechanism
and used for that purpose.
A side channel is not something we intentionally set up,
it's just something we can take advantage of
that was already happening.
Let's look at a side channel.
Again, I think seeing this stuff is a lot better
than just describing it.
I built a little side channel demo for you all
but unfortunately, this is gonna be a lot more code
so, I'm gonna try and step through it.
It's okay if you don't understand everything, like I said,
we're gonna have a whole panel,
but I'm gonna try and give you at least,
the gist of how this works.
The first thing I have is a secret.
The secret is just a string, it's nothing too fancy
and I have some code that does force reading,
and I have some timing code, I have some random math code
that's not super important.
The main body of this is this leak bytes thing.
The very first line of this, up at the top,
I have a timing array and this timing array is
a big array of memory that I can access in different ways
to access different cache lines on a modern processor.
I then extract this string view, this nice string view
which tells me about this in bounds range of text
and I build some data structures to collect information,
latency and scores.
And then we start runs, and we do a bunch of runs
until we get enough information to believe
that we have actually found some information embedded in
another medium, in this case, in a timing side channel.
First thing we do is, we flush all of the memory
then we force a read but not just any read.
We load the information out of data
then we use that to access the timing array.
And we access it not just locally but at strides.
And so this means that, for different values
in this data array, I'm gonna access different cache lines.
Then, I have to go and see whether that was successful
and in order to see whether it was successful,
I have a loop down here which kind of, looks for,
which kind of shuffles the way I access memory
and then, accesses each and every cache line
in this timing array, does a read
and computes the latency of that read.
It is just timing each cache line in a way that lets us see
whether one of these cache lines was faster
than all of the others, because we've already accessed it,
we accessed it right before, in the previous loop.
Makes some sense?
And then, we go and we find the average latency
because we don't wanna hard-code any constants here.
If one of the latencies,
if one of the latencies for one of the cache lines
was substantially below the average, then we think,
cool, that was probably a signal
embedded in our timing side channel, we bump the score
and if we get the score high enough, down here at the bottom
if we get the score high enough, we gain confidence,
yeah, we found our signal, we've actually
found the information.
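The decoding half of that loop can be sketched without any hardware-specific code. This is a hedged illustration, not the demo's source: the stride of 4096 bytes, the "below half the average" threshold, and the names `probe_offset` and `fastest_outlier` are all assumptions. The real demo also needs cache flushing and cycle-accurate timing, which are left out here.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <optional>
#include <vector>

// Assumed spacing between probe slots, chosen so that each possible byte
// value maps to its own cache line in the timing array.
constexpr std::size_t kStride = 4096;

// Offset into the timing array that gets touched when `value` is "sent".
constexpr std::size_t probe_offset(unsigned char value) {
  return static_cast<std::size_t>(value) * kStride;
}

// Given one measured latency per slot, return the slot that was clearly
// faster than the average, or nothing if no slot stands out. Comparing with
// the average avoids hard-coding a latency constant for a particular machine.
inline std::optional<std::size_t> fastest_outlier(
    const std::vector<double>& latency) {
  assert(!latency.empty());
  const double avg =
      std::accumulate(latency.begin(), latency.end(), 0.0) / latency.size();
  std::optional<std::size_t> best;
  for (std::size_t i = 0; i < latency.size(); ++i) {
    // The factor 0.5 is an arbitrary threshold for "substantially below".
    if (latency[i] < avg * 0.5 && (!best || latency[i] < latency[*best])) {
      best = i;
    }
  }
  return best;
}
```

In a full receiver, each run that yields an outlier would bump a score for that slot, and a byte would be reported once its score is high enough.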
Makes sense to folks?
Let's see how this works.
If I run this, that was pretty fast.
If I run this, you're gonna see, it's gonna print out
each of those characters.
And each one of those, it's not actually looking
at the character, it's timing the access to memory.
Makes some sense?
It's actually that simple.
There's not more, I don't have anything up my sleeves.
Like I promised, this is like a real, this is a real demo.
We have one more key piece of
core knowledge here and that's speculative execution.
We talked a lot about speculative execution in
the talk I gave last year, I'm not gonna try and give you
a full rundown on how processors do speculative execution,
the key thing is, that it allows them to execute
instructions way past where the program currently is
and sometimes, with interesting assumptions.
Because in order to execute further along
than the program currently has,
the processor has to make predictions.
These predictions, are really more like guesses.
And sometimes, it guesses wrong
and it makes an incorrect prediction
but it continues to speculatively execute
and it just unwinds all of that later.
But when you have this misspeculation
and you combine it with a side channel,
it allows you to leak information that was only visible
during that speculative execution.
And that speculative execution may have occurred
with strange invariants, with invariants simply not holding
and so, you can actually observe behavior from a program
that violates the fundamental invariants the program set up.
And that's Spectre and that's why Spectre is so confusing.
You wrote the code and it clearly, only does one thing
but observation shows something else.
Let's see if we can map this on.
My demo for this one is going to be essentially, Spectre v1.
But I've tried to make it as similar
to the previous two demos as I could.
Just like last time, I have a text table with three strings.
I've hard coded it to try and read using this second string
we can jump down to the main function and you can see
what it's actually doing here.
We actually are going to, always use this text table one.
That's the only thing we hand to leak byte.
We do not hand the second, like the third entry
in our text table to this routine
and we hand it a string view with a bound in it.
And then this loop is essentially, computing
an out-of-bounds index into this thing
and we're passing this index.
But this i is always going to be out of bounds.
We're computing it based on a totally different string.
This index is never in bounds.
Once we get up to the leak byte,
we have a slightly different routine, we have the same setup
with one small difference.
We put the size of our string view into memory.
This is me cheating so that it fits on a slide but,
the idea being that your size might not be sitting
in a register, it might be slow to access.
Then we have our runs.
Getting a good demo of this is a bit tricky.
One thing we need to do, is we have to essentially,
train the entire system using correct executions
before we can get it to predict an incorrect execution.
And so, I build a safe index into my buffer of text
and this is always gonna be in bounds,
this index is totally fine.
But it's important to note,
this index is not stable, each run gets a different one,
it's not at all going to be useful
for extracting any information from this routine.
The only thing it's useful for is actually,
accessing my data in a safe manner.
Then I am going to flush the size out of my cache.
It doesn't matter that I'm flushing it
or doing something else, all I really need to do
is make size very slow to compute.
Then I wait a while.
Turns out that this little stall here is important
or it doesn't tend to work.
And I compute this weird local index.
This local index is essentially,
the training and then the attack.
For the first nine runs, we just access
a perfectly safe index, but then on the tenth run,
we switch to the index the user passed in.
So, just nine good, a tenth one bad.
Then we do a bounds check.
I wanna be really clear,
we always do a bounds check
and this is a correct bounds check.
We make sure that the index is smaller than the size
and that means, we will never access the data
out of bounds here.
We hid it in a string view, a safe entity.
Herb has told us all about how safe string view is
but then when I come down here,
I'm going to access it using a local index
and the problem is that this access right here
using the index may happen speculatively and it may happen
before the bounds check finishes
and when the bounds check was going to fail.
So, it accesses an out-of-bounds piece of memory,
it uses that, scaled up, to access the timing array
then we read through that yet again
and all of a sudden, we have leaked information,
we've actually accessed our side channel.
The rest of this is the exact same code.
We go through and we measure all the times to see
like yes, did we in fact find one of these cache lines
being faster, and if so, we compute it,
there's nothing else different from this example
and the previous one.
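The shape of that pattern, nine safe accesses followed by one with the caller's index, all behind a correct bounds check, can be sketched as below. This is an illustration under assumptions rather than the demo code: `pick_index`, `guarded_probe`, `count_probes`, and the stride are hypothetical, and the cache flush, the stall, and the timing are omitted. The sketch only shows the architectural behavior, where the out-of-bounds index never produces a probe. On real hardware the concern is that the guarded load may run speculatively before the comparison resolves.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

constexpr std::size_t kStride = 4096;  // assumed spacing between probe slots

// Nine runs out of ten use a safe index (training the branch predictor);
// every tenth run substitutes the caller-supplied index (the attempt).
inline std::size_t pick_index(std::size_t run, std::size_t safe_index,
                              std::size_t target_index) {
  return (run % 10 == 9) ? target_index : safe_index;
}

// A correct bounds check guarding a data-dependent probe offset.
// Architecturally, an out-of-bounds index yields nothing.
inline std::optional<std::size_t> guarded_probe(std::string_view text,
                                                std::size_t index) {
  if (index < text.size()) {
    return static_cast<unsigned char>(text[index]) * kStride;
  }
  return std::nullopt;
}

// Runs the training/attempt schedule and counts how many runs probed.
// The assert documents that every architectural probe was in bounds.
inline std::size_t count_probes(std::string_view text,
                                std::size_t target_index, std::size_t runs) {
  assert(!text.empty());
  std::size_t probes = 0;
  for (std::size_t run = 0; run < runs; ++run) {
    const std::size_t safe_index = run % text.size();  // varies per run
    const std::size_t index = pick_index(run, safe_index, target_index);
    if (guarded_probe(text, index)) {
      assert(index < text.size());
      ++probes;
    }
  }
  return probes;
}
```

Because the safe index changes every run, only the tenth-run access lands on the same cache line repeatedly, which is what lets the timing loop pick it out.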
And when I run this,
it's actually going to print the string.
And we never accessed this memory.
If I made this example a lot more complicated
and move that memory into a separate page,
I could even protect the page so that any access would fault,
and the program would still run fine.
Because we never access the memory directly,
we leaked it through a side channel, so that is Spectre.
I know this is an uncooked thing,
I just ran it on an Intel laptop here.
If we make really good time,
I'm happy to try and actually show you this actually working
on a non-Intel machine as well.
I have it but unfortunately, we had some AV issues and so,
I'd have to sit here and type in passwords
like half a minute, it's not really fun.
Let's for now, kind of go back to presentation.
We've gone through and we've looked at all
this speculative execution, we've looked at Spectre
and misspeculative execution,
but if this were just one issue maybe,
it wouldn't be that bad.
It isn't just one issue.
This is an entirely new class of security vulnerabilities.
No one had really thought about what would happen
if you combine speculative execution and information leaks.
They had no idea that there was something interesting here
and as a consequence, we have had a tremendous
new set of security issues coming in
and I'm gonna try and give you a rough timeline
of all of this.
It started off last year in June,
when Project Zero at Google informed vendors of various CPUs
and other parties about the first two variants of Spectre
which are called bounds check bypass
and branch target injection or variants 1 and 2.
Then, a few weeks later, they found a third variant,
it's called variant 3 or rogue data cache load
or much more popularly, Meltdown.
And vendors were working furiously for the rest of the time
until January, when these were finally disclosed publicly
as variants 1 and 2 of Spectre
and variant 3 or Meltdown.
During this time period,
they were found by other researchers
who were looking in the same areas, kind of concurrently
and all of the researchers kind of held their findings
in order to have a coordinated disclosure here
because this was such a big and disruptive change
to how people thought about security.
Most of the companies working in this space
actually didn't have teams set up in the right place
or with the right expertise
to even address these security issues.
So, it was a very, very disruptive
and very challenging endeavor because it was the first time
and a totally new experience.
But we weren't done.
After this, we started to see more things.
The next one was in March, called BranchScope.
BranchScope wasn't a new form of attack, it was actually,
a new side channel. Instead of using cache timings,
it pointed out that you could use
the branch predictor itself to exfiltrate data
from inside a speculative execution to a normal execution,
just a different side channel.
We also started to see issues coming up
which had nothing to do with Spectre
but were unfortunately, often grouped with Spectre
because this stuff is complicated.
I don't know about you all,
but I think this stuff is complicated,
the press thinks this stuff is complicated
and they ended up merging things together, understandably.
And so, there were issues around
POP SS and MOV SS, which are weird Intel x86 instructions
that have a surprising semantic property that essentially,
every operating system vendor failed to notice
when reading the spec.
And unfortunately, those bugs persisted for a long time
but now that people were looking at CPUs
and CPU vulnerabilities,
they were able to uncover these and get them fixed.
They don't have anything to do with speculative execution.
There's also GLitch, which again,
doesn't have anything to do with speculative execution on CPUs.
But there was another interesting one in May
and this is two things, variant 3a,
was a very kind of, obscure variation on variant 3
and then variant 4.
Variant 4 was really interesting.
This one's called speculative store bypass.
This was also discovered by Project Zero
and by other researchers concurrently.
And this one made Spectre even worse than it already was.
So, this really kind of,
amplified everything we were dealing with.
And we still weren't done.
The next issues were Lazy FPU save and restore
which we saw in June.
This was super easy to fix, it's kind of a legacy thing
that hadn't been turned off everywhere it should have been
and it turns out there's a bug.
During speculative execution,
you may be able to access FPU state
that the operating system has kind of left there
from when the previous process was running,
with the idea being that
it's gonna trap if you actually access it,
and once it traps, it'll save it,
it'll restore your FPU state
and then let your execution proceed.
But the trap happens after speculative execution.
And so, you can speculate right past it,
access the FPU state and leak it.
This isn't arbitrary memory
but it ends up still being fairly scary because
the FPU state includes things
that are used by Intel's encryption instructions.
And so, you would actually put private key data
in the exact place that you leaked
which was really unfortunate.
Again, this was mostly a legacy thing,
very quickly and easily turned off.
Intel and other vendors have been providing
better mechanisms than this for a long time
but we hadn't turned it off everywhere that we needed.
We have another kind of mistaken entity in this,
we got a new side channel attack
that had nothing to do with speculative execution.
It's just a traditional side channel attack
on cryptographic libraries, called TLBleed,
it's a very interesting attack,
it's very interesting research
but it doesn't have a lot to do with Spectre.
And apparently, I have...
Then in July, we start to see even more interesting things,
in my opinion, even more interesting things coming up.
These ones are called variants 1.1, 1.2.0 and 1.2.1
or collectively, bounds check bypass store,
which is a, kind of a mouthful
but this was a big, big thing.
This essentially, extended variant 1
in really exciting ways that we're gonna look at.
Then later in July, we got still more good news.
We got to hear about SpectreRSB and ret2spec,
yet more variations on this.
And then in July, we got the worst news, for me at least,
which was NetSpectre.
NetSpectre was not a new vulnerability,
it was not a new variation on Spectre,
it was a really, exemplary demonstration
that all of the Spectre things we're looking at
can be leveraged remotely.
It does not require local access.
So, the NetSpectre paper actually used this remotely.
Oh sorry, and one more thing, L1 Terminal Fault.
This one was extremely scary but fortunately,
has relatively little impact
outside of operating system vendors
so, we're not gonna spend too much time on that one.
But there was yet another one that happened pretty recently.
I don't think that we're over.
This timeline is going to keep going as time passes.
We're going to keep seeing more things come up
as the researchers and the vendors
kind of explore this new space,
so, you should not expect this to stop.
That doesn't mean that the sky is falling,
it's just that we have to keep exploring this space
and understanding the security issues within it.
And this is gonna keep going for some time.
But for now, let's try and dig into these things
and understand how they work in a little bit more detail,
especially outside of the one example
that I've kind of shown you already.
Let's look at the broader scope of variant 1,
because variant 1, I've shown you just
bypassing a bounds check, but variant 1 is actually,
a much more general problem.
Any predicate that the processor can predict can be bypassed
and if that predicate guards unexpected behavior
by setting up some invariants or assumptions,
which most predicates do,
you may have very surprising consequences.
As an example, we might have,
a small string optimized representation here,
where we have a different representation for a long string
and a short string.
Up here, we have a predicate, is this long,
is this in the long representation?
And you might actually train
and the branch predictor might think, this is probably long
or it might think, this is probably short.
Turns out, short strings are the most common cases,
the branch predictor will predict that this is probably
going to be short.
Unfortunately, with a lot of short-string-optimized strings,
the pointer to the short string is inside the object itself,
often on the stack, where there are other things
that are really, really interesting to look at
adjacent to the string object.
And so, if we predict that this is short,
we're going to get the short pointer
'cause it's actually just a pointer to the stack
and we're going to start speculating on it
and if we speculate far enough to find
some information leak, this can be exploited.
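The slide's string type is not in the transcript. A minimal sketch of a small-string-optimized layout, with a hypothetical `SmallString` class and an assumed 16-byte inline buffer, might look like this. The point to notice is that `data()` branches on `is_long()`, and the short branch returns an address inside the object itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical small-string-optimized type: short text lives inside the
// object, long text lives on the heap.
class SmallString {
 public:
  explicit SmallString(const char* s) : size_(std::strlen(s)) {
    if (is_long()) {
      long_ptr_ = new char[size_ + 1];
      std::memcpy(long_ptr_, s, size_ + 1);
    } else {
      std::memcpy(short_buf_, s, size_ + 1);
    }
  }
  ~SmallString() {
    if (is_long()) delete[] long_ptr_;
  }
  SmallString(const SmallString&) = delete;
  SmallString& operator=(const SmallString&) = delete;

  // The predicate the branch predictor learns.
  bool is_long() const { return size_ >= sizeof(short_buf_); }

  // If the short branch is taken speculatively for a long string, the
  // returned pointer refers to the object's own storage (often the stack)
  // while size_ still holds the long length.
  const char* data() const { return is_long() ? long_ptr_ : short_buf_; }
  std::size_t size() const { return size_; }

  char at(std::size_t i) const {
    assert(i < size_);
    return data()[i];
  }

 private:
  std::size_t size_;
  union {
    char short_buf_[16] = {};  // assumed inline capacity: 15 chars plus NUL
    char* long_ptr_;
  };
};
```

Architecturally this class is correct. The hazard described in the talk only arises when code that walks `size()` bytes from `data()` runs speculatively with the wrong representation predicted.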
Then you have another interesting case.
What about virtual functions, what about type hierarchies?
Here, we have a type hierarchy, we have some base class
for implementing key data and hashing of the key data
and then we have public keys
where we don't have to worry about leaking the public key,
and we have a private key where we have to worry
about leaking the key data.
We have this virtual dispatch here and what happens,
if we've been hashing public keys
over and over and over again, and then we predict
that in fact, we think we have another public key
when we don't.
We may dispatch it to the wrong routine,
to the non-constant time one, speculate it
and run right across the cryptography bug
that this whole thing was designed to prevent.
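As a hedged sketch of that hierarchy: the class names, the toy hash functions, and `hash_key` below are assumptions for illustration, not the slide's code. `PublicKey::hash` contains a data-dependent branch, which is fine for public data. `PrivateKey::hash` performs the same operations for every byte (FNV-1a here, used only as a stand-in; no claim is made that it is constant time on real hardware).

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

struct Key {
  explicit Key(std::vector<std::uint8_t> b) : bytes(std::move(b)) {}
  virtual ~Key() = default;
  virtual std::uint32_t hash() const = 0;
  std::vector<std::uint8_t> bytes;
};

// Public key data is not secret, so branching on its bytes is acceptable.
struct PublicKey : Key {
  using Key::Key;
  std::uint32_t hash() const override {
    std::uint32_t h = 0;
    for (std::uint8_t b : bytes) {
      if (b & 1) {
        h = h * 31 + b;  // path taken for odd bytes
      } else {
        h ^= b;          // path taken for even bytes
      }
    }
    return h;
  }
};

// Private key data must not influence control flow: every byte goes
// through the same xor and multiply.
struct PrivateKey : Key {
  using Key::Key;
  std::uint32_t hash() const override {
    std::uint32_t h = 2166136261u;
    for (std::uint8_t b : bytes) {
      h ^= b;
      h *= 16777619u;
    }
    return h;
  }
};

// The virtual dispatch site. After many PublicKey calls, the predictor may
// speculatively run PublicKey::hash on a PrivateKey object.
inline std::uint32_t hash_key(const Key* key) {
  assert(key != nullptr);
  return key->hash();
}
```

Architecturally, `hash_key` always reaches the right override. The talk's concern is only about which override runs speculatively before the dispatch resolves.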
Again, the invariants you expect in your software
don't hold once speculative execution starts,
that's what makes it so hard to reason about.
There are also other variant 1 derivatives.
So far, we've looked at cases where you
speculate past some predicate
and you immediately find an information leak.
But, there aren't that many information leak code patterns
in your software maybe, so, that might be relatively rare.
But that's where the variants 1.1, 1.2
or the bounds check bypass variants came into the picture.
Here, we have some delightful code
which has some untrusted size.
We're gonna come in and we're gonna have
an out-of-bounds access here,
and once we have this out-of-bounds access,
we're actually going to copy into a local buffer
on our stack, data that has been given to us by the attacker
because we've got an out-of-bounds store
that we can also speculatively execute.
This speculatively stores attacker data over the stack.
And if this happens, then later on,
we're going to potentially, return from this function
and when we return from this function,
the return address is stored on the stack
but we've speculatively written over it,
this is a classic stack smashing bug
now come back to haunt us in the speculative domain.
Even though the bounds check is correct, it didn't help,
we were still able to conduct a speculative stack smash.
And this sends speculative execution to an arbitrary address
controlled by the attacker.
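A minimal sketch of the kind of code being described, assuming a hypothetical `checksum_packet` function with a 64-byte local buffer (neither is from the slide): the size check is correct, and architecturally the copy never exceeds the buffer. The variant 1.1 concern is that the copy may begin speculatively while the check is still unresolved, with the stores landing past `local` on the stack.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copies caller-supplied bytes into a stack buffer and sums them.
// Returns false, leaving `checksum` untouched, if the size is too large.
inline bool checksum_packet(const std::uint8_t* src,
                            std::size_t untrusted_size,
                            std::uint32_t& checksum) {
  assert(src != nullptr);
  std::array<std::uint8_t, 64> local{};  // lives on the stack, near the return address

  // Correct bounds check. A mispredicted branch here is what would let the
  // stores below run speculatively with an oversized length.
  if (untrusted_size > local.size()) return false;

  std::memcpy(local.data(), src, untrusted_size);  // the guarded stores

  checksum = 0;
  for (std::size_t i = 0; i < untrusted_size; ++i) checksum += local[i];
  return true;  // the return is where a speculatively overwritten address would be used
}
```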
Before I go on, it's important to really think about
why, sending control to an arbitrary address is so scary.
We've had bugs involving stack smashing forever,
it's one of the most common security vulnerabilities
but once you do that, you tend to want to
build some kind of, remote code execution,
you wanna build logic and trigger logic out of that.
The best way to do this is to find the logic you want
inside the existing executable
and just send the return to that location.
It's called return-oriented programming.
You take the binary
and you analyze all of the code patterns in the binary
to find little pieces of code that implement
the functionality you want.
And then, you string them together with returns
by smashing the stack and going to the first one
which does something and then goes to the second one
and so on and so on.
The most amazing thing to me, again,
I'm not a security researcher so when I heard about this,
it just like, blew my mind.
The most amazing thing is that,
some very, very delightful individuals have built a compiler
that analyzes an arbitrary binary to build
a Turing complete set of these gadgets
and then, emit a particular set of data values
and a start point which can implement any program,
which is a little bit frustrating.
And then you realize,
that it's actually easier in the speculative domain.
It doesn't matter if it crashes
after I do my information leak.
For a real code execution, I don't just have to
execute the code I want, I also probably,
wanna keep the service running for a while,
like I wanna, set it aside and not disturb it too much.
Don't need to do that,
I just need to hit my information leak,
it can do whatever it wants, it can crash,
it can do anything.
And this means, if the attacker can get to this return,
they have so much power,
because we have this long history of work
figuring out how to use this return
to do really, really bad stuff to the program.
But there are more ways you can do this.
You can imagine, you have again,
some type with some virtual interface.
And you have this virtual function you created on your stack
but then you process some code, also on the stack
but with an attacker-controlled offset
that may be mispredicted.
And then, you're going to use that offset to index
and this can index from one object on the stack to another
because it can go out of bounds,
'cause we're in speculative execution.
And then, we can potentially write attacker data
over the stack, and this might write over
the actual V pointer,
that points to the vtable for this object.
It's all gonna get rolled back eventually
but if we then hand control, off to some other function
and this other function doesn't use the derived type,
it uses the base class to access it,
it's going to use that V pointer to load the virtual table
to load a function out of it and call that.
But you just got to point it at any memory you want
which means you get to send this virtual function call
anywhere you want in the speculative domain.
It's just like the return, except this time,
with the virtual function call.
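One way to picture that arrangement, as a sketch under assumptions (the `Shape` and `Square` types, `process`, and the 32-byte buffer are all hypothetical): an object with a vtable pointer and a scratch buffer share a stack frame, a bounds-checked store uses an untrusted offset, and a later call goes through the base class. Architecturally the store stays inside `scratch`. Whether the two objects are actually adjacent depends on the compiler, and a compiler may also devirtualize this particular call.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Shape {
  virtual ~Shape() = default;
  virtual int area() const = 0;
};

struct Square : Shape {
  explicit Square(int side) : side_(side) { assert(side >= 0); }
  int area() const override { return side_ * side_; }
  int side_;
};

// Sees only the base class: loads the vtable pointer from the object,
// loads the function address from the vtable, then calls indirectly.
inline int area_via_base(const Shape& shape) { return shape.area(); }

inline int process(std::size_t untrusted_offset, std::uint8_t value) {
  Square square(3);                         // holds a vtable pointer, on the stack
  std::array<std::uint8_t, 32> scratch{};   // neighboring stack buffer

  // Correct check. If mispredicted, the store could speculatively land
  // outside scratch, for example on square's vtable pointer.
  if (untrusted_offset < scratch.size()) {
    scratch[untrusted_offset] = value;
  }
  return area_via_base(square) + scratch[0];
}
```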
And I can keep going, there are a bunch of different
permutations of how you can hijack control flow here.
But the easiest way to hijack control flow
and send it to your information leak gadget
was in variant 2.
And this is why variant 2 was extra scary
until it got mitigated.
Variant 2 works something like this.
Again, we have our class hierarchy, we have some,
sorry, not class hierarchy, we have a function pointer here,
just any indirect function call,
doesn't matter how you get there.
We're gonna call through it.
Well, how does this actually get implemented
in the hardware?
To really understand variant 2,
we've gotta start dropping down a few layers into hardware.
We're gonna drop into x86 assembly at this point.
This is actually the x86 assembly produced by Clang
a little while ago for that C++ code.
Right here we have this call with the weird syntax,
we're actually calling, like through memory.
And what this is doing, it's actually loading an address
out of the virtual, sorry, out of the state function
and then calling through it.
This is an indirect call.
This is really hard on the processor because it doesn't know
where this call is going to go and it wants to predict it,
that's how we got into speculative execution.
But the implementation of this predictor
has a special problem.
This is my world's worst diagram for it
but it gets the point across.
The implementation of this predictor
is essentially, a hash table.
It's a hash table that maps from the program counter
or the instruction pointer of the indirect call
to a particular target that we want to predict.
But it doesn't map it to the actual target address, oh no,
it maps it to a relative displacement
from the current location because that's smaller,
we can encode that in a lot fewer bits.
And then you realize something else.
This is a really size constrained thing,
this is literally, a hash table implemented in silicon.
And so, in order to implement this,
the hash function actually has to reduce this key by a lot,
it doesn't use most of the bits
and the hash function is really straightforward
in a lot of cases.
And so, there are collisions in these hash tables
all the time.
They're tiny, you would expect collisions and that's okay.
So long as the collisions are infrequent enough,
the performance is still good.
But if you can kind of try out the collisions long enough,
you can figure out how to cause a collision reliably
in this hash table.
If you can cause a collision reliably,
you can train this predictor to go to your displacement.
And then, when we do this call, we look up in the hash table
we hit a collision, we get the wrong displacement
and we go to the wrong location.
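The collision idea can be sketched as a toy model.
This is not how any real predictor is implemented:
the table size, the hash (just the low bits of the address)
and the class name are all assumptions chosen to keep the example small.
It only shows why a table keyed on a few bits of the branch address
lets one branch change the prediction for another.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy model only; real predictors are undocumented and far more complex.
class ToyBranchTargetBuffer {
 public:
  static constexpr std::size_t kEntries = 16;  // assumed, deliberately tiny

  // "Training": remember the displacement last taken from this branch.
  void train(std::uint64_t pc, std::int32_t displacement) {
    table_[index(pc)] = displacement;
  }

  // Prediction: branch address plus whatever displacement is in the slot,
  // no matter which branch put it there.
  std::uint64_t predict(std::uint64_t pc) const {
    return pc + static_cast<std::int64_t>(table_[index(pc)]);
  }

 private:
  // The hash throws away most of the key, so distinct branches collide.
  static std::size_t index(std::uint64_t pc) { return pc % kEntries; }

  std::array<std::int32_t, kEntries> table_{};
};
```

In this model a branch at 0x5008 lands in the same slot as one at 0x1008,
so training the first one changes where the second one is predicted to go.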
And it turns out, this is really easy.
The only thing you have to have in the victim code here
is an indirect call and that's everywhere.
Or even just a jump table to implement a switch,
is enough to trigger the same behavior.
That makes this really, really easy to exploit and actually,
take and send control flow to wherever you want.
But it's worse than that.
There's another kind of indirect branch in x86 code,
if you have a return.
Returns on x86 get implemented
with some instruction sequences that look like this.
And again, we don't have a specific destination here,
the destination's in memory, it's on the stack.
And so, when you go to return,
the processor has to predict it somehow.
For calls and returns, the processors
all have very specialized predictors
that are super, super accurate, typically called,
the return stack buffer.
Unfortunately, sometimes, these predictors run out.
They may not have enough information to predict it
and on some processors, when that happens they fall back to
the exact same hash table solution
as we saw for virtual calls and for jump tables.
And so, even a return can, in some cases,
trigger this behavior.
That means, it's actually pretty easy to find these in code.
That's variant 2.
I'm gonna keep going.
I'm skipping over variant 3 because variant 3
was completely addressed by the operating system,
user code does not need to worry about variant 3.
So, let's look next at variant 4.
Variant 4 is called speculative store bypass.
This is actually pretty easy to understand what it does.
It's exactly what it says in the name.
Sometimes, when you read from memory,
instead of reading memory that was just stored
at that location, you will read speculatively, an old value.
That's really it.
The problem here, is that the processor may not know
whether the addresses of these loads and stores match.
And so, instead of waiting to see if they match,
they'll guess, they'll predict.
If they mispredict, they may predict that the store
and the load don't have the same address.
And if it predicts they don't have the same address,
it may speculatively execute the load
with whatever was there before, that store.
That's pretty simple and you can imagine how this works.
Imagine you have an application
which runs some sandbox code in the callback here
and hands that sandbox code, a specific private key.
We don't ever want to hand a private key
to the wrong callback here.
One of these callbacks owns one of the keys,
another callback owns a different key.
But when we're going through this loop,
the key gets passed by value and that means,
this is a bit big to fit into registers,
we're going to store a copy of this key onto the stack,
then we're gonna call the function
with the pointer to that entry on the stack.
It's gonna finish, come back, we go to the next one,
we store the next key onto the stack
and call the next function.
But if that function happens to speculatively execute
in the right way, its loads may not observe that stored key,
it may observe the previous function's stored key.
And then be able to leak it
and we have another information leak.
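A rough sketch of that loop, under some assumptions:
the key size, the Tenant struct and the callback signature
are invented for illustration, and whether each copy reuses
the same stack slot depends on the compiler and the ABI.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types; sizes and names are assumptions.
struct PrivateKey {
  std::array<std::uint8_t, 32> bytes;  // too big to pass in registers
};

// By value: the caller stores a copy on the stack and passes its address.
using Callback = std::uint8_t (*)(PrivateKey key);

struct Tenant {
  PrivateKey key;
  Callback callback;
};

// Each iteration stores a copy of one key, then runs sandboxed code that
// loads from that copy. Under speculative store bypass such a load may
// transiently see the bytes left by the previous iteration instead.
std::vector<std::uint8_t> run_all(const std::vector<Tenant>& tenants) {
  std::vector<std::uint8_t> results;
  for (const Tenant& t : tenants) {
    assert(t.callback != nullptr);
    results.push_back(t.callback(t.key));
  }
  return results;
}

// Stand-in for sandboxed code: architecturally it only sees its own key.
std::uint8_t first_byte(PrivateKey key) { return key.bytes[0]; }
```

Architecturally every callback only ever observes its own key;
the leak described above exists only in the speculative domain.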
It turns out, that this is the fastest
of the information leaks that we have found.
If you can hit this reliably,
you can extract data at an unbelievable rate
with this particular technique.
This technique caused tremendous problems for
browsers and other people doing sandboxing as a consequence.
But there are also other implications.
You can imagine a variant 1-style information leak
that's actually powered by variant 4.
So here, we have a vector that we're returning
from some function, which means we're gonna store a pointer
like some pointers but also a size into memory here.
Then, when we come down to our bounds check,
we may be reading size out of memory
and if we're reading size out of memory
and it happens to be slow, it may not see the store
just before this in size.
And so, it may speculate instead,
reading whatever was on the stack before the store,
which might just be a random collection of bytes
probably a very large number,
which means this bounds check will pass,
but it's using the wrong bound.
It's not that we've bypassed the bounds check,
the bounds check occurred, it just used the wrong bound.
And again, we get into
the classic information leak as a consequence.
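Roughly, the pattern looks like the sketch below.
The function names are hypothetical, and std::vector implementations
commonly store pointers rather than a literal size field,
so treat "the size in memory" as shorthand for
whatever the bound is computed from.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returning the vector writes its bookkeeping
// (pointers, and so its size) to memory just before the caller uses it.
std::vector<std::uint8_t> make_table() { return {10, 20, 30, 40}; }

std::uint8_t lookup(std::size_t untrusted_index) {
  std::vector<std::uint8_t> table = make_table();

  // Architecturally the bound read back is always the one just stored.
  assert(table.size() == 4);

  // Speculatively, the load of the bound may bypass that store and see
  // stale stack bytes, so the check runs but with the wrong bound.
  if (untrusted_index < table.size()) {
    return table[untrusted_index];
  }
  return 0;
}
```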
Variant 3, like I said,
this is mostly about operating systems.
I can explain if you folks want,
but I'm just gonna keep moving for the sake of time.
We also have Lazy FPU save and restore,
I mentioned kind of how this worked.
But again, this was largely fixed by operating systems
since the operating system is the one switching context,
it can change its behavior and prevent application code
from having to worry about this.
And L1 Terminal Fault.
The way L1 Terminal Fault works is amazing.
There are certain kinds of faults that, when they happen
speculative execution can again, occur.
And if you arrange everything just right,
especially with page tables and other aspects of your system
you can essentially read arbitrary data out of the L1 cache
while this terminal fault is being handled
and leak it with speculative execution.
And there are a bunch of different ways to observe this,
there is a great paper that introduced this
called Foreshadow and showed, that this actually works
inside of Intel's secure Enclave SGX.
And yes, it just allows you to read the entirety of L1.
If you haven't seen it yet, go and look for the video
online about this.
You can actually find one of the researchers
who has a window at the bottom of a Windows machine
and as they type in the administrator password,
the window shows the administrator password in real time.
It's really, really effective.
But again, this is mostly an operating system concern
and so, operating system changes and hardware changes
are being used to address this.
Application code doesn't have to deal with this directly.
I don't know about all of you,
but I think that was too much information.
So, I'm gonna try and summarize
in a way that you can kind of wrap your head around.
This is gonna be the most busy slide I have.
This is the summary slide, of essentially,
all of this background information.
We have four variations on Spectre v1.
There's v1, 1.1, 1.2, ret2spec,
which I just didn't have time to show you all.
These are all taking advantage of the same basic mechanism
and they have very similar properties.
They can impact application code,
they can impact operating system code.
They don't require you to be using hyper threading
or simultaneous multi-threading in your CPU.
We have really slow software fixes that none of us like
and we don't have any realistic hardware fix on the horizon.
These are actually the thing I'm gonna talk about most,
because these are for me, the most scary.
Note that red column on the right.
We also have variant 2 which actually,
is the primary variant 2 but also, SpectreRSB
which helps show how you can actually
get variant 2 to work on returns.
These are a bit different.
While they impact both the application
and the operating system code,
they do require some things to be true.
For you to attack from one application to another,
you really have to be using hyper threads or SMT.
The other nice thing is that, we have some
much better hope of fixing these.
We have a very good software fix for variant 2,
we don't have a great software fix for SpectreRSB
or variant 2 when it's hit with the return instruction
but there's some stuff you can do,
but it's not as satisfying.
But we do have good hardware fixes on the horizon,
future Intel hardware, future other,
future hardware from other vendors
is going to do a very good job of defending against this.
Then, we have variant 4.
Variant 4 looks, in terms of the risk, more like Spectre v1
but with less hope of mitigating it.
It impacts applications, it impacts operating systems,
it does not require hyper threading
for one application to attack another.
We have absolutely no hope of fixing this in software
and so far, the hardware fixes are proving problematic.
There is one that's slow
and the browser vendors aren't using it
and have some concerns about it,
and so, this one's still pretty fuzzy.
And then we have a bunch of things at the bottom
that I really view very differently from the rest
because these are fundamentally, CPU bugs
that just interacted very poorly with speculative execution
and the Spectre techniques.
And these, I think are going to very consistently,
get fixed rapidly.
I think these are in some ways, the least scary
for application developers.
Most of them don't impact applications at all,
you don't have to change your code at all.
They're only in the OS.
We have a great software fix for Lazy FPU,
so good that no one is going to try and fix the hardware
and we have great hardware fixes for the other ones.
And so, I think these are generally speaking,
going very well.
I'm gonna really focus on Spectre variant 1, variant 2
and variant 4 because those are the things that are really
continuing to impact software today.
To really talk about what you need to know in this space,
we need to have a threat model.
If you went to one of the earlier talks
at the conference about security,
there was a great discussion
around how you do threat modeling.
That person is actually a security researcher and I'm not.
And I'm certainly not your security researcher
and so, I can't help you build a threat model
and that's not what I'm gonna do up here.
But I can give you some questions you can use
when building your own threat model
to really understand the implications of Spectre
and speculative execution attacks
on your particular software system.
First off, does your service have any data
that is confidential?
Because if not, it doesn't matter
if you have an information leak vulnerability,
it's a very simple, simple answer.
I love this threat model.
Next, does your service interact
with any untrusted services or inputs?
Is there any input you don't fully trust?
Is there any entity that talks to you in some way
that you would not want to share
all of the information you have with?
If the answer's again, no, then, you're fine.
This gives you a nice simple rule
that fortunately excludes, I think,
the majority of software we have out there.
If you have nothing to steal or no one to steal it,
you have nothing to secure,
from information leaks.
This is a pretty solid, mental model to use
when coming up with your threat model.
Unfortunately, we do still have a lot of software
that doesn't fit this model.
So, let's talk about how we can dig through
those pieces of software.
Do you run untrusted code in the same address space
as you have confidential information stored?
Do you have some information there
and you're gonna run untrusted code right next to it?
If this is the case, you have a hard problem.
We do not know how to solve Spectre effectively
for this case,
outside of isolating your untrusted code
from your confidential information.
This is the case that browsers are in.
You're going to see browsers increasingly dealing
with this particular case.
If you hit this, almost nothing else
about the questions here matters,
you're going to have the highest risk from Spectre.
But maybe you don't have untrusted code
running in the same address space,
there's a lot of software that doesn't run untrusted code,
which is good.
Now you need to ask yourself, does an attacker have
access to your executable?
Can they actually look at your binary and reason about it
in some way?
Can they steal a copy of it easily?
Is it distributed in some way that they would have access?
That's gonna really change the threat model.
If no one has access to your executable,
they're going to have an extremely hard time
using these techniques.
It's not impossible, but it becomes incredibly difficult.
However, you wanna be a little bit careful here
because they don't need access to the entire executable.
If you use common open source libraries,
and if you link them in and if you build them
with common flags, then, they have access
to part of your executable.
If you run on a distribution and you dynamically link
the common distribution shared objects,
they may have the exact same distribution
and they'll have access to some of the executable
and they don't need access to all of it
to mount a successful attack.
So, you wanna be a little bit careful
how you think about this but it does
really dramatically influence how open you are
to these kinds of risks.
The next question is,
does any untrusted code run on the same physical machine?
Because if the answer here is, no,
you're really looking at a single mechanism for attack
and that's the ones presented in NetSpectre.
That's the way you're going to be seeing this.
NetSpectre gives us pretty clear bandwidth rules
and it turns out, the bandwidth is low and so,
if you don't have untrusted code running on the same machine
there's some very specific questions you wanna ask.
How many bits need to be leaked for this information leak
to actually be valuable to someone else?
How many bits are at risk?
If you have a bunch of data,
if you have the next manuscript for,
I guess Harry Potter is over,
but whatever the next fancy book is,
leaking that manuscript's going to be really hard, it's big.
You don't need to worry about someone leaking
the next video game
that you've got a copy of on your machine,
that's gonna be really slow.
But if you have a cryptographic key,
that may only be a few thousand bits.
If you have an elliptic curve cryptography key,
that may only be 100 or 200 bits
before it's compromised.
And worse with cryptographic issues,
you may not need all the bits for it to be valuable.
So, you really wanna think about this.
Another thing to think about is,
how long is this data accessible?
If it's in the same place for one request in your service
and then you throw it away
and then it shows up somewhere else, then,
you may not have big problems here because,
it may be very hard to conduct all of the things necessary
while the data is in the same place.
You also wanna look at what kind of timings
that someone can get in the NetSpectre style of attack.
You wanna look at, what is the latency of your system?
How low is that latency, how low can they get it?
And you also want to look at,
just how many different systems, have the same information?
So, if you have, for example, a cryptographic key
that is super important and you have distributed it
across thousands and thousands of machines
and all of those machines can all be attacked simultaneously
you have a much bigger bandwidth problem
than if it only exists on one machine, because then,
the bandwidth is much narrower.
These are key things to think about around bandwidth.
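The bandwidth question is simple arithmetic, sketched below.
The function name and every input are placeholders:
the leak rate in particular has to come from your own measurements
or from published attack results, not from this example.

```cpp
#include <cassert>

// Seconds needed to leak `secret_bits` when each machine leaks
// `bits_per_hour` and `machines` copies of the same secret can be
// attacked in parallel. All three inputs are placeholders.
double seconds_to_leak(double secret_bits, double bits_per_hour,
                       int machines) {
  assert(bits_per_hour > 0 && machines > 0);
  return secret_bits / (bits_per_hour * machines) * 3600.0;
}
```

With a made-up rate of 64 bits per hour, a 128-bit secret on one machine
takes 7200 seconds, and spreading that same secret across 4 machines
cuts it to 1800 seconds, which is the point about replicated keys.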
And really, NetSpectre is all about this.
You're essentially, always going to be making this bandwidth
risk, value and complexity trade-off because,
it's going to be very hard to mitigate this otherwise,
so, you want to think very carefully about this.
But what if you do run untrusted code on the same machine?
There are a lot of shared machines that actually have
shared users here, and I don't mean in the cloud, since,
if you have separate VMs, that's enough.
Like you can think of those as separate machines,
but what if you're actually,
really running on the same machine?
Then you have to ask more questions.
Do you run untrusted code on the same physical core?
And this may not always be obvious.
If you don't have hyper threading
or simultaneous multi-threading, then, you clearly don't run
untrusted code on the same physical core simultaneously.
But there are other ways you may get here,
you may partition your workload across different cores.
There are a lot of ways that may influence this
and all of the variant 2-style attacks
from application to application,
rely on running on the same physical core and so,
in a lot of ways, if you can exclude this
you get to take out an entire variant from your threat model
and that's really, really useful.
With that, we've kind of talked about
all of the different things you wanna think about
from threat modeling.
I do wanna re-emphasize, this is about applications.
Operating systems and hypervisors
have totally different challenges here,
I'm not covering them.
They're there, they're very real risks
but I'm not covering them.
If you wanna know all about operating systems
and hypervisors, you can come and ask all about them
at the panel but, I'm actually not the expert there
and it's a very different thing
and it seemed like a different crowd
that might be more interested in that.
I'm focusing on application issues here.
With that, let's move over to talking about mitigations.
How do we actually cope with this?
First things first,
you have to mitigate your operating system otherwise,
none of this matters.
If you do not deploy the operating system mitigations
that your operating system vendor is providing,
you cannot do anything useful here.
These are essential.
So, please, especially now, it's increasingly important
that you have a way to update your operating system
and that your operating system vendor is
actively providing you updates.
If they aren't, you should probably
look for a different operating system vendor.
This stuff is important.
Let's assume you've gotten
all of your operating system mitigations
and all of your operating system updates and so you're good.
And let's talk about how you can mitigate
your application code.
First off, there are some x86 kind of operating system
and hardware-based mitigations for application code.
These come in three flavors.
They have again, weird acronyms.
IBRS which is, indirect branch restricted speculation.
IBPB, which I missay every time I try, which is,
indirect branch prediction barrier.
And STIBP, which is the,
single thread indirect branch predictors feature.
Your operating system and your hardware can turn these on.
When they do, they can provide certain levels of protection
from some of these variants.
But an important thing to realize, for an application,
these do not help with variants 1 or 4.
They're exclusively helping with variant 2.
They also, may be very slow in some cases.
These are especially slow on current and older CPUs.
We're expecting newer CPUs to increasingly,
make these things fast and for them to be essentially,
unobservable in terms of performance.
But if you have the older CPUs,
even turning these on with your operating system
may be a very significant performance hit
and there are some alternatives.
But the alternatives are software-based,
and so, we need to talk about how we can use software
to go after mitigation.
The first one is called Retpolines.
This was developed at Google, by a colleague of mine.
The idea is, well, we can recompile the source code
of our application, so we wanted to see,
is there something we could change in the source code
that could be effective at mitigating at least,
some of the most risky variations on this.
Notably, variant 2 which is far and away
the easiest to attack in a working system.
It seemed like something we really wanted to mitigate
in software, given the performance impact we were seeing
from the OS hardware-based mitigations.
It does require recompiling your source,
which can be painful, but if you can,
this mitigates Spectre variant 2
and SpectreRSB in restricted cases
but there're a bunch of asterisks and hedges there.
And it's usually going to be faster than STIBP
on current CPUs and older CPUs
for mitigating your current application.
Not always, but there's a decent chance
you probably want to look at it.
Going forward, in the future, we do expect this
to become less and less relevant
because the hardware is really catching up.
We're expecting in the future,
this is just going to work on hardware
and you're not going to need to worry about this.
But for now, you might want to worry about this
if you have a service that is at risk here.
How does this work?
We have some indirect call, just like the previous one
but when you compile your code with Retpolines,
we don't emit these instructions,
we emit a different set of instructions.
Here, we've taken this address that you wanted to call
and we've put it into a register r11.
And then we've transformed the call
into a call to this helper routine, __llvm_retpoline_r11.
If we look at this routine, this is a very,
very strange function.
The first thing it does is a call
but it doesn't call a function, it calls a label,
a basic block inside of itself.
And once it does that,
it then takes the address you wanted to call
and smashes the stack with it, this is a stack smash,
this clobbers the return address
with this address you wanted to call
and then it uses a return
to actually branch to that location.
So, that's a pretty weird thing to do.
The key idea here, is that by doing a call
followed by a return, we put a particular address,
an unambiguous address into the call and return predictor,
the return stack buffer.
And this predictor is really fast and really good
so, the processor prefers it anytime it can use it.
And in the vast majority of cases,
it's going to be able to use it here, and when it does,
if it speculates this return, it actually ends up here,
because the speculative return can't see
that stack smash operation.
So, when it speculates return, it goes here
which then goes to this weird pause instruction.
How many folks here have used the x86 pause instruction?
I don't know what kinda code you people are writing
except for this one over here.
I know what you're doing too.
The pause instruction's super weird,
I never even knew what this was,
I thought this was like something from old, old, old
x86 days but no, it actually has lots of uses,
and in this case it is the cheapest possible way
to abort speculative execution.
And we want to abort it because speculative execution
consumes resources, like power.
And so, we do want to abort it,
and so, we cut it off here.
Unfortunately, pause doesn't do that on AMD processors,
it only does it on Intel processors.
After we pause, we then do an LFENCE and this,
will actually work on AMD processors
once you install your operating system updates.
Finally, just in case all of this magic fails,
we make this into an infinite loop.
You're not getting out of here, this is keeping
the speculative execution in a safe, predictable place.
This essentially, turns off speculative execution
and branch prediction for indirect calls
and indirect branches, but that protects us from variant 2.
The overhead of doing this is remarkably small.
This is about your worst case scenario,
we built very large C++ servers with this enabled
and the overhead was under 3%, reliably under 3%,
but it does require that you use some pretty advanced
compiler techniques.
You need to be using profile guided optimizations,
you need to be using ThinLTO or some other form of LTO.
I can't emphasize that enough, but when you use them,
you can keep the overhead here very, very low.
And if you're working in something very specialized
like some really specialized code or a kernel,
you can usually avoid the indirect branches
and indirect calls, manually, with essentially,
no measurable performance overhead by introducing kind of,
good guesses for what the direct call should be
and a test to make sure that that's correct,
rather than relying on indirect calls and indirect branches.
We've been able to use this to make
our operating system mitigations
incredibly inexpensive, as a consequence.
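The guess-and-test idea can be sketched like this.
The function names are hypothetical and the "likely" target
is simply assumed; in practice you would pick it from profiling
or from knowledge of the code.

```cpp
#include <cassert>

using Op = int (*)(int);

int add_one(int x) { return x + 1; }
int negate(int x) { return -x; }

// Guess the likely target and test for it. The common case becomes a
// direct call behind an ordinary conditional branch; only the uncommon
// case falls back to an indirect call.
int call_op(Op op, int x) {
  assert(op != nullptr);
  if (op == &add_one) {
    return add_one(x);  // direct call, and the compiler may inline it
  }
  return op(x);  // rare path: still indirect, still needs mitigating
}
```

The fallback is still an indirect call, so it would still go through
a retpoline; the gain is that the hot path no longer does.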
But this is only for variant 2 and maybe variant 2
is gonna be fixed in future hardware
and maybe, you're not even subject to it.
So, what about the other variants?
That's where things start to get bad.
You can manually harden your branches for variant 1,
which is nice.
But it can be a bit painful.
Intel and AMD are suggesting
that you use the LFENCE instruction right after a branch.
And actually, while we're here, I think we have enough time.
Everybody likes live demos,
let's see if we can actually just do this.
I come down here.
And after my branch,
I do an LFENCE.
We would expect this to
mitigate things, hopefully it does.
This is gonna run really slow but it's also,
not gonna produce my string.
Nothing's happening here and that's a good thing,
I can even build the debug version if you all are worried
that I'm being sneaky here.
I have a debug version that actually prints out stuff
while it's going.
We're trying to leak it, it's a secret
and you're seeing what it's finding here
and it's not finding any character data from the secret.
And just so that we're all clear,
I don't have anything up my sleeve.
Comment this out.
No, have to rebuild.
Goes right back to working.
LFENCE works, that's nice.
We like mitigations that work.
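This is not the demo code, just a minimal sketch of
what "an LFENCE after the branch" can look like in C++ source.
It assumes an x86-64 compiler that provides the _mm_lfence intrinsic;
the SPECULATION_BARRIER macro and the function name are made up,
and on other targets the macro here deliberately does nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__x86_64__) || defined(_M_X64)
#include <emmintrin.h>  // _mm_lfence
#define SPECULATION_BARRIER() _mm_lfence()
#else
// Assumption: other architectures need their own barrier instruction.
#define SPECULATION_BARRIER() ((void)0)
#endif

std::uint8_t read_checked(const std::uint8_t* data, std::size_t size,
                          std::size_t untrusted_index) {
  assert(data != nullptr);
  if (untrusted_index < size) {
    // Later loads wait here until the bounds check has actually resolved.
    SPECULATION_BARRIER();
    return data[untrusted_index];
  }
  return 0;
}
```

The barrier has to sit on the path after the branch,
before the first load that depends on untrusted input.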
But it is a bit slow and it can be really expensive
and there're cheaper ways to do the same thing
if you can go through and mitigate
each and every one of your branches.
Google and ARM have been looking at
building APIs to do this in a more efficient way
and in a little bit more clear way in the source code
because an LFENCE feels pretty magical to just like,
oh no, no, I just put an LFENCE here, I'm good.
We can do something a little bit better with an API.
There's a lot of work to do that though,
I've got links up on the slides if you wanna go to them.
This is gonna show you, kind of,
where these different organizations are looking
to build APIs, but we don't have anything
that's really production quality
and that you can reach out and use today.
The best you can do right now is actually something like
LFENCE, I think ARM has a similar thing to LFENCE
that they suggest with an intrinsic as well.
But, this doesn't scale well.
You have to manually do this to every single point
in your code, that's really, really painful.
Maybe you can use a static analysis tool to automate this
but what we found is that the static analysis tools
either cannot find the interesting gadgets
that look like Spectre variant 1
because they're very careful and accurate
and they leave lots of unmitigated code
or they find hundreds and hundreds and hundreds of gadgets
that are completely impossible to actually reach
with any kind of real-world scenario.
You can't actually get there and use them
to conduct a Spectre, kind of, information leak.
So, this means that they're not super satisfying to use,
they're better than the alternatives of doing it manually
without a static analysis tool,
but they still pose real scalability problems.
Ultimately, my conclusion is that,
this isn't going to continue to scale up
to larger and larger applications.
We're already right about at the threshold
of how much we can do with static analysis tools
and manual mitigations
when we're working on large applications.
So, we need an alternative.
There's another system called speculative load hardening,
this is also developed by Google
and this is an automatic mitigation of variant 1.
This is not related to the /Qspectre flag
in Microsoft's compiler.
That is not automatic mitigation of variant 1 in all cases,
that handles specific cases that they've taught it about.
Other kinds of variant 1, other instances of variant 1
aren't caught by it, which makes it,
potentially, risky to use.
But this is a categorically different thing.
This is a transformation that removes
the fundamental exploitable entity of variant 1
from your code, and it does it systematically
across every single piece of code you compile.
You still have to recompile your code
but you can deploy this to get kind of,
comprehensive mitigation of variant 1.
Just so you are aware, this is incredibly complex,
it's still very, very brittle, this has been
something that we're working on for a long time
but I don't want you to get the impression that, this is
production quality, ready to go right out the door.
We're all still, really working on this,
but I wanna try to explain how this can work.
Let's take an example.
This is a little bit simplified version of
the Spectre variant 1 example from the original paper.
We have a function that accepts some untrusted offset,
some arrays and it's going to try and do a bounds check.
So, we come down, we do a bounds check,
we potentially bypass this bounds check.
Let's look at how this bypassable bounds check
is actually implemented in x86.
If we compile this code down,
we get the instructions on the right.
These instructions are going to compare
whether we're below the bound.
If we're greater than or equal to the bound,
we're going to skip this body of code.
That's what this does.
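For reference, the C++ pattern has roughly this shape.
The array names, sizes and the stride are assumptions,
not the exact code on the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kStride = 512;  // assumed spacing between probe slots

std::uint8_t array1[16] = {1, 2, 3, 4};
std::uint8_t array2[256 * kStride] = {};

std::uint8_t victim(std::size_t untrusted_offset) {
  // The bounds check: a compare followed by a conditional branch.
  if (untrusted_offset < sizeof(array1)) {
    std::uint8_t value = array1[untrusted_offset];  // first load
    return array2[value * kStride];  // second load, indexed by the value
  }
  return 0;
}
```

If the branch is mispredicted for an out-of-bounds offset,
both loads may run speculatively, and which part of array2 gets touched
depends on the byte that the first load read.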
When we're going to use speculative load hardening,
we need to somehow transform this to prevent a branch predictor
predicting that the index is within the bound,
and so predicting that we enter the code, from working.
The way we do this is by,
instead of generating the code on the right,
we generate the code on the left.
So, let's try and walk through this code on the left,
which is for the same C++ pattern,
and understand how it works.
First we need to build what we call, a misspeculation mask.
So, it's just all ones.
We're going to use this whenever we detect misspeculation
in order to harden the behavior of the program.
We also need to extract the caller's mask
because speculative execution can move across function calls
it could be interprocedural.
So, we want the caller to pass in
any speculation state that it has
and we pass it in the high bit of the stack pointer.
This transforms the high bit in the stack pointer
into a mask of either, all ones or all zeros.
And in a normal program, you'd expect this to all be zeros
and in a misspeculated execution, this is going to be
all ones just like our misspeculation mask.
Now, we do our comparison just like we did before,
we have our branch just like we did before
and we may mispredict this branch.
If we mispredict the branch though,
we're going to enter this basic block,
when the condition is actually greater than or equal to.
And so, in that case, we have a CMOV instruction
and CMOV instructions today, are not predicted
by any x86 hardware, and so, as a consequence
we can write the CMOV, using the same flag,
greater than or equal to.
And if we enter this block when that flag is set,
which should never happen,
we write the misspeculation mask over our predicate state,
over this state that we got from the caller.
This essentially collapses us to the all ones
if we ever misspeculate this branch.
Then we come down and we load some memory just like normal,
but keep in mind, this may have loaded leakable bits,
these bits may actually be, something that can get leaked
in some kind of actual attack scenario.
There are some operations on this that we actually allow.
These are data invariant operations,
these are the same kinds of operations we would allow
on private keys,
if we were implementing a cryptographic algorithm.
They do not exhibit any change in behavior
based on the data that they observe
and so, they're safe to run over this data.
They just move things around
and there's nothing that you can glean from these.
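As an illustration of what a data-invariant operation looks like at the source level, here is a branch-free select of the kind used in constant-time cryptographic code. The function name is hypothetical, and whether the generated instructions really are data invariant depends on the compiler and the target, so treat this as a sketch of the idea rather than a guarantee.

```cpp
#include <cstdint>

// Selects a when mask is all ones and b when mask is all zeros, using only
// bitwise operations: no branch and no memory access that depends on the data.
constexpr std::uint64_t select_by_mask(std::uint64_t mask,
                                       std::uint64_t a,
                                       std::uint64_t b) {
    return (a & mask) | (b & ~mask);
}
```

Nothing in this function changes its control flow or its memory accesses based on the values of `a` and `b`, which is the property being described.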
But before we actually use this piece of data
to index another array, we mask it with our predicate state,
we OR all of those bits over the data that we loaded.
And because of this, if we misspeculated,
all of the bits are now all ones,
none of what we loaded is observable.
And so, the fact that we then do this data-dependent load
doesn't leak anything.
This is the core transformation
of speculative load hardening.
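The logic of the walkthrough can be modeled in plain C++ as below. This is a model of the arithmetic only, not a mitigation: a compiler is free to fold these expressions away, and the real hardening is done on the generated machine code. The names `update_state` and `harden` are assumptions made for this sketch.

```cpp
#include <cstdint>

// Models the CMOV: if the block was entered while the bounds check actually
// failed (which only happens under misspeculation), collapse the predicate
// state to all ones. Otherwise keep the state received from the caller.
inline std::uint64_t update_state(std::uint64_t state, bool check_failed) {
    const std::uint64_t all_ones = ~std::uint64_t{0};  // misspeculation mask
    const std::uint64_t take =
        std::uint64_t{0} - static_cast<std::uint64_t>(check_failed);
    return (all_ones & take) | (state & ~take);
}

// Models the masking step: OR the predicate state over a loaded value before
// that value is used to index another array.
inline std::uint64_t harden(std::uint64_t loaded, std::uint64_t state) {
    return loaded | state;
}
```

With a state of zero, `harden` returns the loaded value unchanged, so correct executions behave as before. With a state of all ones, every loaded value becomes all ones, so the following data-dependent load no longer depends on the secret. In Clang the real transformation is enabled with `-mspeculative-load-hardening`.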
And we do this for every single predictable branch
in the entire program, and we do this hardening
for every single piece of loaded data in the entire program.
It's very, very comprehensive.
There aren't these huge gaps in what gets hardened
and what doesn't get hardened.
But there is a catch.
The overhead is nuts, it's just beyond belief, it's huge.
30 to 40% CPU overhead is a best-case, medium-case scenario.
Worst-case scenario is even worse than this.
If you don't access a lot of memory,
then it can be lower overhead than this but, I don't know,
then you don't access a lot of memory, which is weird.
For most applications,
we expect this overhead to be very large.
We've built a very large service with this,
we've actually like had them test it,
in a live situation so we can actually measure
the real-world performance overhead,
this is a very realistic performance overhead you can expect
from deploying speculative load hardening to your service.
I am very aware that this is not an acceptable
amount of overhead for most systems.
They probably don't have the CPU just kicking around.
If they're latency-sensitive,
this is actually going to impact your latency.
If you're not latency-sensitive, you're still going to need
a 30 to 40% increase in capacity of CPU to handle this
or 30 to 40% decrease in the amount of battery you have
if you're running on a device.
This is a really, really problematic overhead.
Unfortunately, this is the best that we know how to do
while still being, truly comprehensive.
The only things we know to really reduce this at this point
also open up exposure to various forms of attack
and that's not what we want,
that's not the trade-off we wanna make.
So, what else can we do?
This has been a grim list of, stories about mitigation.
The other thing you can do, is you can isolate
your secret data from the risky code.
Sandbox any, and this is actually the thing that works
even for untrusted code.
When you have sandbox code, you have to actually separate it
from the data with some kind of processor level
security abstraction, typically separate processes
on a modern operating system.
That's, really the only thing that's enough
for untrusted code, because this is the only mitigation
we realistically have for variant 4.
This is what all the browsers are working on
in order to mitigate variant 4, long-term.
Everything else looks short-term, too expensive
or doesn't work in enough cases.
The other interesting thing is, if you do this,
this protects against all of the other variants of Spectre.
If you actually, can separate your code in this way,
you are truly protected from Spectre, and it gets better.
You're also protected from bugs like Heartbleed.
It's now, very hard to leak information at all
because the attacker doesn't have access to the
program that actually is touching the secret data.
So, the extent to which you can design your system this way
it can really, really increase the security of your system,
it can really make it hard to suffer from information leak
vulnerabilities in general.
We really do think this is a powerful mitigation approach.
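A minimal sketch of this separation on a POSIX system might look like the following. Everything here is an assumption made for illustration: `keyed_tag` is a stand-in for real cryptography, the secret is a placeholder constant, and a real key holder would load its key itself after the fork (for example from disk) so that the at-risk process never has it mapped.

```cpp
#include <cstdint>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical keyed operation standing in for real cryptography.
static std::uint32_t keyed_tag(std::uint32_t secret, std::uint32_t message) {
    return (secret ^ message) * 2654435761u;
}

// Sends one message to a child process that owns the secret and reads back
// only the resulting tag. The secret value never travels over either pipe.
static bool tag_in_separate_process(std::uint32_t message, std::uint32_t* tag_out) {
    int to_child[2];
    int to_parent[2];
    if (pipe(to_child) != 0) return false;
    if (pipe(to_parent) != 0) {
        close(to_child[0]);
        close(to_child[1]);
        return false;
    }
    const pid_t pid = fork();
    if (pid < 0) {
        close(to_child[0]);
        close(to_child[1]);
        close(to_parent[0]);
        close(to_parent[1]);
        return false;
    }
    if (pid == 0) {
        // Child: the only process that ever uses the secret.
        close(to_child[1]);
        close(to_parent[0]);
        const std::uint32_t secret = 0xC0FFEEu;  // placeholder, see lead-in
        std::uint32_t msg = 0;
        int status = 1;
        if (read(to_child[0], &msg, sizeof msg) == static_cast<ssize_t>(sizeof msg)) {
            const std::uint32_t tag = keyed_tag(secret, msg);
            if (write(to_parent[1], &tag, sizeof tag) == static_cast<ssize_t>(sizeof tag)) {
                status = 0;
            }
        }
        _exit(status);
    }
    // Parent: the at-risk side. It sees the message and the tag, nothing else.
    close(to_child[0]);
    close(to_parent[1]);
    const bool sent = write(to_child[1], &message, sizeof message) ==
                      static_cast<ssize_t>(sizeof message);
    const bool received = sent && read(to_parent[0], tag_out, sizeof *tag_out) ==
                                      static_cast<ssize_t>(sizeof *tag_out);
    close(to_child[1]);
    close(to_parent[0]);
    int wstatus = 0;
    waitpid(pid, &wstatus, 0);
    return received;
}
```

The design point is the narrow interface: the at-risk side can ask for a tag, but there is no request that returns the key, and a speculative out-of-bounds read in the parent cannot reach memory that only exists in the child.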
Ultimately, you're going to need
some combination of approaches targeted to your application,
oh, I almost forgot, sorry.
I forgot, we actually can live demo this too.
Just so that we're all on the same side.
I build this and you can see there's a little,
there's an extra flag in there and now when I run it,
whoa, that's not good.
Helps, if you run the right program.
So, when I actually run the mitigated one,
it doesn't leak anything.
This is just like leaking random bytes of data.
If you want, I can open up the binary, we can stare at it,
it's gonna look a lot like what I presented.
But, this actually does work.
You do want to expect to like,
need some mixture of these things.
You've got to look at your application, your threat model,
your performance characteristics,
how much of an overhead you can take
to pick some approach here.
There's not this, oh yeah, you do this, this, this,
you're done, go home, everything is easy.
That's why I gave a long presentation about it.
This isn't sadly, the easy, easy case.
There's also some stuff I want to see in the future
because like I said, we're not done here,
we're not finished.
So, I've got three things
that I would really, really like to see.
Number one, we have to have a cheaper
operating system and hardware solution
for sandboxing protections, like the last one I mentioned
because that's the most durable protection,
provides the most value by far.
We need an easier way to do this.
The browser vendors are really struggling doing this today
and we should make that much, much better
so that we can deploy it more widely.
The second thing is, cryptography really needs to change.
The idea that you do cryptography
with a long-lived private key that you keep in your memory,
is a very bad idea.
We need to go and make sure
every single cryptographic system
is separating the long-lived private key data
into a separate subsystem and a separate process,
potentially, leaving it on disk until it needs it
because this is too high risk.
We have the cryptographic primitives we need here,
things like ephemeral keys in TLS 1.3,
we have good techniques here in the cryptographic space,
we need to use them,
we need to stop using older cryptographic systems
that require these long-lived, stable private keys,
especially, small elliptic curve stable private keys
to be visible in memory, to a system under attack.
That's a very, very bad, long-term proposition
in the wake of Spectre.
And last, I think we have to solve Spectre v1 in hardware.
I do not think, that anything I've shown you for v1
is tenable, long-term.
I think we may be able to sneak by
for the next five to 10 years,
while the hardware community moves on this.
I understand that there are real timeline issues here
that they cannot change, but they must actually,
solve this in hardware.
Think of it in a different way.
I do not believe that we can teach programmers
to think about Spectre v1.
How do we teach programmers?
We say, like, well, you have this set of assumptions
and once you build up these assumptions,
you work within them and then you build up more assumptions
and you work within those, and you build up more assumptions
you work within those.
And how does Spectre work?
It says, eeeh,
You have all those assumptions, they're very nice
but I didn't pay any attention to them.
Now, we have to teach people
to think about the behavior of their code,
when literally, none of their predicates hold
and I don't think that's viable.
This is different from saying, like today,
we have C++ without contracts.
We're gonna get contracts added to it.
This is worse than going back to C++ without contracts
'cause today, what we have are unenforced contracts,
we have contracts in our documentation, in our comments,
we have asserts, we have predicates, everywhere.
Imagine having none of them
and having to write code that was correctly behaved
even in their absence.
I don't think that that's viable, and so, I do not think
we can exist in a computational world where Spectre v1
is a thing programmers are thinking about.
I think we have to actually remove it.
And so, I'll give you a brief conclusion.
Spectre, misspeculation, side channels
give you information leak of secrets.
It's new and it's an active area of research,
this is going to keep happening for a long, long time.
We have at least a year, maybe years, plural,
of issues that have yet to be discovered.
You need to have a threat model
to understand its implications for you
and you need to tailor whatever mitigation strategy
to your application because there is not
a single one that looks promising.
And ultimately, I want all of you to help me
convince our CPU vendors, that they must fix
Spectre v1 in hardware.
We can't actually sustain this world
where our assumptions do not hold.
So hopefully, you all can help me with that,
and I thank all of you and I also wanna thank
all the security researchers that I've been working with
for the last year, across the industry,
it's a tremendous group of people,
they've taught me a whole lot.
Hopefully, I've taught you all at least a little bit
and I'm happy to take questions.
Just as a quick reminder,
we only have a few minutes for questions
like four or five minutes for questions.
I would really encourage you,
focus your questions on my talk.
We're going to have a panel to talk about
everything to do with Spectre in about,
just over half an hour.
I'll be there, a couple of the other folks
working on this will be there.
If you have generic questions, feel free to wait until then
and we'll try to answer them then.
With that, let's do the question on the left
or the right here.
- Some mitigations require a recompilation.
I'd like to understand, it's like a recompilation of...
It's not a C-specific problem,
it's a processor-instruction-specific problem?
- The key thing here is,
as we start to work with Spectre,
we see an increasing need for you to be able to recompile
all of your source code in your application somehow.
Because all of it, potentially has, the vulnerable piece.
- So, that's true about Java, managed systems and whatever?
- To a certain extent, it's true of Java and managed systems
however, constructing ways to actually
break these types of things
is much harder in managed systems.
- This all is based on the fact that
the speculative execution executes code
that is actually not supposed to run.
So, eventually, the pipeline will catch up
and the CPU will realize that,
I'm actually not supposed to execute this branch
and then stop executing it.
Just, like a ballpark estimate,
how much code can I get into that
before the CPU realizes that, I shouldn't be executing this
and stops doing it.
- That's a great question.
The key question is,
how much code can be speculatively executed in this window?
What's the window of my risk?
I have been asking processor vendors
that question for a long time and they will not answer me.
But I'm not throwing them under the bus.
I actually understand why,
increasingly, I really understand why.
I don't think that there is a simple answer,
it's not that easy to reason about because,
what you actually are seeing is the exhaustion
of resources on the processor.
But different kinds of instructions
exhaust resources at different rates.
It's very hard to say, oh no, 100 instructions
and then you'll be done, because different instructions
may take up different amounts of resources.
However, in practice, we have seen
hundreds of instructions execute speculatively.
Not tens, hundreds.
And we should expect that we will get better and better
at tickling this particular, weird part of the processor
and sending it further and further down these traces.
We should also expect that processors are going to speculate
more and more as they get larger and more powerful.
- You said a mitigation for this is to put untrusted code
in a separate process from the secret data.
But you also said that there's something called NetSpectre
where you can exploit over a network, how does that work?
- If you're moving untrusted code into a separate process
what you're protecting the data from, is the untrusted code.
You can also move trusted code that handles
untrusted inputs to a separate process.
And then, NetSpectre is going to
leverage that code to read data in that process.
But if that process doesn't expose to its untrusted inputs,
any control over the inputs to the process
with the secret data, you can't construct an attack.
And you have to think really carefully about,
just how trusted is my input?
Can I fully trust, can I fully validate
the communication, the secondary communication
from the at-risk process to the trusted process?
But sometimes you can do that.
Sometimes you can say like, no,
all of the communication there is written by the programmer.
All we can do is select between those,
we can't construct arbitrary risky inputs,
so now, we can trust our inputs in the trusted process,
we don't have to worry about a Spectre vulnerability.
- So, we have to think about, not just trusted code
but also, trusted input? - Absolutely.
At-risk code is either untrusted code
or code handling untrusted data.
- Cool, thanks.
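The "select between messages the programmer wrote" idea from this exchange can be sketched as a closed set of requests. The request names and the one-byte wire format are hypothetical. The sketch only shows the shape of such an interface and is not a complete validation layer.

```cpp
#include <cstdint>

// Hypothetical closed set of requests the at-risk process may send.
enum class Request : std::uint8_t { GetStatus = 0, RotateLog = 1, Shutdown = 2 };

// Validates one byte received from the at-risk process. Anything outside the
// fixed set is rejected, so the sender can only select among messages the
// programmer wrote and cannot supply an arbitrary index, length, or pointer.
inline bool parse_request(std::uint8_t wire_byte, Request* out) {
    switch (wire_byte) {
        case 0: *out = Request::GetStatus; return true;
        case 1: *out = Request::RotateLog; return true;
        case 2: *out = Request::Shutdown;  return true;
        default: return false;
    }
}
```

Because the trusted process never uses attacker-chosen values as offsets or sizes, there is much less for an attacker to steer, which is the sense in which its inputs can be trusted.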
- It seems to me that the whole issue is because
the CPUs are trying to speculate where they are going
and try to do this optimization
in the way they are working.
How bad would it be to turn this completely off?
- What's the cost of turning off speculative execution?
It's actually pretty easy to simulate this.
When I built the speculative load hardening compiler pass,
I also built something that added Intel's
suggested mitigation of an LFENCE
but instead of doing it only on the risky branch,
it adds them on all of them.
It's a very simple transformation,
much simpler than the speculative load hardening.
And I measured the performance of that.
And that's actually an interesting thing to look at
because what LFENCE does, is it essentially,
blocks speculation past the fence.
And so, this doesn't turn speculative execution
completely off, but it dramatically reduces
speculative execution on the processor.
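At the source level, placing a fence after a branch looks roughly like this. It is a sketch, not the compiler pass described here: the pass inserts the instruction in generated code for every branch, while this example fences a single bounds check by hand. `_mm_lfence` is the SSE2 intrinsic for LFENCE, and `speculation_barrier`, `fenced_table` and `fenced_read` are hypothetical names. On targets without SSE2 the barrier below is an empty placeholder.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>  // _mm_lfence
#endif

inline void speculation_barrier() {
#if defined(__SSE2__)
    _mm_lfence();  // later instructions wait until earlier ones complete
#else
    // Placeholder: no LFENCE on this target, so this sketch does nothing here.
#endif
}

static std::array<std::uint8_t, 16> fenced_table{};

inline std::uint8_t fenced_read(std::size_t index) {
    if (index < fenced_table.size()) {
        speculation_barrier();  // the load below waits for the check to resolve
        return fenced_table[index];
    }
    return 0;
}
```

Doing this on every branch in a program is what produces the slowdowns quoted in this answer.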
The performance overhead of this transformation
was somewhere between
a 5X, to a 20 or 50X performance reduction.
There were several very tight computational loops
with well over 20X performance reductions and at that point,
I started having trouble measuring with high accuracy.
I don't think that's even remotely desirable
due to the performance impact.
This shows you also, how incredibly important
speculative execution is.
No one should leave this and be like,
"Oh, those processor designers,
"why do they have to use speculative execution?"
It makes your program 20X faster.
It's really good, unfortunately,
it does come with a problem.
- Hello, I wonder about the impact on compiler optimizations.
For example, when it was pretty new I tried to get rid of
all my indirect jumps by just not using function pointers
and I observed that basically,
the only option I had to pass to my compiler was to
disable jump tables to get rid of it.
Are some compiler passes now being rethought
to, like, maybe generate completely different code?
- The question is, is Spectre really changing
how we think about compiler optimizations?
I don't think it is in a lot of ways
because a lot of software isn't really impacted by Spectre.
So, we want the optimizations to run there.
But when we know we're mitigating
against some part of Spectre, we definitely turn things off.
So, when you're using Retpolines for example,
we turn off building jump tables, so that
we don't introduce more of these risky things
that we then, have to transform.
But I don't think there's a lot of impact beyond that.
Mostly, the impact on compiler optimizations is figuring out
how we can mitigate these things less expensively.
- Okay, thanks.
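To make the jump table point in this exchange concrete, here is the kind of code it concerns. A dense `switch` like the one below is a typical candidate for being lowered to a jump table, which is an indirect jump through a table of addresses. Whether a given compiler actually does so depends on its heuristics; `-fno-jump-tables` (GCC and Clang) asks it not to, and Clang's retpoline mode is enabled with `-mretpoline`. The function itself is a made-up example.

```cpp
#include <cstdint>

// Hypothetical opcode dispatcher with dense case values 0 through 4.
inline std::uint32_t dispatch_switch(std::uint32_t op, std::uint32_t x) {
    switch (op) {
        case 0: return x + 1;
        case 1: return x * 2;
        case 2: return x ^ 0xFFu;
        case 3: return x >> 1;
        case 4: return x & 0xFu;
        default: return x;
    }
}
```

With jump tables disabled, the same source is compiled to a chain of compares and direct branches instead, so no new indirect jump is introduced.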
- Most of the stuff on memory leaks
all happens during speculative execution and gadget chains
are relatively inefficient use of instructions.
How deep can you go, how many instructions can you execute
speculatively, given those two things combined?
- Again, we don't know, we don't have hard answers here,
but our experimentation shows, hundreds of instructions
which is more than enough to form
any of these information leaks.
And remember, even though your gadget chain,
for a ROP-based gadget chain, may be fairly inefficient,
the set of operations needed here is fairly small.
They fit into a pretty tight loop,
especially if you're willing to have a lower bandwidth.
I used a fairly high bandwidth,
high reliability timing mechanism.
There are other approaches that are much
shorter code sequences, that for example,
extract a single bit at a time rather than extracting
all eight bits of a byte in one go.
And so, there are a lot of different ways
you can construct this.
- Thank you.
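The difference between extracting a byte in one go and extracting one bit at a time, as contrasted in this answer, can be seen in how the receiving side decodes its timing measurements. The sketch below only decodes numbers that are handed to it. The function names, the threshold and the convention that a fast access means a 1 bit are all assumptions for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Byte at a time: 256 candidate cache lines are timed, and the fastest one
// names the byte. This needs a large probe array and 256 measurements.
inline std::uint8_t decode_byte(const std::array<std::uint32_t, 256>& latency) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < latency.size(); ++i) {
        if (latency[i] < latency[best]) best = i;
    }
    return static_cast<std::uint8_t>(best);
}

// Bit at a time: a single line is timed. Assumed convention: fast means 1.
inline bool decode_bit(std::uint32_t latency, std::uint32_t threshold) {
    return latency < threshold;
}

// Rebuilds a byte from eight one-bit rounds, least significant bit first.
inline std::uint8_t assemble_byte(const std::array<bool, 8>& bits) {
    std::uint8_t value = 0;
    for (std::size_t i = 0; i < bits.size(); ++i) {
        if (bits[i]) value = static_cast<std::uint8_t>(value | (1u << i));
    }
    return value;
}
```

The bit-at-a-time form needs eight rounds per byte, which is the lower bandwidth mentioned here, but each round needs far less code in the speculative window.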
- It sounds like you said that,
none of these approaches will work across a process
or a hypervisor boundary, and I was just curious
if you could elaborate a little bit on why that is
and what protects us in that scenario.
- The key question here is,
why are we safe across these boundaries,
these operating system and hardware boundaries
such as system calls, privilege transitions,
virtual machine transitions?
Fundamentally, we aren't protected by these inherently
but the operating systems and hypervisors
have all been updated in conjunction with the hardware
to introduce protections on those boundaries.
And so, that's why, the very first thing I said was,
you must have the operating system mitigations in place,
otherwise, you don't have the fundamental tools
to insulate one process from another.
- Thank you.
- We're gonna cut this short
but I'll take these three questions.
If you do have a question that would be fine at the panel,
consider if you can just wait 20 minutes
and you can ask it then.
- You said that basically,
if you don't have anybody to steal the secrets,
then you're safe, so like, nobody
your process communicates with--
- You're safe from information leaks.
- I think I remember reading, when Spectre came out
that you can actually use it by just running
another process on the same machine,
so like, there's no obvious communication going on
but you can like time caches or something,
without any relation between the processes.
- You have to have some way of influencing the behavior
of the thing you're running.
There are some edge cases where you can do that
from outside the process, as just a sibling but those are
pretty rare and isolated,
I think it would be very, very hard to do that.
You have no way of triggering a particular type of behavior
of the victim.
It's gonna be very hard to cause it to then,
actually leak the information you really care about.
This is less true for some of the other things
that are mitigated at the operating system level,
but it holds for Spectre specifically.
- Can you tell us anything about Spectre
and non-memory related side channel attacks?
- The question is, are there other side channels
and the answer is, yes.
There are many, many, many other side channels.
BranchScope showed a,
branch predictor-based side channel.
The NetSpectre paper included a frequency-based,
very generally, a frequency/power-based side channel.
Essentially, any bit of state
in the micro-architecture of the processor
that you can cause to change during speculative execution
and that does not get rolled back is a candidate
and there are a tremendous number of these things.
- Thank you.
- You ended your talk with a sort of call to arms
for us to help you convince--
- I wouldn't say arms, I would say action.
- Action, sure.
For us to help you convince hardware vendors
to mitigate this in hardware.
I have heard that Google spends quite a lot of money
with hardware vendors, so,
one might be forgiven for wondering
if Google can't convince them,
what hope do the rest of us have?
- The key issue is,
why is one person asking the hardware vendor
if that person buys enough CPUs,
why is one entity asking the hardware vendor, not enough?
Fundamentally, these hardware vendors
are not in a good position to scale their production
and their economies of their production
in ways that differentiate between customers arbitrarily.
So, if only one customer really needs this to happen,
they may not be in a good position
to spend a tremendous amount of money building that
when only one of their customers will benefit.
If all of their customers want it,
then they get the full economies of scale
for that particular feature.
My fear is that,
this feature is going to be expensive enough
on the hardware end, that unless it's universally desired,
it won't make economic sense to the hardware vendor,
and so, that's why I think, everyone needs to do this.
But it's also important to keep in mind,
we literally do not know how to do this yet.
We have some ideas, a few people have ideas,
they're not fully fleshed out, we're not sure that they work
we're not sure that they're implementable.
And so really, the first step is
to try and figure out how to do this,
what the cost would be and then hopefully,
if there is a way to do it at a cost that at least,
is reasonable, if the entire user base of these processors
lobbies very effectively,
I'm hopeful that the processor vendors will actually step up
and provide a real solution, long-term.
But with that, we should probably end the Q&A
and hopefully, you'll all come to the panel session
which will be a lot of fun, thank you all.