- Good morning everybody, I think we're nearly in.
So, I'm gonna start right on time
because I have far too much content to fit in.
And we may not have too much time for questions at the end.
Don't be afraid to come and find me afterwards
and ask me questions if you have any.
So, my talk this week, this week?
This year is The Bits Between The Bits,
How We Get To main, and it's one of those
topics where as programmers we typically
don't think about it too much.
I mean, certainly, I'm most famous for that website,
and that website shows you assembly code.
You type in a bit of C and you see a bit
of assembly and you're like, that's cool,
but it doesn't really tell you the why
to picture, how everything fits together,
how the program actually starts up
and how your code actually starts executing.
And there's a lot of stuff that goes on before
you even get to main, and that's what we're gonna
be talking about, and some of the tools behind
the scenes and why some of the things you may've
learned along the way, like "don't do this," are so.
So, first of all, the name, The Bits Between The Bits.
How many people have ever heard of the band Ozric Tentacles?
One, two, ole Brits, alright.
So, the bits between the bits was
one of their least-inspired albums.
For some reason, the name stuck with me as being
a name for like the stuff between the important
things that you don't really think about,
and that's the bits between the bits.
Now, I happened to Google for this picture
to do these slides, and I found the website
progrock or progarchive or something,
and it described the album as good,
but non-essential, and I just hope
that this is not inauspicious for me,
but anyway, we're gonna start with a program.
This is about the simplest program you could write.
I don't think, short of removing the white
space between the parens and the squiggly bracket
things, you could write a smaller program.
What does this compile to?
I'm gonna use gcc and, for the whole of this talk,
actually, we're gonna be talking about Linux, mainly.
Just a quick note on that, I don't know much
about Windows nowadays, so I'm not really qualified
to talk about it; I assume that similar mechanisms
to the ones I'm gonna be describing happen
inside the Windows linker in runtime, but I'm
not the right person to talk to you about them.
So, I'm gonna use GCC; this is GCC 8.1, I think,
just the one I happen to have on my laptop,
and I'm optimizing for size.
What is your idea about how this program compiles?
How big will this program compile down to?
Any kind of thing, sorry what was that?
(audience member calls)
5K, anyone higher than 5K?
- [Audience Member] 13.
- 13K, anymore?
- [Audience Member] 42.
- 42K, 42 bytes, no 42K, yeah.
You're onto the right ballpark.
The C version here, if I compile that as a C function,
as a C program, sorry, 7976, okay.
So, I think we award the prize over there.
How much do you think the C++ program compiles to?
(audience members calling)
100K, 100K exactly the same.
(audience laughing)
Turns out, if you're not using the features,
you don't pay for them, thematic, I think.
I was surprised as well, the binaries are different.
There are some things, there's some padding
going on, which means that the extra space in one
is taken up by the other, so, anyway.
That's where we're starting, so what is in it,
though, because 8K represents almost all
of the programmable space in the first computer
I owned, to do nothing, right?
What on earth is going on here?
So, I'm gonna use some of the tools
of the trade to sort of poke through and work
out what is in there, and I'm gonna use objdump.
Does anyone use objdump out there?
Yeah, there's a whole, oh my gosh,
and you're gonna be telling me.
There are more people out there that
know what is going on than I do.
And I'm gonna say to objdump, hey,
can you disassemble that because,
presumably, there's code in there,
and that's what's taking up all the space.
So, the lowercase -d in there means disassemble.
The capital -C means do the horrible
de-mangling so I get to see C++ stuff,
and the --no-show-raw-insn means I'm not
interested in seeing all the raw instruction bytes.
I just would like to have an idea about what the code is.
And this is what we see, and this is,
now I have to slide around a bit, first of all,
there's code that I didn't write.
There's this _init function.
Hmm, interesting.
Here is a function called main,
and it has xor eax, eax, retq, now that's what
I was expecting to see, three bytes, right?
Xor eax, eax is two, and retq is one, brilliant.
But what the heck is all the rest of this stuff?
_start, pages and pages, and pages,
and pages, but, you know, I didn't write this.
But it's actually not that much.
Let's work it out.
The address of the last instruction's 40054c,
and at the beginning it is, oops, excuse me,
4003b0, so it looks like 200 hex from beginning
to end which is 512 bytes, okay.
Well, what the heck's the rest of it, then?
So, let's just dump everything.
Let's dump everything that's in this executable,
and now we start to see what's going on here.
There is a header with a load of stuff.
There's some section headers with loads more things.
There's program headers; there's strings.
There's more sections; there's all this stuff.
So, what is going on here?
And that's what we're gonna talk about.
We're gonna say what on earth does it take
to make something executable and runnable.
So you probably know that on Linux the file
format for the executables is called ELF.
ELF is the Executable and Linkable Format.
And it comprises a header at the top,
which explains, hey, I'm an ELF file,
and this is the kind of, these are ways
you can find extra data about me,
what architecture I am, that kind of stuff.
It has a program header table which explains
to load me as a program, you need to do these operations.
It has chunks of binary data which constitute sections.
We'll talk about those in a second,
and then at the end there's a section
header table that explains where all
those sections are, what their names
are, what properties they have.
It's not a great picture, by the way,
but it was the easy one to steal from Wikipedia.
So, sections, what are sections?
They're blocks of code or data
in the executable, or maybe not even
in the executable, we'll get to that.
For example, the code itself is stored
in a section with the name .text.
Why it's called text is beyond me,
but I don't know, for those of you
if you've been using UNIX for a while,
you've probably at some point tried
to overwrite a program that's running
and you've seen the error: text file busy.
And so, in my mind since I was an undergraduate,
I was like what is this, is it like some
kind of tome that the processor is
leafing through page by page reading;
this is the text from which it reads,
but for some reason, code is text; fine.
The read-only data is separated out.
It's separated out because then it can
be loaded somewhere and marked by the processor
as read-only; any attempt to write to it
would cause a fault.
The data section contains data, is readable
and writable, and then there's bss.
And so, there's a lot of strange things.
We've already said that text was a strange name;
bss is even stranger; it's Block Started by Symbol.
Now, I think somewhere along the line
from the mainframe technology that this was
developed on, the meaning behind that has been lost,
but bss now to everybody means zero-initialized data.
So, if you've written anything at global
scope that's explicitly or otherwise set to zero,
it gets put into this bss section
which is gonna be cleared for you.
The cool thing about that is that that doesn't
have to be stored in your executable.
There just has to be something in the executable
that says I need 16K's worth of zeros, please,
over here, and that way we don't have to store
all of that zero-initialized data.
I know it would compress really well,
but you know, why store it if you don't need to.
So, let's move on to a more interesting
program than an empty one, and in this one
we're gonna do something which you should not do.
That is, we're gonna have a global object
which has behavior that runs at construction time.
I mean don't have globals, if you can avoid them, obviously.
Although there are definitely some globals out there.
Now who can think of a global that exists in C++
that's... oh yes, thank you, yes, std::cout is
a global variable, right, so, presumably, it has
a constructor and it has to do something
quite interesting during its construction.
So, we're gonna look at it from the point
of view of our own objects, but again,
don't do this; this is not best practice.
I am not Jason Turner. (laughs)
So, we've got a Foo class; it counts
the number of Foos that are in existence.
We've probably all done something like this
before to sort of diagnose problems with leaking
objects and stuff like that, and I'm gonna use
a static counter to count the number of Foos
that live, and C++ being what it is,
I have to say it up here and then say
it again down here to define it.
And then, in the same file on the right hand
side, I've got it as a global object.
Who would like to say what they think this program outputs?
Undefined, someone's saying over there.
Anyone?
One numFoos equals one.
Zero, alright, what does it print?
Well, if you run it, and I'm gonna
run it compiled with -O0 here just
because the code later on that we're
gonna look at is gonna be that much easier.
numFoos equals one and it is my belief
that this is well-defined behavior
because by not specifying numFoos with a value,
it is a global or a statically initialized
thingymajig which gets put into that BSS section.
I beg your pardon?
(audience member speaks)
Right, there's a discussion; I don't have
time, unfortunately to go into that bit,
but, numFoos is equal to one.
Okay, so, somehow before we get
to that cout, numFoos has become one.
Now there are a number of ways that
that could have been achieved, right?
The compiler could read through my code,
and go, hey, this must be one.
There's no reason for it to ever be zero.
I'm just gonna write a one in there, and it's one forever.
The compiler could be inserting code
at the top of main that calls the constructor
of the global ahead of me, to sort of ensure
that it gets updated, but which one of those
is happening or is something else happening?
I'd love to know because that's the kind
of weird spooky thing that happens outside
of the domain of the normal program.
The way that I went around finding this out
is to bust out the debugger.
So, it occurred to me while I was doing
this that I didn't actually really understand
this myself, and that I'd agreed to do a presentation on it.
So, what I thought would be more interesting
than me reading up on it and then just regurgitating
it is showing you how I discovered how this stuff works.
So, I have inside this live directory
that global example and I'm gonna run gdb on it.
Well, I meant to do GDMSQ, so let me just list.
There is my program; I've wrote it in,
and what we can do is, I haven't even started
executing the program yet but I can ask gdb,
hey, what is the value of Foo::numFoos.
It's zero, okay, so clearly my first hypothesis,
that the compiler somehow reasoned it could
just write a one in there, is not true.
It's zero when the program starts up.
So, let's put a break point on main and run.
So, we're now on the line that's about to cout
the number of Foos and I'm gonna disassemble it.
And for those who don't speak gdb, or even assembly,
that little equals thingymajig there is the next
instruction we're about to run, and this is
the beginning of the main function itself,
and all that's happened before where I am
right now is some stack manipulation.
So, there's nothing up my sleeve.
There's no tricky instructions that are
about to do anything other than that.
What is the value of numFoos here?
Clearly, there's no code that's had
a chance to look at the number of Foos.
So, that's good, Foo::numFoos.
One, okay, so somewhere in-between me loading
the program and the first line of main
being called, that Foo got incremented.
Now we could get clever and do some
write break points and all that kind
of stuff, but it's much easier just to put
a break point on the constructor itself.
So, I'm gonna put a break point on the Foo constructor,
and I'm gonna restart the whole thing.
So, run.
Yes, yes, yes, yes.
Okay, so we're on the line that's incrementing numFoos.
Where did we come from?
Backtrace.
Okay, so there's my code, and then there's
some funny looking things there that I didn't write.
So, that's enough of live gdbing.
This is what the call stack looked like
with nice colors so we can sort of look through it.
So, there's code I wrote, Foo::Foo.
And it's attributed to line six of global.cpp.
Here are two functions I did not write.
And, in fact, if I had written,
I would be wandering off into undefined behavior
land because they have two underscores in them
and you can't put two underscores anywhere
in the name, and I think the other rule is
you can't use a leading underscore
followed by a capital letter.
So, these are symbols that I'm not allowed to write.
But they've both been attributed to global.cpp,
my file at various points, so that's interesting.
The next two are libc functions.
Libc, not libc++ or anything to do with C++.
This is the C runtime starting up, and ultimately,
it was called from _start, no main at all.
We're nowhere near main at the moment.
So, something else is going on before we get to main.
Where on earth do those functions come from,
those funny _static initialization?
Well, there's a website for some of this stuff.
Let's go and have a quick look.
Sorry. (chuckles)
So, we can go and find them over here
in the right hand side.
Here's static initialization and destruction,
and it is attributed like the thing says.
You can't really see all that easily on there.
The line 18 on the far left is showing
that somehow this static initialization
and destruction function was generated
at the same point of the end of the file,
or maybe the closing brace of main
is actually the end of the file.
We can't tell in this example.
And then some code down here, which is calling
Foo::Foo globalFoo.
So, the construction of the global Foo
is in the middle of that function,
and that is attributed to the actual definition of Foo.
So, again, something seems to have
happened when I define a global variable,
and the compiler emitted some code somewhere
into some magical function that it's writing for me.
And then there's another; this is this other
_GLOBAL__sub_I_numFoos function which just,
all it does is call static initialization and destruction.
Okay, but who calls this function,
oh dear, yes, no, I was gonna say,
how does it get to my function,
that's a bad title for this slide.
How does it get to this function?
Well, at this point, I've been doing
a lot of this stuff on my commute
in and out of work and so I have
very spotty cell coverage, but at this point,
in my spelunking I discovered that I had
decent enough internet to Google
for that libc function name, and I was
able to find the libc source code,
pull it down and have a look.
Oh, darn it, sorry, forgot about that slide.
So, if you've ever looked at the libc code,
it is beautiful; it is wonderful.
It is a testament to engineering skill,
but it is not the most readable code
in the world, mainly because it has
to support pretty much every platform
going, and every architecture going,
and a lot of different compilers, too.
So, you'll forgive me for having paraphrased
this down to the very small bit of code that's going on.
So, this is that __libc_csu_init.
It was one of the functions that had been
called from the _start that ultimately called
my function through those funny underscore named functions.
And so, I don't know about anyone else here,
I can't read function pointer syntax nakedly.
So, I have to typedef it or I should've done
a using here, I suppose, but this is a c file.
So, what this is doing, for those like me
who need to look at it every time, is defining
a type called init_func, and that type is
a function, a pointer to a function, I should say,
that returns void, takes an int and two char**'s.
Int and two char stars, stah, chah, sorry,
try saying that, two char**'s, that's
sort of strange, that sounds a little bit
like the signature to main, right?
Okay, so that's the init_func.
It then also defines two externally defined
symbols, and they're called __init_array_start,
and __init_array_end, and they are defined
as being arrays that live somewhere
else out in space somewhere, and then,
we subtract the beginning from the end.
That tells us how many things are in this apparent
array of things, and we call each in turn,
and we pass in argc, argv, and envp.
Who's forgotten that envp
can be put on the end of main?
I had before I looked at this slide here.
There's the third parameter you can use here.
Okay, but where do these init_array_start
and init_array_end come from?
It's clear that something is going
on where an array of function pointers
to things I would like to run at startup
has been constructed, but I'm compiling
my one file at the moment, and presumably,
every file I compile that has a global
would like to have its functions initialized
as well, but how do I build an array
of contiguous function pointers when I'm
compiling a whole bunch of things separately.
If you try and think about how to do that in C++,
I don't know if you could statically combine
something together across
multiple translation units.
Jason's now starting to think and see if he
can come up with a solution, obviously.
Well, there's a clue; if we go back
to the example we looked at before,
and I know, this is probably all noise
to you whenever you use compilers,
but these funny things at the top.
If we take out the filter which turns
off all of the assembly directives,
then we get to see all of the other things
that the assembler is told by the compiler.
And if we scroll far enough, actually, let me
just zoom down to it, somewhere after all
of the code has been emitted, and there is
that funny function, the global initialization
function, there's this section, sorry,
overloading the word section here.
There's this part of the code which has
a .section directive in it and a whole bunch
of things, so there's a clue there.
Oops, sorry, forgot to close things.
Ah, there we go, so it's this bit of code.
And so what this is doing, is it's a directive
to the assembler to say, you're currently assembling
code and you're putting it into a bucket
called the text bucket, right, we know
that the text is where the executable
stuff goes, can you now switch over
and start putting things into a different bucket,
and that bucket I'm gonna call init_array,
section is probably a better name than bucket
'cause that's what they really are.
Align to eight-byte boundaries and then I want
to put a quad word, an eight-byte pointer
to you and me, that points to the function,
that function that we defined, the one
that has the initialization in it,
and then the .text is shorthand for .section,
go back to putting stuff into the text bucket.
So, we've kind of bifurcated the program at this point.
We've got code that's going into one block,
and we've just put something into another thing
called a section and we've given that section
a .init_array, which was not one of the ones
that I mentioned earlier when we were talking
about .text and bss, and all that stuff.
Hmm, interesting, well, in order to work
out a little bit more about what's going
on, we're gonna have to talk about the linker.
What does a linker do?
What do we think of when we think of the linker?
- [Audience Member] Magic.
- Magic, that is a good answer, actually.
I mean for the longest time the linker
always just seemed like that annoying
long step that couldn't really be parallelized
to the end of my build and that seemed
to take up tons of I/O and all that,
but no, it has a lot of work to do,
and if you look into what the linker's
doing behind the scenes, there's a deep, deep rabbit hole.
So, trivially, the linker collects together
all of your object files that you gave it,
resolves references between them.
We know that in one file you may
refer to something that was defined
in another file, so there must be some
way of tracking between those two.
One thing that I hadn't really thought
about until I prepared this talk was
that it also is responsible for determining
the layout of the executable, like where
the bits go in the actual binary,
like does main happen at the beginning
of the block of code or is it at the end,
or anything like that, and in conversations
with people prior to this talk, it's also doing
something like graph theoretical stuff
where it's following dependency nodes of like
this symbol needs this thing which means
it needs that thing which means it needs
that thing, and stuff like that, and then it
writes the metadata, as well, which says,
okay, I've finished with your program,
here's the program header, here's all
the bits, and this is where I put them.
We've got a slightly more representative program
now that's actually in two files so that we can
see how this link process is gonna come together.
It's the hello world, and on the left hand side
we've got the bit that does the printing,
and on the right hand side we've got a pluggable
getMessage function which we've written somewhere else.
main calls greet; greet calls getMessage.
getMessage is defined in a different file.
We compile them in the obvious way,
link them and it does what you'd imagine.
No surprises there.
What are those files that we generated though?
I kind of glossed over it.
Normally we just type make or we type CMake
or we do whatever it is and we know that
somewhere there's an object file,
and out the other end there's an executable,
and the linker kind of brings them together,
and what are those object files?
Are they just big assembly files or something?
I don't know.
Well, this is where you bring out the Unix tool
file which is like a great, I have no idea what
this is, please tell me what this is, tool.
I don't know if anyone uses this regularly.
It's just one of my favorite things to run.
It's amazing what you'll discover, like files
that you just find on your hard disc.
And it turns out that well hello, no surprises,
the ELF executable we were expecting it to be.
But hello.o and message.o are also ELF files.
I guess the L in ELF is linkable, so,
Executable and Linkable Format, so, not too surprising.
And we just saw that the assembler is able
to put things into differently named buckets,
and those are sections and so why not use
the same format that we store our sections
in for our executable for our intermediate files, too.
That also means that we can run those tools
that we were using before, objdump and,
readelf, sorry, I'd forgotten the name.
The other tools which will appear shortly.
So, here we're gonna dump the hello.o.
So this is the one that has main in it
and it calls greet which calls the getMessage
which is defined somewhere else.
And we can see here's the greet function.
And we know that greet is calling getMessage,
and then it's gonna call operator chevrony thing,
streaming operator to stream out to cout.
And so the first call is going to be to getMessage,
but apparently it's a call to greet,
which is weird because this is greet.
Is this like some crazy recursion thing going on?
But, more interestingly, if you look at what address
it's calling, it's calling the next instruction.
Why on earth would you call the next instruction?
Well, obviously, we don't know where getMessage
is yet; it's somewhere else; we have yet to define it.
When the compiler was emitting this code,
it said I know that somewhere exists
getMessage, but I don't know where so it
leaves it blank, and you can see that.
So, in this instance, I have shown the op codes.
So, e8 is the op code for call, and then there's
an offset here; this is an offset to where
that I would like you to call to,
and because the compiler and the assembler
don't know, they've put zeros there,
and then said, okay, linker you figure
this out, and you put it here afterwards.
And it just so happens that this offset
is relative to the next instruction
which is why 00 00 actually looks
like a call to the next instruction.
And it's actually relative, yeah, we'll go
into that a second, so, look, there's
a whole bunch of these, and interesting
as well, even though main here in the same
file is calling greet in the same file,
it's also just calling itself, and it's
letting the linker determine where that is
gonna end up, you'd think that because they're
in the same file, it could actually
just call between them directly.
I know where I put this so I'm gonna call it.
But it doesn't, it chooses not to.
There's a subtlety to this because main is
actually tagged to be in a slightly different
section name from all the other things.
It's in like a .text.startup section
and that prohibits the compiler
from doing some optimizations that it would
otherwise do, unless you're on the highest settings.
And interestingly, as we'll see in a second,
the sections are kind of like the primitive unit
that the linker has to work with.
It doesn't know what's inside each of those
sections, so it has to move them around chunk by chunk.
And so, by putting things into more sections,
you're giving the linker more flexibility
in where it puts stuff, but obviously, then it
needs to patch the code to refer to, sorry.
I'm gonna go to the next slide
and where I actually explain relocations.
So, we're gonna talk about patching.
Those zero, zero, zeros that were in the middle
of the op code need to be turned into something
by the linker, but we need to tell the linker
what it is we would like to put there.
So, here is me dumping it with relocation information.
So, I've used objdump again, and I've used
a different flag to say I'd like to show
the contents of the relocation section.
So there's actually a separately named
section inside the object file that describes
all of the things that need to be done
in order to link my executable.
As it happens, objdump is good enough,
nice enough to interleave these even though
they're in separate blocks of the file.
So, here we can see that the push
and the call are defined here.
And then there's this funny thing
at apparently address five which says,
hey, linker, can you go and find getMessage,
wherever getMessage is defined, go and find
that address, then subtract four from it,
and then please, can you poke it
in using this particular kind of poke,
the R_X86_64_PLT32 at address five.
This is the five here.
So, address five, of course, is,
well, there's four, so this one's five.
So, this it saying write 32 bits worth
of data that refer to getMessage and put
them here in the middle of this instruction.
And similarly, for the other calls down here,
and in fact we didn't notice or I didn't
point out that here is a reference to the cout
global object which also needs to be patched
up in a particular way, but you notice there
are different types of patches here.
This is a PLT32, which is a procedure linkage
table thing, which we'll get to if we have time at the end.
And this one here is a GOT, global
offset table, pc-relative thing.
So, there's some different things going on here.
The takeaways, really, here is that the linker
needs to be told how to find the symbols,
which symbols to find, and then when it's
found them, where to put that information
inside the binary blobs that represent the assembled code.
There are different types of relocations.
We saw two of them just there.
It's worth noting that those types
are dependent on the kind of instruction
that it's patching, and the architecture
that the linker is working on,
so, if patching an ARM instruction's
very different from patching an X86 instruction,
and in fact, there may be some things like
where constants that can't be expressed in one
instruction in ARM have to be burst into two
parts, and OR'd together, and so there are
different relocation types that push the top
16 bits into this bit, and the bottom
16 bits over here, things like that.
They're also used, within the same object file.
We saw that between main and greet.
So, if the compiler has decided to elect
to put things into different sections,
it can still refer to things within the same
translation unit, the same object file
and let the linker do the hard work
of working out how to put things together.
So, let's talk about the symbols.
I just sort of said, hey, go find getMessage,
but where is this getMessage bit coming from?
How do we know where getMessage is?
Well, there's another section in the file
that says, this is where I'm defining
all of the symbols that I need, or I provide.
So, if we use objdump again, and say,
what symbols do you provide, hello.o.
Again, hello.o is the main function.
We can see that there is a whole
bunch of symbols being brought in.
This left hand column here, l means it's local,
whereas g means it's global; so, these are
local things that are less interesting to us.
Although, you'll notice that
the static_initialization_and_destruction
and the GLOBAL_sub_I magic functions are
listed in there, although this isn't our global example.
Our greet function is here, and our main function is here.
This F means it's defined in this file,
and you can see there's a whole bunch
of undefined symbols which are
like, hey, I need these somehow.
Someone else has to provide these for me.
So, obviously, if we were to dump
a message.o which just contains
that getMessage function, we'll see
that, yes, it defines getMessage as a global symbol.
So, then, the linker, sort of, reads
all of the inputs; you give it all the obj files.
It identifies all of the symbols that each
of those object files provides; it works out,
then, which symbols satisfy which relocations,
and it lays out the file in some yet to be
determined way, but you start with maybe with main.
We put main down, and then we say, okay, main,
oh, we need to find this thing; okay, go find it.
Put it here; now patch the references into it,
and so on; so we can see how that could work.
Let's just go through it in a sort of pictorial way.
So message.o here has two sections, the blue sections.
We've got getMessage and, of course, there's
actually some read-only data in there.
The actual string, hello world, is a string
constant and that needs, the bytes for that
need to go somewhere, and then hello.o
has a greet function and main.
And the linker effectively is gonna do
pretty much as I described, collect together
those things, output them in some kind of order,
and then we're gonna have a .text on the way out,
and a .rodata and some program headers
that say, hey, this is where to find the things
you need, Mr. Operating System, or whatever,
whatever process loads and runs it.
And you'll notice here that the greet and main
are defined in two different sections,
but they've ended up in the same section in hello.
So, you might just think to yourself,
okay, so the linker has a hard coded
set of rules that say collect together
everything called this, and emit them over here.
All the sections named this, bring them, pick
them up, put them over here, and then link
the things between them, and do a little relocations,
but of course, that's not the case because linkers
can be used for more than just boring executables.
The Linux kernel, for example, is linked
with a linker, and so, it doesn't have
a regular layout like you might imagine.
For those of you that've worked with embedded
systems, you probably know that there are some
sort of magical addresses where you're like, I just
need my code to start at address one, two, three,
four because that's where the CPU is gonna
start jumping into once they power it on,
and so I need to be able to lay out things
in a much more structured way, and the way
that you do that is through a linker script.
And so here I'm dumping, I'm just running
gcc and saying don't do anything, but run
the linker with verbose, and, oh, well,
for some reason, oop, oh blast, sorry.
This is the problem with these things.
Linker, yeah, okay, so we're gonna
quickly just skim through this.
So, the linker at the top here,
if we dump it out in verbose mode,
it prints out the linker script which is
effectively a programming language in which you
tell the linker what you would like it to do.
You set the output format, and the architecture.
There is an ENTRY; now this is a hint
to the linker, hint, it's a directive,
sorry, to the linker to say, when you
write out that program header table that tells
the operating system what to do, this symbol
is where execution should start.
So, there will be a field somewhere in the header
that says, this is the first instruction
of my program, and it should be the one called _start.
So that's setting up that metadata.
And then the most interesting part is the sections table.
So this sections table is going to explain
how to take all of the sections that are
coming in from the input and put them into the output.
And so, for example, we'll just pick one
at random at the top here, .interp,
whatever .interp is, in the output
what we're gonna do is we're gonna take
every source file, sorry, every object
file, and find its interp section,
and just basically concatenate them together.
So this is a way of picking up all the interp
sections from all the files, collecting
them together, and then putting them
in a section in the output called interp.
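Distilled down, the shape of one of those rules is something like this (a minimal, hand-written SECTIONS fragment, not the real default script):

```
SECTIONS {
  /* In the output, make a section called .interp, built by taking the
     .interp section of every input file ("*" matches them all) and
     concatenating them together. */
  .interp : { *(.interp) }
}
```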
And you can see there's an awful lot of this.
It's pretty complicated, but in the middle
of it all, interestingly, we see
a reference back to init_array.
You remember the global thing we were
looking at right at the beginning?
This is interesting.
So, here it is syntax highlighted.
Now we can actually see how that init_array was populated.
So, what it's saying is that in the output
I want you to create a section called .init_array.
We're gonna ignore this top line for a second.
We're just gonna look at this KEEP.
Inside of that init_array, we're gonna pick up
everything called .init_array from all of the source files.
So here, every global we would've defined
would've made its own .init_array section,
and then this thing says, pick them all up,
and just plunk them all together, one after
another inside a section called .init_array.
Cool, okay, so now we actually get to see
how those apparently disparate pointers
that were defined in different files
get put together into a single contiguous array.
I mean, it gets very complicated with the C runtime
here, also using the same mechanism.
Interestingly, up here you'll see that there's
a set of things, .init_array.something,
which can be sorted by the linker by some level of priority.
Now, I've observed that the C++ system by default
does not use this technique but if you delved
deep inside the implementations of std::cout
and things like that, it's possible that they're
using some extra tagging and sort of magic annotations
to say, no, no, no, please sort me to the front
of the initialization so that cout is ready early.
I know that libc doesn't do this.
We were talking about this earlier,
but it's possible that other implementations do.
And certainly, if you look at the documentation
of, say, gcc, there's an attribute you can give
to functions which says put it into this section
with this priority and this sort key, effectively.
The other thing that's defined here is these PROVIDE_HIDDEN.
So, it turns out that the linker
script can make up new symbols
as it's running, which is pretty cool, right?
So, what this is saying is please, create
a new symbol called __init_array_start
and assign it the value of dot and dot is a magic
thing that says where I am currently outputting to.
So this is the address of where
I am right now in my link process.
So we've effectively put a marker down
called __init_array_start, then we've gathered
together, first of all, those
things that have a priority, sorted by their priority,
and then the everything-else bucket
that all of your global constructors
will have been put into, and then at the end
there in that last bit, we provide an __init_array_end.
So we now have bounded everything that we need
to run with a start and end, that's pretty cool.
And it's also, you'll notice, why you should
never use global variables: because you don't
get to control which order they get plunked
down in, and I've had all sorts of horrible
bugs where we've inadvertently relied upon
that, and it's been something like the inode order
on the filesystem that ultimately decided
which way 'round things got initialized.
Just don't go there; don't use globals.
This is not a best practices talk.
Oh, I forgot that I highlighted it.
So now we know how that global process works.
So, just to recap, the compiler makes
a static initialization function in every
translation unit which calls out to all
the constructors for the objects that are
global to that particular translation unit.
It puts a pointer to this function
into a section called .init_array.
The linker then gathers together all of those
init_arrays and puts them one after another,
and the script kind of bookends it by putting
a tagged symbol name at the beginning and the end of that.
And then, finally, the C runtime walks
through that init_array and calls each in turn.
So now we know how global constructors work, hmm.
Things that are interesting about this is
that those linker scripts aren't just for the compiler.
There are some situations in which you want
to write linker scripts yourself.
Again, if you're an embedded system,
or if you are writing a kernel or something
like that, you might need to control very,
very carefully the output of these, of the order
or of the addresses that things get assembled to.
It's also interesting, and I noted while I was
looking at this, that some dynamic objects,
some DSOs that you're linking against
or referring to aren't actually DSOs.
They are linker scripts, and the linker,
when it sees something to link against,
will look at it, and if it looks like a text
file, it will interpret it as a linker
script, and will follow the instructions in that.
So you can have some actual link
time behavior defined in those
which can be used for versioning tricks.
The linker as well, like I said, has
like a graph theory thing in it where it
can actually work out which sections are unused,
and then it can throw them away.
Oh, I didn't show you, but back in here,
the KEEP part, so the KEEP that it's saying
here is a hint to say, even if you think
this isn't being used by anyone else, keep it
because, of course, there's no way it can tell
that these things are actually being used by the C runtime.
So, let me just go back to here.
So, yeah, you can tell the linker
to garbage collect sections that are unused.
It's not on by default, and I'm not quite sure why.
One thing to note here is that,
apart from the relocations that poke into them
and change the instructions, the sections
are essentially opaque binary blobs
to the compiler, sorry, to the linker,
which means that it can't discard an unused
function inside a section because it doesn't
know that a function exists inside of it.
It's only if a single object file has
a section for which there are no references
pointing into it that the linker can discard it.
So, if you've ever wondered why your executables
maybe contain functions for which you think
well, why has this not been thrown away,
it's because the linker couldn't throw away
the section that that function was defined in.
There are flags to the compiler that say,
well, if I put every function in its own
uniquely named section, and every block
of data in its own uniquely named section
(that's -ffunction-sections and -fdata-sections),
I'm giving the linker a much more fine-grained
ability to throw things away, and you can turn
those on, but it starts to prevent optimizations
between functions that you would otherwise be able to do.
So, these are things to use advisedly.
If you really think you can squeeze
out a bit of size, it's worth testing
this both before and afterwards,
and being totally sure that they're
the right thing for you, but they
exist, and it's interesting to know.
Alright, how we doing, goosh, goosh, goosh?
Good, excuse me.
So now we get to the thing that I was
most interested in working out, dynamic linking.
So the 7K executable we saw earlier
doesn't have the whole of the C runtime in it.
I think that's clear, right?
We saw bits of it, that libc stuff,
but it's not like I saw loads and loads
of bits of code that referred to the operator
overload of O stream and that kind of stuff
and that's because the code isn't in my binary.
It's somewhere else, and in fact, this is
the level of which we're talking.
If I do a dynamic link of my hello executable,
it's just over 8K, so the 7K was for the empty
case, if you remember; this is just for the hello world one.
But if I were to statically link it,
it comes in at 2.5MB, which is quite big, right?
I mean there's a lot of stuff going on in that.
There's a lot of C++ runtime stuff in there,
and probably for all the reasons I was
just describing, the linker can't see
that I'm not using bits of it, and throw
them away, so, I'm stuck with it all.
So, dynamic linking is gonna help me here.
So let's just rephrase our hello world
and see what it looks like if we use
a dynamic link aspect to it, and obviously,
the C runtime is far too complicated
for me to delve into right now.
So, I'm gonna split my hello program
into the main and the other bit
which returned the getMessage,
and I'm gonna make the getMessage a DLL.
Sorry, for all those who are Linux people,
I'm sorry, I, DLL is when I'm in C, and DSO,
I know, is the right term for it.
But I think you know what I mean.
So this is just the relatively straightforward
process of linking that as shared, and then saying,
please find getMessage in libhello.so,
and it works, and by works I mean it doesn't work
because DLLs are a pain in the backside.
(audience member speaking)
Oh, the question was why didn't I pass
-fPIC, and it's because I forgot to put
it on the slides, thank you very much.
Yes, in order for this to have worked,
the code must have been compiled as position
independent, which means that it can
be moved around a bit; it gives the linker
more latitude in where it can lay things out.
So, I did do that, I just haven't put it on these slides.
Thank you for the comment.
Also, I didn't put in here anything to do
with like the rpath catastrophe that you have
to do in order to make it actually work
meaningfully all the time, but that's a whole other talk.
So, let's have a look at what happened.
We linked the hello executable and I'm now
saying, well, I did readelf --help and I looked
through it and I went, oh, these things sound
interesting, what is the dynamic section,
and what are the program headers here?
So, the program headers include this thing
called interp which you'll remember we
actually saw when we looked earlier
at how the sections had been laid out,
and there's this interesting annotation here.
Requesting program interpreter, blah blah blah blah.
Hmm, interesting, okay, we'll note that,
and we'll come back to it.
Some mappings, and then here.
There's a section called the dynamic section,
and here is a load of metadata and you can
see that libhello.so is mentioned, so this is
where somehow I'm communicating to the operating
system that I need to find libhello.so
and load it in before I can be run,
and then there's the dreaded rpath.
We won't talk about that.
So, let's do some more archeology, and what happens.
I have another example here, and we have this,
the hello one, okay, live, hello.
So, let's just list that.
Ah, there's our function again.
I'm gonna put a break point on greet,
and I'm gonna run it, and then, so we're
about to call getMessage, and we know
that getMessage is defined in the DLL,
and if I disassemble, we're seeing we're
about to call _Z10getMessagev@plt, huh,
there's that plt word that we saw earlier.
If you'll remember, that was one of the sort
of mystical sets of letters that were inside
the R relocation, and I just sort of glossed over it,
saying it stands for the procedure linkage table.
Well, this is how the dynamism bit comes in.
So, that's not calling directly to getMessage
because we haven't worked out where getMessage is yet.
Linux, by default, is lazy about looking
symbols up; we'll say why in a second.
So, this call goes to a thunk or a trampoline
or any number of those funny things.
So I'm gonna actually break on 40060,
and actually I'm gonna do stepi.
Okay, so here I am now, I'm actually in that call.
I just stepped over to it calling,
and oh, that's interesting, I stepped too far.
Sorry, let me do that again.
Live demos.
I'm gonna just disassemble it directly.
4006b0.
Oh, this worked before.
I'm sorry, ha.
Oh, that's meant to be break on that.
Hurray, continue, disassemble.
Are you gonna work now?
Yay, okay, phew; I can use a debug, honest.
Alright, so, this is the function.
This _Z10getMessagev@plt, and it doesn't
look like my getMessage because I didn't write it.
In fact, the plt is a section that is generated
by the linker and every relocation to a function
which is defined in a DLL, in a DSO (grumbles),
is given an entry and all of the calls
to those functions that are defined elsewhere
come through the plt entry instead, and it's
a very weird looking thing, right?
We've got a jump here, and then after it, there's
a pushq and another jump; that's really weird.
And this jump, this syntax is an indirect jump.
It's saying, hey, look up some other piece
of memory, get an address out of it and then jump
to wherever that address told me where to go.
Hmm, right, we'll I'm not gonna debug
through that 'cause we haven't got
time, but here's one I did earlier.
So this is what it looks like.
The reference to the memory address of where
we're going is that 601018 that's at the bottom there,
and at the moment its value is quad 4006b6.
What do we notice about that address?
Where are we gonna jump to?
We're gonna jump to the next instruction.
This is a really, really, really, really
complicated way of going to the next instruction.
Weird, right?
I mean we saw the call to the next instruction,
and we kind of, well, that's odd, but then we found it,
but this is absurd, we're calling a function,
and we jump off somewhere which comes back
to the next instruction which then pushes
and goes somewhere else, and now
if we were to follow this through,
we'd see that would actually happen
one more time for very complicated reasons,
but ultimately, what happens is, that 4006a0 jump
at the end goes off into the dynamic loading
subsystem which looks up what getMessage should
point to, and then ultimately goes to that address.
Okay, right, so, presumably that's an expensive process.
I've got to look through all the symbol tables
of all the DLLs that are currently loaded,
maybe I even have to load them
off disc if they aren't already loaded
into memory, and then, I jump to it.
That's a really expensive thing.
We would not tolerate it if all of our calls
had to go through something as expensive as this, right?
So, this ultimately resolves symbol zero;
that's what that pushq is, by the way.
It's pushing the ordinal in the symbol
table that it's looking at, but it has
to use a push because every other register
has something important to the function
you're supposed to be calling, right, so it's
just kind of having to use the stack
as a back channel or a front channel,
I don't know.
The cool thing is that once it's done
that resolution, once it's worked out
where getMessage really is, it writes
it back into that address at 601018.
And so the next time getMessage is called,
that first jump goes directly to getMessage.
So that's pretty cool, right?
We actually kind of patch a code as we go.
Every time we call a function that's in a DLL,
the first time we call it, there's this expensive
process where the lookup happens,
and after that, it's free, cheap, I should say.
It's not free, an indirect call is a little bit
more expensive, the branch predictor will pick
that up, but you should stay and watch Chandler's
talk as to why that is probably gonna not be forever.
Okay, so, why does it do it lazily?
Why does it not just look these things up at the beginning?
You'd think that you'd load your executable,
and it would just do the resolution there and then,
and it's got the set of symbols; it's gonna just go
through them all and it should find out where they go.
Well, there's a whole bunch of reasons why.
The first thing is the C runtime has a ton
of functions and hardly any of them are called by anyone.
If you imagine every time you type LS to list
a directory on Unix, you're firing up a new executable,
running it and coming back out again, and so,
if you were to do the work of looking up all
of the functions that it didn't call,
you would slow down the startup time.
So, it's like a lazy optimization
about how starting up your application
should be fast, and then you only pay
for the functions you use.
Hey, hey, see, only pay for what you use.
This is not always what you want, though,
for example, if you work in the finance
industry and, I mean you shouldn't be using
dynamic libraries anyway, but if you happen
to be, and you wanted to make sure that every
time you called a function, even if it was
the first time, it's quick, you need to ensure
that that happens ahead of time, and you
can force it to happen ahead of time.
So you can set an environment variable,
LD_BIND_NOW, and if it's set, then that's a hint
to the system that it should just straightaway
apply all of the relocations and fill in that plt
with the actual addresses rather than the ones
that go through the resolver.
You can also specify it as a flag
to the linker (-z now) which sets a bit
that says please do this, and incidentally,
what is the thing that is doing this, right?
I sort of waved my hands and said, oh,
the dynamic linking system, and I even
deliberately said, oh and the kernel or whatever.
It's not the kernel.
The interesting thing here is that
that interpreter, that I glossed over before,
is actually what is doing that work.
So the kernel's job is to load
in the program: it reads the program
headers, which load in just a few blocks
of your program, and if there's no interpreter
set, it jumps to the entry address.
But if there's an interpreter set,
it also loads in the interpreter,
and puts it over here, and then jumps
to the interpreter, giving it the entry address
as a parameter. This means that now
the kernel can go away; it's done
all of the things it needed to do to set
up a process to start executing your code,
and now we're out of kernel mode and we don't
need to be touching the kernel to change this.
It's been given over to user mode entirely,
and, as an executable, that interpreter,
the ld.so thingymajig, is now
responsible for starting the whole DLL process,
and it will make sure the PLTs are hooked up.
It will follow the dynamic section and make
sure that each of these .so's that you need
can be found and are mapped, and then it will
effectively provide the resolver service to which they are jumping.
That's why that first, I pointed out in the slide,
this is like the fourth instruction jumped
to something which then did the same dynamic lookup again.
It's because the interpreter itself is effectively mapping,
yeah, anyway, magic happens, magic happens.
I haven't thought this through and I'm thinking
it through as I'm here on stage and I'm thinking
this is not a good time to be thinking up new content.
Anyway, so that's what happens, the interpreter
is responsible for doing all of the clever machinations
and reinterpreting the sections that say
what dynamic stuff needs to happen.
And if you've ever had problems with, for example,
your rpath and you've wondered, hey, what is
happening inside this dynamic system.
It's really complicated; I run my executable,
and I've got a stale result; I'm sure I changed my code.
We've all done this, right?
I edited my code and the bug's still there.
What did I do wrong?
And normally you've either edited it on the wrong
computer or a wrong copy of the code,
or you forgot to run make, or your makefile failed,
or maybe you just recompiled a DLL, and it's
in the wrong place and so you're not actually
loading the DLLs you think you're loading,
those kinds of things.
Now, to debug that, you can run ldd
on your executable, which says, hey, this is
what I'm gonna resolve these things to.
(coughs) Excuse me.
Or you can set the environment variable LD_DEBUG,
which the interpreter then uses to print
out to stderr loads and loads and loads of things
that it's doing, and it's fascinating to turn
that on and just look at what the heck is
going on behind the scenes.
You can do LD_DEBUG=help and then you get help.
Otherwise I typically use LD_DEBUG=all,
and then trawl through it all to work out what's going on.
And, for example, you can see that lazy
loading and symbol resolution thing.
I actually did it on ls and you could pause
ls in the debugger and you could continue it,
and step through, and you'd see all these
functions it's calling being resolved
and being output out; it's quite fun.
This leads us to another thing you can do
which is, because it's lazily done,
I can interpose myself into the whole proceedings
and say, I'd like you to do something different,
actually; you can set LD_PRELOAD, the environment
variable LD_PRELOAD, and as long as it resolves
to a shared object itself, your shared object
that you specify will be loaded ahead of time
and its symbols will be injected right
at the front of the symbol resolution process,
which means that you can steal any dynamically
referenced symbol that's in an executable.
You don't have to have the source code to the executable anymore.
Maybe you lost it; maybe you never had it.
But maybe you'd be really interested in instrumenting
calls to open or write, things like that.
Well, you write your own open and write.
You compile them into a dynamic library, and you
LD_PRELOAD them, and then you run your executable.
So you do LD_PRELOAD=my.so ./the-executable-I'd-like-to-look-at.
And that allows you to actually interpose
and steal those things away from a pre-built
executable which is kinda cool, right?
I mean, for example, if you run a website
which allows users to arbitrarily run compilers
and do anything, you probably wanna make sure
that they're not opening files that they
shouldn't be opening, you know, #include-ing
/etc/shadow or something like that.
So, you could do this on open and say,
if the file matches some blacklist,
then return ENOENT or some such, which is what Compiler Explorer does.
Another thing that a friend of mine told me
which is hilarious is that they had a mathematical
analysis tool system, and he suspected that their
mathematicians that provided the service weren't
really up to much programming wise, and although
it was taking many hours to run simulations
that they were getting that actually probably
the problem was that various transcendental math
functions were being called over and over
and over again with the same numbers.
So he instrumented some of the hyperbolic sine
functions and things like that and replaced
them with something which just kept a histogram
of what input values they had, and it turned
out that, yes, 97 odd percent were calling
like sine with the same thing over and over
and over and over again, so he was able
to go to them and say, well, a) fix your code,
but b) I can fix it here for you.
I can put a rudimentary cache in front
of your sine and now I make your program run faster.
So that's quite cool.
It's also used by some networking systems
to replace the traditional networking layers
inside the operating system so that you can
do direct access to cards that provide kernel bypass.
There's a load of cool things you can do with this.
If you're interested in that, talk to me afterwards.
Okay, I have very little time left.
I really wanted to get to some of the other
cool things that are much more C++-y.
So, weak references, you know that if you define
Foo, the function Foo in two places, and then try
and link, you're gonna get multiple definitions, right?
And that's mea culpa, I shouldn't have defined
it in two places, but when I make an inline function,
I'm kind of defining a function in two places,
and sometimes that function is genuinely inline,
like in terms of the optimization processes
inline, but oftentimes it isn't, which means
that if I've used my getFoo function, my inline
getFoo function in translation unit A,
and in translation unit B, there's two copies of it.
Why does that not cause an error?
And the answer is it gets marked as a weak
symbol which is another tag that I didn't
have time to show, and the weak symbol,
the linker says it's okay to have as many
of these as you like, but you get to pick
whichever one you like as the implementation
of getFoo which is great provided they are
all actually the same, and again, the linker has no idea.
These are just bags of bits as far as it's concerned.
So, it can't look at the implementation
of getFoo or whatever and say, oh, this is
the same across all translation units, that's fine.
So this is where ODR violations show up.
If you've kind of got two implementations
that are marked inline of the same function
defined to be slightly different in two areas,
disparate areas of your code base
and then the only time that they actually come
together is in the linker, you're in for a shock.
The last thing I would've liked to have
talked about would be link time optimization
where the linker actually starts collaborating
a lot, lot more closely with the compiler,
and there's a sort of two-way relationship.
The compiler generates an intermediate form
rather than just assembly that is a bag of bits.
It has much more rich intermediate representation
that's then passed to the linker and then in
during the process by which the linker decides
which things are visible or can be reached,
it calls back to the compiler and says,
hey, I need the code for this now,
and the compiler gets to see the whole
world as the linker is laid out and some
amazing optimizations can happen there,
and actually, some ODR violation checking
can happen there that you would otherwise not
be able to catch, which is great.
That's actually how I've caught some subtle
ODR problems before: by just turning on -flto.
So I recommend you all do that anyway.
There's a whole bunch of stuff.
So, Ian Lance Taylor is responsible for writing
the gold linker which is the one I've been
using most; it's pretty amazing how fast
and how much sophistication goes into it
because linking is something you prefer
not to have to do, just like compiling, right?
So, and it's the only step that's inevitable
in your whole build process unless nothing's
changed at all, so, making the linker
fast is a super important thing.
For the link time optimization Honza's blog
has got a whole bunch of stuff
about how that works behind the scenes.
Yeah, I guess that's it, so I'm sure you
have questions; I'd just like to point
out that Jason Turner, myself and Charley Bay
will probably be getting together some time
next year and doing a training event.
So, you're welcome to ask us about
that, but I invite your questions.
I actually have two minutes left.
(audience clapping)
Thank you.
Hello.
- [Audience Member] During the linking process,
or during the layout process when it's, I guess
my question's really about the resolution.
You said the kernel has some role it plays
by laying everything out; it's starting
the process, it lays out parts of memory,
and then it turns everything over
to the interpreter script, and does it lay
out every dynamical object that is there,
or does it kind of go, okay, I'm gonna lay
out some of them, and the interpreter can go
back and go, hey, I need that thing,
can you do that for me kernel?
- It's more like the latter, as I understand it
the program headers have a few slabs that they say,
load in executable text and put it at address,
you know, this offset of my file, put it at
this address of memory; put read-only things
over here; put the other bits over here,
and then, that's when it will jump
to the interpreter and all the dll stuff,
then, will happen in the interpreter where
it will read a specially named section
that's tagged as being this is the name
of all the things I need, and that will
continue on from there; it will then map
those in and mark them in the appropriate way
and then do the link resolution.
So, more like the latter.
Cool, thank you.
Over this side.
- [Audience Member] Hey, could you just clarify?
At some point you said you should never
actually use dynamically linked libraries.
- Ah ha, right, yes, so dynamic linked libraries,
I say if you're writing highly performant code,
you should avoid dynamic linked libraries.
They're a barrier to optimization across units.
Essentially the compiler can't see across them
even with all the clever, fancy link time
optimization things that I'm starting,
personally, to rely upon, and there's
this resolution cost, even if you do turn
on LD_BIND_NOW, there's still a jump
through the plt to get to your function,
so, it's a little bit of a bump in the way.
That's really what I'm saying there.
- [Audience Member] Okay, so if you statically
link, you never get the plt...
- Correct, if you statically link, then the linker
will just write into the instruction a jump
to the actual location 'cause it knows where it is.
Thank you, over here.
- [Audience Member] In Windows, in Portable
Executable format, there is a problem
that every executable has an address
that they want to be loaded at,
and if it doesn't happen, there is
basically similar process, but happening
at runtime; there should be like a relocation
happening; is it also something that
happens in ELF or Linux world?
- So, I'm not actually sure about that.
I know that there are some preferred
addresses that are marked inside the program
header table itself; I don't know if they
are required or not, and I'm not sure
if there's a problem 'cause you're,
I think, specifically with DLLs in the PE format
you would like this is a DLL that would like
to be at like four million in RAM, and that's
where it's, everything's expecting it to be.
Is that what...
- [Audience Member] Yes, funnily enough,
like in Windows world if your project
is built of many, many DLLs, there is
an optimization technique that allows
you to come up with some algorithm
how you assign those addresses based
on alphabet, or something.
- I see, so, I don't think that's happening
anymore and mainly because of address space
layout randomization which means that every
symbol wants to be shoved into some random
place just to try and make it harder
for the bad guys to get in, so, I'm pretty
sure everything now tries to be put into,
like, make it as easy as possible to map it anywhere.
Thank you, oh, another one over here?
- [Audience Member] Actually, thank you,
following on that question; how different
will be all this in Windows?
- Yes, good question, I think at the beginning
I put my massive disclaimer that it's been
about 20 years since I've touched Windows,
so I think I will defer and say there are
probably experts you can find here.
If there's anyone who considers themselves
an expert, and clearly I'm not an expert
at any of this, as you saw from me just like
discovering process kind of by myself,
then I invite you to raise your hand
and speak to my friend here.
James McNellis, yes, if we can find him,
then, yeah, he's a good person to find.
Thank you, sorry I can't be more help.
Another question there behind you.
- [Audience Member] Could you explain what's
the difference between a fix-up and a relocation?
- Ahh, no. (laughs)
No, I don't know, actually; is there anybody
in the audience who would know the difference?
I would assume that they're synonyms,
but these things are subtle.
- [Audience Member] It's something like:
the compiler emits a fix-up, and the linker
then emits the relocation, or something like that?
- [Audience Member] I'm not sure how you're using
the word fix-up here, but I do know all about ELF
and stuff 'cause...
- Right well, I defer to the right hand side
of the room in this instance; so the short
answer's I don't know. There, oh, this one
is having a conversation, but have you got
a comment or question, or can you help me out here?
- [Audience Member] I was just going to try
and answer the question.
- Oh, I see, thank you.
- [Audience Member] I don't understand the question, so.
- Right, right, I mean there's a ton
of interesting terminology like trampolines
and thunks and other things that
get used in this scenario.
- [Audience Member] On ARM the linker will emit
sort of trampolines to deal with out-of-range
branches, 'cause on ARM there's a certain distance
that a branch or a call instruction can go to.
So, the linker will see that, oh, I'm trying
to go too far, and then it'll emit a little bit
of code between sections that it can reach,
and then jump to that, which'll then do
a farther jump to the target location-y thing.
- Ohh, I've often wondered how that happened.
That's clever; I don't know if people got that,
but the encoding and format of ARM
instructions is such that
everything's a 32-bit instruction in ARM,
if you're not in some mode, and so there's
only a small amount of space in the instruction
encoding to put the address of where you'd like
to branch to, which means you've got, like, a plus
or minus some amount...
- [Audience Member] I think it's two megabytes.
- Two megabytes, right, which when I was
doing ARM, two megabytes was more memory
than I had in my computer, so that was great.
But nowadays, obviously, two megabytes doesn't
get you very far, so the linker has to know
when the destination of a branch is too far
for it to reach, and then it has to put in
a function whose only job is to be
like an intermediate post along the way
to get to the final destination.
Cool, thank you, I learned something.
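(For the curious: the exact reach depends on which encoding is in use, but in the classic 32-bit ARM encoding, B and BL carry a signed 24-bit word offset that is shifted left by two, since instructions are 4-byte aligned, which works out to roughly plus or minus 32 MB; the compressed Thumb encodings reach considerably less far. The arithmetic, as a quick sketch:)

```python
# Range of the signed 24-bit word offset in a classic 32-bit ARM B/BL
# instruction. Instructions are 4-byte aligned, so the stored offset is
# shifted left by 2 to produce a byte offset.
OFFSET_BITS = 24
max_forward = (2 ** (OFFSET_BITS - 1) - 1) * 4   # most positive reach, in bytes
max_backward = -(2 ** (OFFSET_BITS - 1)) * 4     # most negative reach, in bytes
print(max_forward, max_backward)
```

Any call whose target lies outside that window is what forces the linker to insert the intermediate trampoline (often called a veneer) described above.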
There's some heated debate over there.
Have you got any comments or thoughts there?
- [Audience Member] No, it's fine; good job; thank you.
- Thank you very much, thank you; I'm E&OE (errors and omissions excepted).
(audience clapping)