Okay so I'm gonna start now
even though people are still coming in.
So my name is Louie I work for Amazon and we're gonna
be talking about runtime polymorphism today.
We're gonna start right away because
I have something like 80 slides and just an hour so.
Alright so as a short refresher
I expect that actually most of you
know what you know runtime polymorphism is
but let's get a small refresher.
So imagine you have a set of types
and they're all related by their interface in some way right
and then you want to return them from a function
based on some runtime condition for example,
or in this case some user input.
So I don't know exactly what type I'm gonna be returning.
In this case I need some kind of,
I can only return one type from the function
so I need some kind of erasure
to coerce all these types into a single type.
Or if I want to store objects of these different types
in a container for example I need a single type
to basically represent all of these types
so I'm gonna need some kind of runtime polymorphism there.
And sometimes variant does the trick right
sometimes it's enough, but it only works
when you have a closed set of types
and when you know all of your types in advance
when you can name them, when it's convenient to name them.
Which is not always the case right,
and it's also not always convenient
to use a visitation based approach.
So when that's not the case,
when you have an open set of types,
you want to use runtime polymorphism
as we usually understand it.
And obviously C++ has a solution for that.
We all know what it is, it's called inheritance.
And the way it works essentially is that
you have a base class which defines the interface
and then you have the right classes
which derived from that base class
and they implement this interface in different ways,
and they can have different representations
like I could have different members
in my derived classes and so on.
Under the hood what this looks like is basically
you have a pointer because usually we're gonna
manipulate these derived classes
using a pointer to the base class.
So you have a pointer which points to
an object of a derived type.
And then within that object there isa pointer to a Vtable.
And the Vtable contains function pointers
that are the most derived implementations
for the interface that you defined in your base class okay.
I hope everybody is following me at this point.
This is typically how you do it in C++.
But it has a bunch of problems.
And the goal of this talk is not
to rant on inheritance, that's not the goal.
But to act as like a motivation for what we're gonna do next
I'm still going to go over a few of them.
So first of all it bakes in reference semantics.
In C++ were really really loving of value semantics
but when you start using inheritance you need to
because you need to use a pointer to the base class
basically you lose that that ability
and for example if I make a copy of a pointer
I'm not making it an actual copy of
the object it's pointing to so now
I don't have two different objects.
And that is just really difficult.
That makes it makes it really difficult
to do a local reasoning for example.
It also causes you to do heap allocations because
you gotta store these objects somewhere
and so you're usually going to put them
on the heap and that's costly.
It also bakes in nullable semantics.
You just wanted to have polymorphism
and what you end up with is
having to manipulate pointers.
A pointer can be null.
And now you kinda have to check for null
or at least you're living with
that additional state in your object now.
Because you have resources you need to manage them
and so you have to pick one of our really nice
smart, or not so smart you know pointers in C++
and decide how you're gonna...
What's the ownership of this object like?
And here I'm saying like unique pointer
but really we're all kind of lazy and we
all end up using share pointer right?
And that's even more costly.
And if we use raw pointer then it's complete hell
because whoever gets that pointer
needs to not forget to delete it.
Also doesn't play really well with the standard algorithms
because for example if I want to sort my vector of vehicles
here it's not really going to do what I want.
It's just gonna sort the vector itself.
These are pointers, these are not the actual objects
So it's gonna sort them by their address
which is obviously not what I wanted.
So everything becomes a little more complicated
when you start using pointers
to interact through everything.
It's also intrusive.
If I have a third party library
or like a fundamental type or something
and I want it to satisfy my interface,
or if it satisfies my interface.
I have no way of doing this with inheritance
because I would need to reopen the type
and add the base class to it.
And that's very intrusive so I can't do that.
So anyway the point of this talk
is not to discuss the drawbacks of inheritance.
Sean Parent does it much much better than I do.
So I encourage all of you to watch this talk.
This talk is the runtime polymorphism talk
in his better code series but there
is other talks from Sean that treat this subject.
By the way I think this is one of
the best talk I've seen in my life.
Okay, I find it really sad that C++ only gives us
inheritance to do to achieve runtime polymorphism because
what I wanted is this, essentially okay.
And this has nothing to do with the interface keyword
or the interface metaclass that Herb talked about yesterday
just to set the record straight.
So imagine I have this imaginary
interface thing which creates a vehicle.
It just describes the interface that I want okay.
And then I have a bunch of types
that satisfy this interface.
Obviously they do satisfy the interface.
They don't need to have any base class or any anything.
They just need to satisfy the interface.
And then I could for example create a vehicle,
a vector of those and when I push back
or when I insert my things that satisfy the interface
it makes a copy of them and we're
back in value semantics world right.
We're actually manipulating values.
I mean we have no way of achieving this right now in C++.
Maybe, but you know just imagine at this point.
I think this would be much closer to what we want,
like no issues with ownership,
with memory management, with nothing.
All we have is types that know nothing about inheritance
and we can use them polymorphically, that's neat.
So this talk
is going to be about how we can achieve this.
The whole talk is only gonna be about
different ways of implementing
this imaginary interface keyword,
or rather the imaginary vehicle type that it generates.
This is the only thing we're going to do.
And so with inheritance like I said
you basically have a pointer to a derived type
and then you have a pointer to a Vtable in there.
So to give you an idea of what we're gonna do
we're gonna play around with oh what if I?
What if I take the Vtable for example
and I pull it down one level
and I bring it like into the the yellow box there?
Or what if I try to inline this in there?
Or what if I try to move these boxes around?
What does that give us?
Does that even make sense?
And so that's exactly, we're gonna be
exploring different ways of organizing these boxes.
And I'm gonna I'm gonna steer the discussion
in one specific direction, I'm gonna try
to treat the storage which is this green box here,
and the Vtable or the dynamic dispatch
which is this purple box here.
I'm gonna try to treat them independently.
And so we're gonna find a few ways to do the green box,
and we're gonna find a few ways to do
the dynamic dispatch, the purple box.
And if they're really independent
we're gonna be able to mix and match them.
Which gives us a nice flexibility.
Alright, so first one.
If our vehicle, let's say we just try
to pull down the Vtable one level okay.
So imagine the vehicle type, the magic vehicle type
is not just a pointer anymore.
It's actually a pointer to a Vtable, and yeah go ahead.
So the question is whether
I can point with the mouse cursor.
It's gonna be really difficult because
there is lag with the mouse there.
I understand that yeah.
But this is why you guys are here and they're not there so
I'm just gonna you know waste a bunch of time otherwise.
The idea is what if you pull the Vtable down
into the vehicle itself so the vehicle now
has a pointer to Vtable and a pointer to the actual object.
So let's see how we can implement that.
So here I have my vehicle which has a pointer to a Vtable
and I'm gonna show you how the Vtable is implemented.
There is a type called Vtable there
I'm gonna show you on the next slide.
So I have a pointer to a Vtable and I have a void pointer
which basically pointer to whatever I'm holding
whatever the the vehicle is holding.
I can't make it a pointer to a base class
because there's just no concept
of a base class in this approach.
And then I have a constructor that can
be constructed from anything because that's what I want.
I want my vehicle to be constructible
from anything that satisfies my interface.
So the way you do that is you have a templated constructor.
And I will have some fancy enabled business here
to say you know only enable this constructor
when you have an accelerate method in this case.
I'm skipping a bunch of details because this is slide code.
So I have a constructor that can be constructed
from anything that's a vehicle and then imagine
that I have a Vtable somewhere.
I initialize my V pointer from that
which is what the compiler does under the hood really.
And then I copy my vehicle.
I copied let's say the car, or the truck,
or whatever it is and I keep my pointer to it.
I'm also gonna skip the implement.
For all the talk I'm not gonna show
the copy constructor implementations.
There's already enough stuff to cover,
but they're easy to implement.
And then when I want to call my method
I just go in my deep Vtable by going through my V pointer
and then I call that on my this which is basically
the pointer to the object I am holding.
And when I destroy my vehicle I just go this,
I in direct from my Vtable again
and I go destroy the thing that I'm holding.
So I hope this doesn't look too weird to anyone.
Now let me show you how the Vtable looks like.
So I have a Vtable.
A Vtable is just a struct with function pointers.
It's just a table with function pointers.
And so I have function pointers in there,
and then I need to initialize my Vtable somewhere.
So what I do here is I have this little,
this is a variable template.
This is a C++ 14 variable template.
Basically what I'm doing is I'm creating
an instance of this Vtable for any type T.
And the way I do that, I initialize it with lambdas
which will because they are stateless lambdas
they are gonna decay to a function pointer.
So I'm just creating functions here in line.
And I take in a void pointer this,
but I know I'm the Vtable for T
so I know this this is actually a T star.
So I can static cast it to a T pointer.
And then I do the static dispatch
I call my accelerate method in there.
And since this is static dispatch
the compiler is gonna see this and is gonna be
free to do any in lining it wants.
So this should be basically considered equivalent,
absolutely equivalent to just calling accelerate directly.
And so for the delete here I'm just
static casting to T star and then I'm deleting the object.
So I'm just gonna go back real quick here.
So again you have the vehicle,
it gets constructed from anything
then you initialize it with a write Vtable
so that's a table with two function pointers in there
that have been initialized in the next line
and then I make a copy of my, of whatever I'm containing.
Is anybody completely left at this point?
Okay that's real good.
So I wrote this little library
and the purpose of this talk is not
to talk about the library but I'm gonna
show snapshots of how you,
I'm gonna show the equivalent using the library every time.
So it gives you like a poly type basically which
handles much of the the nastiness under the hood.
And essentially what I want to point you to is
it accepts what I call a storage policy.
So it's a way of storing the underlying object,
and here we're just using the remote storage policy.
And that's basically equivalent to
what we implemented by hand here.
And then when I want to dispatch my vehicle,
when I call accelerate, it's gonna reach into the Vtable
of the poly to the accelerate function,
the function that matches the accelerate string.
And then it's gonna call that on the poly.
Now you're looking at me and you're like are you crazy?
You're replacing virtual functions
by like a map lookup or something?
Well so if you don't know me I'm a crazy meta programmer.
And this is a compile time string.
So this whole expression here is
completely resolved at compile time.
And that's equivalent to doing a struct,
accessing a member in a struct.
So these are absolutely equivalent,
and it's not only in theory like I'm benchmarking.
And it's exactly the same as doing
the hand written implementation.
But that's not so important.
The really important part is there's a dyno
remote storage thing which makes it equivalent
to what we wrote in the previous line.
Now I do need to specify what the interface is
so I have this fancy way of creating Vtables okay.
Basically I'm creating a Vtable saying
you're gonna need to be copy constructable
so there's gonna be a copy constructor in the Vtable.
You need to be destructible and you need to have
this accelerate function which has this signature.
And then I fill my Vtable I say
here's how you implement accelerate.
And I'm not getting into any details there.
It's just to show you,
if you're interested go read the documentation.
But basically I have a way of defining my Vtable.
And so the strengths and weaknesses not of dyno
but of the approach that we just looked at.
Which is the to basically pull the Vtable down
just one layer is, it's a simple model.
It's still very similar to classic inheritance
so it's easy to reason about if you're
used to reasoning about inheritance.
But it always requires a heap allocation
because we're still going through a pointer.
So that's not really ideal.
So we're going to look at a different
way of implementing this.
What if what if we did this instead?
And that's called a small buffer optimization SBO for short.
So the idea behind SBO is basically that you have a pointer.
You either have a pointer to the heap
or you have a local buffer of a fixed size
right there in the vehicle in the object right,
and if what you're being constructed from,
so like the derived class in a sense,
if what you're being constructed from fits in the buffer
you can put it there and you don't need to go to the heap.
Because you have enough storage right here.
And if it's too large then you need to go to the heap
and there's a bunch of ways of implementing this.
But you basically need one bit of information
encoded somewhere somehow to know whether
you're on heap or in the local buffer.
And the Vtable here that doesn't
change from the previous approach.
Like we're no can touch Vtables for a while now.
You still have a V pointer, whatever.
So one way is you just have a bool there,
that that's a way of doing this.
And then yeah this pointer here
basically points either on the heap
or it's not used because it's in a union
and the buffer is used instead.
And the way we can implement this is basically
you have a V pointer, the union with some aligned storage,
and then when you get constructed from any concrete type
you check oh do you fit in the buffer.
In my case I have 16 bytes.
So if the size of the concrete type
that I'm being constructed from fits in the buffer
or doesn't fit in the buffer rather,
if it's too large for the buffer I put it on the heap.
And otherwise I'm gonna placements new it in my buffer.
And if placements new is a form of the new expression
that basically just constructs an object of a given type.
It basically calls a constructor on a piece of memory
and a piece of the address in memory
where I'm constructing my object is ampersand buffer here.
So I'm just constructing the location of the buffer.
I'm constructing an object of type any.
And then when I want to call a function
I just go through my Vtable again
and if I'm on the heap I use my pointer
otherwise I use my buffer.
Is anybody completely left?
There's a bunch of implementations like I said.
I just showed a very naive one.
So another way of doing this is
that you can put the bool actually
instead of in the object you can put it in a Vtable
because there was one Vtable per type
and so each Vtable kinda knows
whether the type that it's being,
that it's useful for will fit in 16 bytes or not.
So that's a way.
You can actually inline that knowledge in
the functions that you put in the Vtable as well.
So they're gonna receive a vehicle
and they're gonna know whether
they should reach into the small buffer
or on the heap right, that's a way of doing this.
Another way of doing this, thanks to Gaspar.
I stole it for him.
Basically is you have, you always have a pointer.
Instead of having a bool you always have a pointer.
And the the pointer is either pointing on the heap
or it's pointing to a local buffer.
And so you don't have a bool.
You don't need to branch every time
that you're gonna access your your storage.
You just follow that pointer wherever it leads you
and it might lead you just to the local buffer
right after or actually on heap, yeah.
The question is but you still need
to know whether it's on the heap.
Yeah you do, you just follow the pointer.
The pointer has been set up to be either
pointing to the heap where the thing lives
or on the, oh okay I understand your question now.
Yeah okay you can actually retrieve that bool
that bit of information to know whether you should free
the object you're pointing to or not
by saying do my pointer, does my pointer
point to right after itself?
So that's how you can retrieve that boll.
And so with dyno you just need to use
a different storage policy, SBO storage whatever,
whatever size you want.
That's the only line that changes.
So the strengths and weaknesses of this approach is that
it does not always require allocating, that's good.
But it takes up more space and copy, move,
and swap operations are kind of complicated.
You have to handle a bunch of different cases
like if this guys on the heap, that guys on the stack.
And so on right.
So you have I think four combinations to handle.
And dispatching may also be more costly
depending on how you actually implement it.
Alright, so we're gonna...
Do I have any questions at this point?
- [Audience] Can you go back a couple slides please?
- [Audience] Stop, no there, nope back one, there.
So have you considered instead of storing
the boll in the Vtable just store
the size of the object in the Vtable.
And then you can have different small buffer
sizes optimizations and the test is not
is the bool on heap but rather
is the size the size of the buffer?
Yeah that's a good point.
So in dyno I actually do store the size
but I had not thought about using it for that purpose.
I think that's an interesting point.
But I think actually the more interesting
thing you can do here is really inline the knowledge
of whether you know the move is true or false.
You inline that in the Vtable in the
implementations of the functions in the Vtable.
But yeah that's a good point, that's another implementation.
So you have all kinds of ways of encoding
that bit of information and actually there's
another thing you can do, you could also like,
it's crazy but you could encode the that bit of information
in the low bits of your Vtable pointer
because you know the alignment of your Vtable pointer
so you could do some bit whittling, like there's...
- [Audience] The goal I was shooting for here is
not necessarily just a different kind,
different buffer sizes but also
to get back to the idea that with
inherited based polymorphism the Vtable
is constant it doesn't change at runtime.
And I'm wondering if you can get back to that.
That Vtable doesn't change, never changes.
- [Audience] Right, but if you had different
buffer sizes policies then the on heap would have to change.
Sure so you would yes, you would need to create
one Vtable per for such buffer size yeah.
So that's an interesting, there are probably
many other implement, ways of implementing
this that I haven't thought about.
This is just a general idea.
You can have a local buffer and have a fallback to the heap.
And what we're gonna do now is we're actually gonna
push this further we're gonna say let's not have a fallback.
So you have a local buffer of a fixed size
and you don't have a fallback.
So when the type is too large it doesn't compile.
That's what you.
The question is who deletes the Vtable?
The Vtable is in static storage, nobody deletes it.
No static storage, not thread local storage static storage.
Actually it can be constexpr even.
No need to do anything with it.
So if it doesn't fit in the local storage here
the only thing we can do is really fail to compile.
And the way we can do that is well
we just have some buffer of a fixed size
a line start rich 64 bytes in this case for example.
And then when I get constructed from any concrete type
I make sure that it fits in the buffer.
So if the size is less than or equal to
the size of my buffer, and here I'm skipping
some alignment checks and so on but it's like good.
So if it doesn't fit in the buffer then I static,
I basically give a compiler error.
Otherwise I placement new it in the buffer and that's it.
And then when I want to call the method on it
I just go through my Vtable.
I always need to go through my Vtable.
And then I call that and my this pointer
is basically the starting address of my buffer.
And then I need my destructor, I call my destructor.
I don't need to free anything here,
I just called my destructor on the buffer.
And with the library
you just have to say local storage and then pick your size.
So the strengths and weaknesses of this approach
are that well the strengths, main strength really
is that there's no allocation ever.
You're guaranteed to never get an allocation right.
You'll never get one.
The dispatching is as simple as it gets
because you always just go reach through
your buffer which is right there.
Actually if you have the object the buffer may already be
on the same cache line or something.
So this is as simple as it gets.
But it does take up more space unless you have
many many objects that have basically the same size
and they all have the exact size of the local buffer
in which case your optimal, you're using
your local buffer size optimally.
But in some cases it can result in some wasted space.
Do I have any questions at this point?
Is everybody following me yeah?
Could you have different, the question I think is
could you have different types
for the same representing like the same vehicle
for interface but having different storage policies?
Absolutely, depending on how you use them why not?
You'd have to make them interoperable but, or well probably.
But if you're using vehicles in a certain way
in your application here and a different way there
it makes total sense, or maybe you have different vehicles.
They're they're larger, they're they're smaller.
It makes total sense to tweak your implementation
of polymorphism to your your use case.
That's what I think, yeah.
And it's tedious if you do it by hand.
If you use a library and you can basically
you could templatize your vehicle class
on the storage policy and then
you just have to change the storage policy
that you're using, it's really easy.
So let's look at a few benchmarks.
These are like super super micro okay.
These are really micro benchmarks so
we're not gonna draw any any firm conclusions from them.
We're just gonna draw some really
coarse grained conclusions from them okay
because I don't want to get into that and then
next thing you know a compiler engineer
comes to me and says this is,
you're full of crap and bah bah bah.
I realize that the columns are maybe not
visible from the back I'm sorry for that.
So anyway this benchmark shows
creating many 16 bytes objects.
And I'm gonna go from the left to your left.
The leftmost one is inheritance.
Basically I'm going to the heap there, I'm allocating.
And all that this graph shows is that
you're doing an allocation and you're paying for it.
Remote storage next, also there's an allocation.
There in the same ballpark.
SBO storage with four bytes.
The third one is the SBO storage with four bytes.
Four bytes is not enough to store 16
so I'm always going to the heap.
I can never actually put the object in my buffer
and so I need to go to the heap and well I pay for it.
Same for 8 bytes which is the next one,
and then when it drops it's the SBO storage 16 bytes.
Where suddenly my objects fit in my buffer.
I don't have to do and allocation,
and boom the cusp drops dramatically.
And the last one is the local storage with 16 bytes where
the object always fits as well.
So I'm not paying for an allocation, yeah.
- [Audience] Have you tried this
with a pool allocator instead?
Have I tried this with a pool allocator?
No I have not.
That's a good point, this is just standard allocator.
But regardless I doubt that you can
find anything faster than just stack.
- [Audience] How many types did you use for this benchmark?
Can you repeat that?
- [Audience] How many types of objects
did you use for this is benchmark?
So I'm using Google benchmark basically and I'm,
I'm using Google benchmark which iterates
until the variants of the measurements goes really small.
- [Audience] How many classes there were?
Like each class is virtual job to the another place.
There's only one class here.
I'm just creating,
is basically an aligned storage of 16 bytes
that I'm I'm erasing in my in my object.
So it's really naive.
The only thing I want to show here is that
hey you need to pay for an allocation.
This is not like an exact number or whatever
but surely allocating is gonna be
a little more costly than not allocating right.
And that's what this benchmark shows.
This is like how coarse I want our conclusions to be.
Now I'm accessing many four bytes objects.
I have like 10 times three virtual method calls
and then Google benchmarks iterates this
until the variance of measurements it looks satisfactory.
Basically I'm not drawing any conclusions
except that they're all in the same ballpark, okay.
So I'm not trying to draw any conclusions
except to say that well probably that
the speed of dispatching may or may not be an issue.
In your application you should just measure that.
But the guidelines are
you should try to use local storage
whenever you can afford it, whenever it's convenient
for you to use that because you're always
gonna get rid of the allocation
which we saw actually mattered right.
Otherwise you should try to use the small buffer
optimization with the largest possible size,
or largest reasonable size.
And you should go to a purely remote storage
only when you know that your sizes are so
sparsely distributed that you're almost never,
you're rarely gonna take advantage
of a small buffer optimization
and you're always gonna be wasting the space.
And actually before this should be just measure it really.
Obviously just measure it.
Don't take any advice blindly.
But otherwise I think these are the most reasonable.
After measuring these are I think the most reasonable
guidelines I was able to come up with.
Now we're going to leave the realm
of value semantics for a second.
We're gonna look at non owning storage.
What I want is to have basically
a polymorphic view over some existing object.
And I don't want to own it, I'm not making a copy.
So the way you can implement that
is by simply having a, you still have your V pointer,
and then you have non owning,
you have basically a void pointer but this not owning.
So you just have a reference essentially
to the actual object that lives somewhere else.
So in this case here for example I have my truck.
It lives on the stack.
And then I call my process function which takes a vehicle
but there's no copy being made
because I don't actually try to view the object.
I just take the address basically
of my object under the hood right.
And there's nothing to do in the destructor as well.
You have to speak up.
How is it different than passing it as a...
as a vehicle reference?
Yeah yeah no no.
So if you pass it as a vehicle reference
in order to get a vehicle with the value
semantics based approaches you're gonna
create a vehicle which may incur an allocation
and then you're gonna pass a reference to that.
It's not what you want.
What you want here is just to pass basically
the truck itself and what you want basically
is just to bind a Vtable of your choice to that reference.
No in this case truck doesn't know anything about vehicle.
That's right, that's the point.
It only has the right interface so that when you create the,
when you create the vehicle with the non owning storage
you're creating a Vtable from that right.
And that's what binds to the accelerate method of the truck.
- [Audience] But you could achieve the same effect
by just making process the template right?
Well yes but this is slide code.
It's not always a case right?
If you had, or first of all it's not always possible.
And it's also not always convenient.
Let's say you want to have a fixed API
and have it be extensible by other clients
without you having to know what
their types are gonna be in beforehand.
Any more questions on this?
I'm not gonna show the implementation because
I don't want to run out of time.
So another thing you can do and now
we're actually back in the value semantics world.
Another thing you can do is have shared remote storage.
And this is something that
Sean Parent talks about in his talks.
Basically what you do, the idea behind that is
you use a short pointer okay so that all the vehicles
share the same underlying representation okay.
And so when you make a copy you're only
creating a new short pointer to to the original one.
So that allows you to do optimal sharing.
Now when you want to modify one of these vehicles
that's when you actually make a deep copy
and then you modify it so that's copy on write.
So that's an interesting way of retaining value semantics
while making it easier to you know do optimal sharing and
not having to worry basically
about breaking value semantics.
And I'm also not gonna show
the implementation for the sake of time.
I know this is kind of...
Since I'm not showing an example
it may be hard to picture it in your head.
So go ahead if you have any questions okay.
So now it's not clear though why we care
about all of this actually right?
So I'm gonna try to convince you that we care about this.
So have you heard about standard function probably?
Great, so standard function is basically is a type.
You give it a signature and then
you can stick any function that satisfies that signature
including function objects and they can be stateful okay.
You can take anything in there
and it's gonna have your call operator
and it's gonna call the underlying function.
In place function is something that was proposed to SG14.
Actually we discussed that yesterday and
the idea is it's like a standard function
but it never allocates because it has a fixed size buffer.
Have we talked about that?
And function view is something
that Vittorio Romeo blogged about.
And there might be a proposal baking I don't know.
But the idea is basically it's like a standard function
but it's like well it's the equivalent of
string view for a standard function.
So it's like a non-owning standard function.
So you just take a function view
and you're not making any copy
but you can pass anything that's
callable with the right signature.
What if we did this?
Imagine I have a basic function,
much like basic string, same idea a little bit.
I have a basic function which has
a signature and some storage policy.
And then I do this little dance here I have my
I can be constructed from anything.
I should you know restrict this with some
enable_if, for it to really work.
And then I just create my poly under the hood
which does all these nasty tricks
that we've done manually in the previous slides.
And then I have my call operator
which just dispatches to the vtable
and calls it with the arguments.
I should be doing perfect forwarding as well.
I'm trying to save some trouble here.
And then imagine I have this callable interface
which I'm not going to show but you define it.
It's not too difficult to define.
It's that little boilerplate that I showed before.
So you define your callable interface and then you just
set the storage policy of the poly.
Imagine we have this.
Then here is all of these three proposals
and actually more, implemented right here.
So function is just a user's SBO storage.
Technically it's not specified like this
but most implementations use some size of SBO.
They try to do this optimization.
I think libc++ does 16 or 32 or something.
And then inplace_function is just
a basic_function with the right local storage.
And function_view just has non owning storage
And we could maybe even consider having shared functions.
I don't know, I don't care about it but it's possible.
And you're just using shared storage.
So I think this way of seeing things as
behind basically a storage policy is pretty powerful
because function is just one example of
type-erased thingies right.
We also have a polymorphic memory resource now.
And they're using this nice base class interface
which I'm personally not a big fan of
but we could have maybe used something different for that.
I can totally picture something like
standard polymorphic allocator
which is just like standard function except
it doesn't require inheritance
and hence it does not make a bunch of choices beforehand.
So that's just an idea.
Okay so now we've talked about storage.
And I said I wanted to treat storage
and dynamic dispatching separately
so now we're going to only look at the dynamic dispatching.
Are there any questions for the first part of the talk?
So usually a Vtable is remote.
Regardless of the storage itself
you basically have a pointer to the Vtable
and that's how you do your autonomy dispatching.
But maybe we have some choices.
What if we tried to actually inline
the Vtable in the object?
That may or may not be very smart
but that's something we could do.
So how would that work?
Well basically you just need to take
your V pointer and turn it into...
I used to have a pointer to my Vtable
and now I just have the Vtable right there.
I just skip the pointer.
And now I've got all my Vtable
because I'm creating my Vtable and my own by hand
I have all my Vtable in my object.
And then when I initialize my object
I just have Vtable, I copy my Vtable from
the static storage and I store it in the object
and nothing else changes basically.
Well I'm also not interacting to a pointer
when I do my virtual calls anymore.
It turns out this is, well okay...
And this is how you can do it with dyno
so there's also a Vtable policy.
Basically you can just say everything
in my Vtable has everything locally.
It's a fancy little domain specific language .
If you're interested you can go learn about it but
basically there is a way of tweaking
how your Vtable is stored.
And turns out this is usually a pessimization
at least according to my not very precise micro benchmarks.
Okay well surprising.
It's always a space pessimization, that's very clear
because we have the Vtable and the object
so that takes more space.
I don't really understand why it's actually slower
but so far it seems to be slower, I don't know.
If someone knows why, go ahead.
- [Audience] I saw your dyno code
and I think the reason why the local storage of the methods
is slower is because you are heating
potentially called cache lines as opposed to
the Vtable which in any benchmark
that I can think of is gonna be very hot.
Okay so that's possible.
I don't need to repeat the questions I guess
since they're on the mic, okay good.
So that's possible however even when I hand roll it
so even if I like when I benchmark it
with a hand rolled low called Vtable
like just you know the simplest struct
and I just put it there it's,
like without going through dyno or anything,
any kind of fancy abstraction it's the same.
I also see like it's, so actually
the same comment probably applies there.
- [Audience] Did you to the benchmark
in the Intel platforms?
- [Audience] That's because the internal reordering
of the buffers and the addresses
it doesn't really matter that you have an extra indirection
if the indirection is already resolved.
So you're not getting anything by storing it locally.
I see okay well.
See that's super interesting feedback.
So I wasn't able to show that it was faster
so usually it seems to be a pessimization
that here's probably the explanation.
And so again the main guideline is always measure.
However, something that may be possible
is to just inline parts of your Vtable.
And basically the idea is let's say you have a really hot
let's say you have a really really hot function
then this one maybe you want to pull it down on one level
or maybe not depending I don't know.
But if you do want to pull it down
then you could actually do that where you have
a part of your Vtable locally in the object.
So basically you just have a function pointer,
or two function pointers, or however many you want
directly in the object and then the rest
for the other virtual functions
like the destructor and stuff like that
which you don't care about so much
you can actually have a pointer to the Vtable for those.
The way you could implement this
is by having basically a Vtable
that remote part of the Vtable
where it has like the destructor and whatever
you don't care about it, so the cold part there.
And then you have a local part which has a pointer to
the remote part and has whatever functions you care about.
And the polymorphic wrapper would be
a little more complicated but basically
it would reach sometimes in the remote part
sometimes in the local part okay.
With dyno you have this fancy little language
that you can say oh my Vtable okay,
only this function is gonna be local.
Everything else gonna be remote.
And so you can pick function by function
which ones you want to have locally.
And so here are some benchmarks.
Calling a bunch of virtual calls again
I just cannot draw any conclusion from that
so I'm not gonna even try to.
So that's not super conclusive,
so what I did was I looked at some assembly.
And actually I asked a compiler engineer
to decipher this for me and what I got out of this,
and again some people in the audience
may actually know more than, many people in the audience
probably know much more than me about this.
Let's look at...
So I have three different implementations.
One is like a manual Vtable that's remote.
So I have my remote thingy which has my remote any
which has a pointer to the Vtable.
And then I have my local any which has
the Vtable right there in the object.
And then for comparison I have the basically
inheritance based object like this.
And then I had just have functions that call these
kind of virtual or these functions
that should go through the Vtable.
And I'm not going to do a super deep analysis
of this because I'm just not able to
but what I was told is basically,
okay if you look at the, that was on my screen.
If you look at the remote one you'll see that
there is basically two MOsv's and then
two moves and then a call,
two moves in then a call, and that's it.
And if you look at the, and it's the same for inheritance.
You have two moves here call,
two moves here call, two moves here call.
If you look at the local one, the local version
here there's two move, and then a call
and then there's one move and a call, one move and a call.
What is this?
So here is what I was told.
What I was told is one of these moves,
the move that gets repeated every time
with the remote any or the inheritance version
which are just the same, so this guy.
Move RBX RAX, RBX RAX, RBX RAX.
This is loading the V pointer.
This is loading the V pointer into
whatever is needed to call it.
And then it's doing the call.
And the reason why they need to do that
is because when you do the virtual call
you're passing this right and you don't know
whether you've modified something.
The object could have changed.
So they need to actually reload
the Vtable, the pointer every time.
Whereas if it's local you see that it doesn't change.
So you don't have to do that.
Now maybe none of this that makes
any difference in the grand scheme of things but
this is I think you know an interesting thing
to know that optimizers need to
make sure that your code is correct.
And sometimes when you hide what you're doing
behind interactions they have to...
You know that you're never gonna modify
the temp V pointer but they don't right.
And in order to be correct they actually
need to do that kind of stuff.
And now I have another fun story about inlining.
So I had this benchmark.
I mean it's not exactly this benchmark
I had but something like this.
Which instead of I was type erasing an iterator there
which is not a super clever thing to do
but I was just playing around and
basically I have an input vector and an output vector.
And then I create iterators over
my input vector and my output vector.
And then I just copy the input vector into my output vector.
And the interesting thing here is that each equals here,
each comparison and each increment
and each dereference is going to a virtual call
because everything is type erase.
And so I run this and notice here I'm creating my iterators
through some make function which is not inline.
So I run this and that's what I get so
the leftmost to your left is the static dispatch.
Where it's basically just a vector iterator.
And this is a benchmark.
I was probably not clever enough.
The compiler was too clever.
It probably just like alighted everything and basically
it realized that nothing was going on probably.
And within inheritance or a remote Vtable
or like any sort of indirection it couldn't
figure it out anymore and it had to do some work.
And now I tried something, just a little tweak.
I just did this.
I just removed the no inline from
the function the crazy iterator.
So now the compiler was able to more easily see
what function was being, what iterator was being created.
And suddenly I got that.
Suddenly the dyno remote Vtable was basically being,
all the work was being completely elided, why?
And so what I think the reason is
is that inheritance goes like this.
You have a pointer to a derived class
which has a pointer to the Vtable.
With dyno or any fat pointer, that's what we call those
fat pointer approach you have a pointer to the Vtable
which has some storage in this case remote storage I think.
And I believe that by,
I was probably hitting some like inliner
some inlining limits in the optimizer.
And so when I made it non-inline suddenly
the the dyno remote Vtable because it has one fewer hops
I guess the optimizer was seeing through it better
but it couldn't do that for the inheritance.
That's my best guess at this honestly.
I'm not a compiler engineer.
But I think the lesson is just that hey
generally speaking reducing pointer hops
can lead to unexpected inlining
and when that happens the gains could be crazy.
And so in a real world system you may or may not have,
you may not have a place where that could actually help
but if you do you might be leaving
lots of performance on the table.
So it might be worth trying to play around with.
- [Audience] I think that it is not the reducing
of the pointer hops what gives you
the inlining but value semantics.
Because what is at the root of it all is that
with normal simple source inheritance
you're always using referential semantics and
things can change in ways that are opaque to the compiler.
If you do this these benchmarks that I've seen your code
the compiler can see that something is a local variable
one change and therefore even if there are many pointer hops
it knows that things are not changing
because the activation chain is not opaque anymore.
I see so that might even be a better explanation.
Like I said I'm not a compiler engineer
but so I guess then I should change this lesson
to making your code more, less reference based
and more value based which is gonna be helped actually
by reducing pointer hubs basically.
That can help the optimizer seeing through you're code.
And sometimes it can be pretty dramatic.
So I think guidelines by default
just don't bother with you know Vtable inlining.
It's probably not worth it.
And it may actually not help.
But you could consider inlining some methods
if you know they are really hot,
if you have some space in your object you don't care about
and if you've measured that it actually makes
a difference and a positive difference.
Otherwise just don't bother it's probably not worth it.
But it's still an interesting thought experiment.
So one of the biggest problems I think with with
what I've presented today is that it's super cumbersome.
You have to be like do this boilerplate
and like really nobody wants to do that.
Nobody wants to deal with that.
And the the dyno library I mean let's be honest.
It's cool what it achieves but you don't want to be
doing these compile time strings
and all this weird domain specific language.
Nobody wants to do that in live production code,
unless it's like super well hidden and everything but still.
But I think in the future we can
actually get something much better.
Say if we had reflection I could take this truck vehicle
and extract an interface from that.
And then it had if I had something like code injection
my poly here could look at the interface
for my vehicle and say oh I need to generate
this method in that method right?
And then you wouldn't have any kind of boilerplate.
You wouldn't need any boilerplate.
We would just reduce everything.
Or with metaclasses you could have
this fancy interface keyword which I wish it wasn't based
directly on inheritance unlike super a naive inheritance.
But instead if it we could actually easily make it based
on type erasure which is a technique
that we've talked about today.
That would be pretty easy.
And so I think a summary for this talk,
if there was one thing, or if there was
a few things I would like you guys to remember
is that inheritance is just one way
of implementing runtime polymorphism.
We've talked about a bunch of them,
and some of them may be better for some cases.
worse for other cases but it's absolutely not true
that inheritance is a silver bullet.
And that it's the only way of doing runtime polymorphism
which is usually what we assume when we do C++.
I mean this is, currently I'm questioning
the most fundamental thing in C++.
This is what makes C++ C++ with classes.
So what i'm saying is that's just one design choice.
and it's not always the best one.
really not always the best one it has many drawbacks.
And you may not want to actually bake that choice,
that specific design choice in your program.
And you can avoid doing this by using type eraser,
basically creating your wrapper, your vehicle wrapper
and hiding how your your polymorphism
is implemented in that class.
There are as many ways of storing polymeric objects.
We watched a few, we looked at a few of those.
It's always a space time trade off.
The Vtables can be inlined.
It's not clear that it's a gain but it may be.
You just want to measure at this point.
But it's still is an interesting thought experiment I think.
And type erasure is really tedious to do manually
but reflection will be there to help.
So the dyno library is available online.
It's not like production quality at this point.
This is still a little bit experimental.
But you can look at it and I'm welcoming of poll requests.
Here's a bunch of links.
The slides are gonna be published.
So you can go watch previous talks
or read on the subject of type erasure.
There is plenty of material out there.
A bunch of people have worked on this problem before me.
And that's it, thank you very much.
- [Audience] So you were saying
don't bake this into your design.
How would you given the current state of C++ 17
recommend you not bake that into your design.
So some kind of typing erasure?
You may have a few links.
Yeah so you just use type erasure
instead of using inheritance directly basically.
- [Audience] What would that mean in practice?
- [Audience] What what would that mean in practice?
So what kind of--
Well I just showed like four different ways
of implementing these vehicle wrappers,
or you can use a library.
And on these slides there there's Adobe poly you can use,
there's boost type erasure, there's Zach Laine's.
Zach Laine's talk presents like a tool actually
that can generate this thing for you.
There's Eraserface which is a macro
to generate these things for you.
There is liberasure which you could also use.
So like you have a bunch of--
- [Audience] So you're saying
look at them all and take your pick?
Well I would say use dyno.
Honestly I would say use dyno.
It's more flexible than anything else I've seen but
it's kind of a little bit too experimental but
personally I would say use dyno because
it's simpler than what I've seen but,
I mean yeah you take your pick basically.
- [Audience] Thank you.
- [Audience] Hello.
So there's a couple of axes that are really important
to the performance discussion that weren't really touched on
so one of them was we just talked about
whether things are faster or slower to call.
But there's a really big difference
when you're talking about I want to call one polymorphic
object and I want that to go through as quickly as possible
versus I have an array of, a vector of polymorphic objects
and I want to call all of them as quickly as possible.
For instance the amount of space that it occupies
not very important in the first case
but very very important in the second case
'cause the number of objects you can read in at once.
And the second thing is that
one of the reasons why
storing the Vtable remotely doesn't
typically end up being a big deal
and I don't no if you tweak this parameter either
is that usually in the second case
when you have a huge array of objects
the number of classes you have is very small
compared to the number of objects so--
Everything's really hot yeah.
- [Audience] What's that?
So you're Vtables are basically really hot already right?
- [Audience] Yeah like your Vtables will stay hot.
So you have like five Vtables and a million
objects and it's just not a big deal.
And again the win that you get from the space side of things
almost always makes it worth it.
If you have a million objects and a million classes
then suddenly if you think about every Vtable is new.
Now you're basically talking about a linked list.
Every single time you get a Vtable
that's a cache miss every single time.
So in that case if the number of objects
and the number of classes is the same
I would expect some of dynos approaches to do better
but that's a very rare use case in real life.
But it's not to say it never happens.
Just a couple of thoughts.
Yeah yeah no that's good.
I mean again I think the guideline clearly is
don't do anything fancy with your Vtables
just leave them remote right and
if really you have a fancy use case
then you can tweak it, it's easy.
You just change a few lines you try it out and that's it.
- [Audience] So you might have touched on this
and maybe I just missed it but
with classical inheritance you get the override keyword.
- [Audience] Override.
So how does dyno solve that kind of contract?
With override you have to make sure that you have
the virtual functions or you'll get a compiler error
and all that kind of stuff.
How does that solve that?
Yeah, so there's no equivalent in dyno.
I mean you just have no inheritance at all.
So there's no override or virtual keywords required at all.
But yeah you don't which means that also
the classes that are supposed to satisfy an interface
they don't have a way of saying I want to error if I'm not.
You just don't have an equal entity to the override source.
- [Audience] There's no way to enforce
that kind of contract?
- [Audience] Okay thank you.
- [Audience] Alright so I really like
the design you had with your basic function
where you have you know just a single interface
with storage policy because one thing
we do a lot in our projects is just reimplement
the entire SED string with its 200 something methods
just because we want to control
the small buffer optimization.
Do you think it's some new direction the committee
could take if they have a new container
just define a container interface, offer a default storage,
and let users choose the storage they want?
I think this would be something,
okay so my personal opinion is that
it would be useful to have a
general way of customizing storage
for people that need very, that have very
tight requirements like basically people as G14.
I think that would be useful I believe the committee
might not like this idea, I don't know.
I just don't know it's just based on my intuition
is that maybe it would be seen
as being too narrow or something.
- [Audience] One problem I can think of
is for example for a STD vector if we could
change the storage to use a small buffer optimization
you'd lose the guarantee of when you move the vector
the iterators might be invalidated
if you don't have a remote storage.
I think that could be a concern this that
the template has a different concept
depending on the storage policies.
Yeah yeah I think that's also very true.
I hadn't thought of that but yeah.
So there are definitely challenges
to actually generalizing this to many many more
kind of containers or anything so I don't know.
I think it's an interesting,
all of this is very experimental.
It's like an exploration of what we can actually do.
So I think it would be interesting
to see what happens if we do it with containers.
Maybe it applies maybe it doesn't.
- [Audience] Alright thank you.
- [Audience] I'd like to raise a few concerns with inlining.
As you might know the modern compilers
are sometimes capable of inlining the virtual calls.
Yeah the virtualization yeah.
- [Audience] Yeah and I personally have faced
the limitation, or actually bug in the compiler
when the compiler decides to inline the call
and then I add a different runtime module
and feed another object and
the buck is here.
- [Audience] So the inlining of virtual code
should be avoided in my opinion.
I think that's between you and the compiler.
I think that's between you and like
if Clang does it and you don't like it
for example you file a bug against them.
I mean I see your point but I think for most people
the ability to do with the virtualization
far outweighs that kind of stuff.
But anyway like I don't really,
what I think is just like if it is a problem
then you should definitely talk to your--
- [Audience] Actually if you can make sure that
all the objects live inside the single module
and you compile this module then the inlining is okay.
But if your program consists of several modules and
they do runtime linking so one module
can send an object to another in runtime
and this inlining kills your program.
So I mean technically if you wanted to avoid inlining
you could put like a no inline.
I think it could probably do a,
put like a no inline attribute and
I don't know exactly where but
there's a place where you could probably do a no inline
because you have control over Vtable
if you write it by hand or with dyno.
So you could almost certainly make it
go through a function call that's marked as no inline
for your specific use case and then you would know that
this is never inlined I think.
- [Audience] Okay thank you.
- [Audience] Since you mentioned Sean Parent's
talk on polymorphism what's your take
on the concept T based technique he's shown
I don't know maybe two or three years ago.
I don't know if you remember it.
Because it looked a bit simpler to me
and maybe since it used actual polymorphism
behind the scenes the compiler
is able to better optimize it?
So in all benchmarks,
so I'm aware of that technique actually.
I decided I, in previous versions of the talk.
This is the first thing that I showed.
But based on feedback I got from doing this talk at meetups
I actually removed it because people would not understand it
so I had to spend a lot of time to explain this one.
So also in all the benchmarks that I that I've done
they, they're not better than the hand roll ones.
So I don't know.
- [Audience] Maybe if we improve that
with some a small buffer optimization?
It's not clear to me how you can do you...
You could probably do smaller for optimization
even if you're using, yeah you could.
I mean Adobe poly does that right?
They have the SBO.
But like vanilla,
the vanilla technique, the technique that Sean
shows in his talks if you just take it as is
it's no better than just--
- [Audience] Alright.
And this is gonna be the last one
because we're out of time.
- [Audience] Exploring the space,
have you given any thought to open mounting methods?
I have, I think it's a difficult problem to solve.
And I think, actually I've spoken to someone that might may,
that wanted to do this, to add this to dyno.
I don't know where it's gonna go.
I think I personally don't have the bandwidth to do that
but I think it's a really interesting problem to solve.
It's a really difficult one
especially to do that efficiently.
And it can almost certainly be solved but
there are no plans at the moment.
- [Audience] Alright thanks.
Okay thank you everyone.