- Good afternoon everyone.
I'm Titus Winters, I do not love doing my own introductions,
but there's a couple things that I do want to say.
I've been leading most of the C++ Common Library efforts
at Google for eight years now, dear lord.
I'm a maintainer for Google C++ style guide.
I founded Abseil.
I'm now the chair for Library Evolution,
that is the group at the standards committee level
that does API design for the Standard Library.
The new study group on tooling,
I've done tons of guidance,
I write a lot of the tip of the week series,
I work professionally on providing
good guidance on API design, and because of all
of the code base wide refactoring work in Abseil
and all of the other teams we've been on,
we are subjected to the pain of fixing it
when we get these things wrong,
which is to say hopefully you can trust
that I am not making all of this up.
So, when we talk about design, why do we talk about design?
I think we do this because we want to ensure
that the things that we produce are usable,
that people understand what you mean
when you write out an interface,
when you write out a function, when you write out a class,
that they know how to use that.
By looking at the things that work
and the things that don't we find ideas and design patterns
that are easy to follow and that make
the resulting APIs easier to work with.
In the end, though, design serves us.
This is largely not about math
or fundamental principles of the universe.
These are not rules written out on stone tablets
and brought down from on high.
I am not bringing you commandments here.
I am bringing you stories, best practices,
things that I have found seem to be
the way that we do things, things that I have found
seem to be the right way to express things,
but this is all going to be evolving over time, right?
And I do want you to think
about all of these things carefully.
Don't just take my word for it.
There are a few places where there are
underlying math or symbolic logic,
and those things will help inform good design,
and that'll be good.
We'll sort of call that out.
There is also this question,
are we prescriptivist or descriptivist?
We can approach design just like grammar
from either of these views.
Do we see the rules as they were written down
on those tablets and value the rules over all else?
Or do we see that, oh hey, we've made a mess,
and some things work and some things don't,
and try to produce rules that describe what things did
and what things didn't to like encourage the good ones
and nudge us away from the bad.
And you can sort of take either viewpoint here,
but I do sort of prefer when we approach things
in a descriptivist fashion of this seems to work,
this seems to not work, and here's why, right?
It is really important to me that you all think
about these things and understand why.
This talk will be in roughly three parts,
starting small with the basic units of design
and working our way up to big questions like
is this an acceptable design pattern for types?
There's a spectrum here.
As we go forward in the talk we'll go
from talking about syntax to semantics to theory.
This is scheduled in a two-hour block here at CppCon.
And I'm gonna cover the first of these in this talk
and the higher level pieces, types and type design,
in the next slot.
So we'll wade in starting with the smaller
and hopefully more understood part of the design spectrum.
First, a question.
What is the atom of C++ API design?
That is, what is the fundamental small chunk of API design?
It might not be the smallest chunk,
but it should be the small thing that we reach for
or that we think about most often.
And if you asked me this a year ago I would have said,
well, it's the function, right?
After all, that's the piece that we use the most.
Free functions, member functions,
special member functions, et cetera.
But recently I've started to think
for maybe the last year or so,
maybe functions are actually our protons.
The better unit of design is slightly larger.
The better unit of design is an overload set.
When you have a well-designed, when you have a reasonable,
when you have a good overload set,
and it turns out there's actually very solid agreement
amongst all of the experts on
what is good and reasonable here,
overload sets are a much better unit of design,
especially as we move to a richer type system,
richer set of vocabulary types, concepts,
and even deeper understanding of move semantics
and move semantic designs.
Pop quiz, what does this mean?
What does this simple function signature like this mean?
By the end of the talk I want you to understand
that this question is bogus.
This question is ill formed.
You really need to know a bit more about Foo, the type,
and quite a bit more about f
and everything else that is named the same,
everything else that is named f.
I will say, if f is appearing all by itself
and isn't part of an overload set,
what we've got here is the function signature
for "maybe-move."
This nugget didn't actually fit anywhere else in the talk,
and I really find it very important
so I'm just gonna say that right now.
You can repeat it three times to yourself under your breath.
It is "maybe-move" when you see
a function signature like this.
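The slide itself isn't shown in the transcript, but the shape being described is a lone rvalue-reference parameter with no sibling overloads. A minimal sketch (names are mine, not from the talk):

```cpp
#include <cassert>
#include <string>
#include <utility>

// A lone rvalue-reference signature with no overload set around it:
//
//   void f(Foo&&);
//
// Read it as "maybe-move": the callee may or may not steal the argument,
// and the caller can't tell. After the call, the argument is in a
// valid-but-unspecified state either way.
std::string stash;  // observable side effect, just for the demo

void f(std::string&& s) {
  stash = std::move(s);  // this particular f chooses to actually move
}
```

The caller has to opt in with `std::move` or a temporary, which is exactly why the signature reads as "maybe-move" rather than "definitely move."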
Okay, overload sets.
Somewhat formally, an overload set
is a collection of functions in the same scope,
that's namespace, class, et cetera, et cetera,
of the same name such that if any one of them is found
by name resolution they all will be.
That captures the syntax,
that captures what the compiler cares about,
but not the semantics.
That is what a user will care about,
of a good overload set.
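The syntactic definition above can be sketched in a couple of lines (illustrative names, not from the talk):

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Two functions named `length` in the same namespace scope: if name
// lookup finds one of them, it finds both. That pair is an overload set.
// It also clears the semantic bar: a reader never needs to know which
// one was picked, because they mean the same thing.
namespace demo {
std::size_t length(const char* s) { return std::strlen(s); }
std::size_t length(const std::string& s) { return s.size(); }
}  // namespace demo
```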
The core guideline says very good things about this.
The Core Guidelines have two rules.
The first: overload operations that are roughly equivalent,
that is, if you have two things
that are doing roughly the same thing,
name them the same.
And also the flip side: overload only for operations
that are roughly equivalent, all right?
That is, if you have two things
that are doing something very, very different,
please name them differently, right?
This should not be shocking.
The Google C++ style guide says
use overloaded functions, including constructors,
only if a reader looking at a call site
can get a good idea of what is happening
without having to first figure out
exactly which overload is called, right?
You shouldn't need to do overload resolution in your head
and know all of the symbols that might show up
through transitive inclusion.
Like, what's everything in your program
that might have the same name, right?
You should actually only name things the same
if it doesn't matter to the reader which of those
is actually gonna get picked, right?
If it's gonna do the same thing.
We're definitely lacking a solid theoretical way
to describe that same thing,
'cause it's sort of squishy, right?
Like you can't really say, like,
give me the semantic definition of
I have a function of two arguments and a function
of three arguments, and they do the same thing.
Like, that's gonna be just weird to try to come up with
any sort of formal definition of that, right?
But we sort of can see what we mean with some examples here.
So for instance, we can overload on arity.
How many parameters the function takes.
And a great example of this is StrCat from Abseil.
We've had a variation on StrCat
in our code base at Google for many years.
Pre C++ 11 StrCat was an overload set
of something like 25 or 26 separate functions
to go from arity 1 all the way up to arity 25.
And it didn't matter, right?
You don't need to know which of those you're calling
because what StrCat does is take all of its things,
convert them to string, and return you
the concatenation there.
And even after we switch to C++ 11
and moved this to being a variadic template,
still doesn't matter, right?
It's one thing, because even that statement
of it being a variadic template
is slightly a lie because the first I think five arities
are hand rolled free functions for optimization purposes
to make it easier on the compiler,
and none of that matters, right?
Because you see a call to StrCat,
you don't have to count them,
you don't have to know which one is called.
It just does one thing, right?
So you can clearly overload on arity
in some cases like this.
You can also overload on types,
usually for types that are similar,
and the most common example that you're gonna find
is for legacy like stringish overloads.
There was some old function in your code base
that accepted const char* and someone got tired of that
and they added an overload for const string ref.
And this is a great example
of a well-designed overload set, right?
You've got some sort of stringish data.
The user that is calling this function
or the reader of some code that is calling this function,
this overload set, sorry, excuse me,
doesn't need to know which type is being passed exactly
or which function is being called exactly
because we can see at a glance
that the semantics are the same, right?
In this case, one is implemented in line
in terms of the other.
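That legacy stringish pattern looks something like this (`LegacyFrob` is an invented name standing in for "do x on the given string"):

```cpp
#include <cassert>
#include <string>

// The old entry point that accepted const char*.
int LegacyFrob(const char* s) {
  int n = 0;
  while (*s++) ++n;  // stand-in for "do x on the given string"
  return n;
}

// The overload someone added later, implemented inline in terms of the
// old one, so the semantics are identical by construction.
int LegacyFrob(const std::string& s) { return LegacyFrob(s.c_str()); }
```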
We see overloads throughout the standard library
for optimization as a result of move semantics.
For instance, there's vector push_back.
This is an overload set.
This fits and slightly expands our definition
of these things have the same semantics
and I don't need to know which of these is called.
At the call site, the user doesn't have to care
whether it's the Lvalue or Rvalue version of push_back.
At most, they need to watch out for use after move,
but that's true irrespective of what API you're calling.
You always need to watch out for use after move.
This also helps flesh out
what we mean by the same semantics.
It is the same post condition on the vector,
not necessarily for the T that was passed in.
However, we don't actually really care
about the post condition on the T,
because the T is either const ref, not being changed,
is a temporary, which case we don't care,
or was std::move'd and it's definitely not our problem,
see previous result, right?
Does that all jibe?
And it's also worth noting that the calling code,
so long as it obeys this restriction,
would be the same behavior, not the same optimization,
if we removed the Rvalue push_back overload.
The semantics for all of our callers
are totally the same, right?
Nothing's actually gonna change.
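The push_back point in code: same post-condition on the vector whichever overload is chosen, and the caller opted into any move explicitly.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// vector::push_back as an overload set: push_back(const T&) copies,
// push_back(T&&) moves. The post-condition on the vector is identical
// either way; only the argument's fate differs, and the caller opted
// into that with std::move.
std::vector<std::string> v;

void demo() {
  std::string s = "genome";
  v.push_back(s);             // lvalue: copy overload, s is unchanged
  v.push_back(std::move(s));  // rvalue: move overload, s now unspecified
  v.push_back("tmp");         // temporary: move overload again
}
```

Delete the rvalue overload and `demo` still compiles and produces the same vector, just with an extra copy, which is the "same behavior, not the same optimization" point.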
So with those sort of examples, if we can overload on arity
we can overload for optimization,
we can overload for same types-ish,
same platonic notion of types like string-ish data.
Let's look at the overload set guidance.
We can say properties of a good overload set,
you can judge the correctness
without having to do overload resolution,
and I really like the second option here.
A single good comment can describe the full set.
For StrCat that comment would be something like
take all the provided arguments,
convert them to string in the default fashion,
and return a string formed by the concatenation
of the stringified arguments.
For the string thing for Foo it'd be
do x on the given string, whatever that is.
For vector push_back it would be something like
add this T to the back of the vector.
Right, we don't need to have a comment
on every individual element of the set, right?
It's probably the case that one comment
describing the overload set as a whole is actually
more explanatory of what that overload set does, right?
And probably much clearer for a user.
So this pushes some of this squishiness
of what is a good overload set back a little bit
onto what is a good comment,
which is still squishy, I can't define it,
but I'll know it when I see it sort of thing.
But practically speaking,
nine times out of 10 you can spot the bad comments
when the comment is encouraging you
to do overload resolution, right?
Is that like, you've all seen comments like that?
That is definitely a code smell, right?
Does this all make some sense?
Good overload set.
Any questions?
There are mics.
Please feel free,
I will not be able to like see you probably,
but please feel free to chime in.
I would love to hear from you.
It's kind of awkward.
When we start consciously treating overload sets
as the basic unit of design,
then we start seeing them in other places, right?
The most important overload set of all
is one that we've discussed a lot over the last few years,
but usually not specifically in terms
of it being an overload set.
Any guesses?
Copy versus move.
I really, really like the formulation of copy and move
as an overload set.
This actually has huge conceptual ramifications
when we reconceptualize along those lines.
The type trait for movable isn't stupid anymore.
It's always really bothered me that is_move_constructible
didn't really tell you whether it actually moved.
It only told you if syntactically you could construct it
from an Rvalue, right?
That was like, ehhhhh.
Whereas now when you recognize
that move and copy are an overload set,
all that actually matters is that you can construct
from a temporary, you can construct
from an Rvalue, right?
'Cause they're an overload set.
It doesn't matter which one you pick.
It is up to the type author in that model
to ensure that move is efficient
whenever it's plausible, right?
It is up to the user to ensure that move is used
wherever it's relevant or important.
And the user doesn't need to know if a type
has a move constructor because you don't
need to know which member of the overload set is chosen.
This also requires that the semantics of copy versus move
must be the same, at least with respect to the destination.
The type, the object being constructed.
This matches the way that the standard library
is behaving more and more.
This matches the way that concepts for the standard library
is being defined.
This matches the behavior for papers
that I've been writing about
what the standard library promises.
More on that later in the week.
Move is an optimization of copy
is what I've been saying for a few years,
but I think the better way to phrase it is move and copy
must be a well-designed overload set.
Does that make sense?
Interesting, explicitly conceptualizing everything,
even constructors like move and copy,
as an overload set gives us
some guidance on things like explicit.
When you view your constructors as an overload set,
then you start having a better idea
of when explicit applies.
Does a user need to know which constructor was picked?
If so, make that constructor explicit.
Viewing it another way, copy and move
are the canonical constructors
that at least take parameters.
We know their semantics.
They take a T and they make a new T that's like the given T.
That's the canonical constructor behavior.
But if your constructor doesn't take T
but takes some other type or maybe some other types, right?
If you'd usually be comfortable passing T and U,
Foo and Bar, const string ref and const char*
as an overload set, then you're probably fine
having constructors for both of those, right?
You could have a constructor that accepts Bar in your Foo.
If it would be an acceptable overload set
for both Foo and Bar, right?
And that's most commonly the case when T and U
represent the same idea, right?
These are two different types
of the same sort of canonical data.
If it is, on the other hand,
merely the case that we can construct a T
from some bag of parameters, but those aren't basically a T,
right, this is vectors constructor
that takes a T and a size, right?
Okay that is not the same as a vector.
That is a recipe for creating a vector.
Then your constructor should be explicit, right?
Does that make sense?
Any questions?
I'll wait just a second.
I find that we wildly, wildly
underuse explicit on constructors.
And I think the standard library is as guilty
of that as anybody.
Like almost all constructors should
probably have been tagged explicit,
and we kind of screwed that up,
but we're good, okay.
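A small sketch of that constructor rule, using a hypothetical `Path` type (not from the talk): an argument that conceptually *is* the thing may convert implicitly; a recipe for the thing should be explicit.

```cpp
#include <cassert>
#include <string>
#include <utility>

class Path {
 public:
  Path(std::string s) : value_(std::move(s)) {}     // a string IS a path:
                                                    // implicit is fine
  explicit Path(const char* dir, const char* file)  // a recipe for a path,
      : value_(std::string(dir) + "/" + file) {}    // not a path: explicit
  const std::string& value() const { return value_; }

 private:
  std::string value_;
};
```

This is the same judgment as the vector case: a count and an element are a recipe for a vector, not a vector, so that constructor should have been explicit.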
All of that said, in overload
there's another really common pattern that I see,
which is people attempting to use overload sets
to enforce certain types of behavior.
And my high-level guidance is don't use equals delete
on a member of an overload set.
Is that a question?
Nope, all right.
So I somewhat regularly see people try to delete a member
of an overload set to enforce lifetime requirements.
Show of hands, anyone seen someone do this in their code?
Yep, a few, yep.
Looks like maybe 5% of you.
The problem here is that no temporaries
versus things that have the correct lifetime,
things that have the lifetime that I actually require,
are not actually synonyms, right?
That Venn diagram like overlaps a little bit,
but it's more disjoint than not.
Like generally speaking, if the lifetime requirement
for a parameter to your function is not
as long as this function call, like the default,
simple, obvious thing,
then it's gonna be a little hard to pin down, right?
It could be this variable must live
as long as this function call,
or this variable must live until
the next time you call this function.
This variable must live as long as this object.
This variable must live until this thread completes.
This variable must live until this callback fires, right?
All of those things are complicated.
Certainly much more complicated
than just no temporaries, right?
Those are different levels of complexity.
Starting asynchronous work is particularly challenging.
And when you do this, the most common workaround
for most people when they say, like oh, geez,
you've equals deleted this, I can't pass a temporary,
I guess I'll make this not a temporary.
Nine times out of 10 they do this
by pulling out an automatic variable.
And technically now that's gonna build,
but the odds that that, the lifetime requirement
of that automatic variable actually matched the requirement
of your kooky API are pretty slim, right?
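Here is the anti-pattern and the compiling-but-wrong workaround in one place (`Scanner` and `StartAsyncScan` are invented names):

```cpp
#include <cassert>
#include <string>

// The anti-pattern: deleting the rvalue overload to "forbid temporaries"
// on an API that borrows a reference.
struct Scanner {
  const std::string* borrowed = nullptr;
  void StartAsyncScan(const std::string& dna) { borrowed = &dna; }
  void StartAsyncScan(std::string&&) = delete;  // rejects temporaries only
};

// The typical workaround now compiles, but the local's lifetime still
// has nothing to do with what the API actually requires, so this returns
// a pointer that dangles the moment the function ends.
const std::string* Workaround(Scanner& sc) {
  std::string local = "ACGT";  // pulled out to dodge the deleted overload
  sc.StartAsyncScan(local);
  return sc.borrowed;          // dangling on return: =delete didn't help
}
```

`sc.StartAsyncScan(std::string("ACGT"))` fails to compile, but that is all the =delete buys you; `Workaround` demonstrates how little that guarantees.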
In practice, your API that is kicking off async work
or storing a reference, right,
is going to require you to have a pretty detailed comment
in its API saying exactly what the lifetime requirement
on that reference is.
The freeform nature, right, the arbitrary boundless
possible complexity of that requirement,
is a whole lot more complicated than even C++'s type system.
Right?
Equals-deleting a thing here doesn't solve that problem.
All of which to say, the solution
to documenting lifetime requirements on borrowed references
is either a, don't make it a borrowed reference,
or b, document the actual requirement.
The type system like this cannot do it for you.
If you want to equals delete it on top of that documentation
I guess that's fine.
But it's a half measure at best.
It's a quarter measure at best,
and it gets really messy and it's misleading,
and it's a false sense of security.
And I would not accept it in code review,
but I guess your mileage may vary.
Good?
(coughing) Excuse me.
There's also using equals delete
or just omitting a function
from an overload set in some cases,
in order to force the user to use the move version
of a function instead of the copy version of a function.
And in simple cases, maybe even in most cases,
that looks fine.
That could be fine.
But in general you don't know all of the ways
that your API is going to be used.
That is fundamental to the whole business
of providing an API in the first place.
While it might be the case that you know
that many invocations of your function
should be done via move, not copy,
you can't know that for everything, right?
If I wanna do two separate scans
on a slightly modified chunk of DNA,
it's less efficient to call this on temporaries
'cause I have to do the modification twice.
And if you make me contort my code
so that I only do the modification once
but can't call your move only interface,
I can do that, of course,
but it is a little bit more awkward.
My point being, for functions you can't really know
that nobody ever is going to need the copy API.
And when you provide it as an option,
the calling code is certainly simpler.
Sort of at a very high level, don't be judgy, right?
You don't know all of the ways
that your code is going to be used,
be accepting.
If you do somehow know that copies must never ever happen,
that is almost certainly a property of the type,
not of the function that you're passing that data to, right?
Make a DNA class in this example.
So if you're that worried about accidental copies adding up,
make it a separate class, don't use string, right?
And then probably still make it copyable,
just with some special name, all right?
Maybe make it more explicit
and hard to trigger accidentally.
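One way that "copyable, but with a special name" suggestion might look, with invented names:

```cpp
#include <cassert>
#include <string>
#include <utility>

// A DNA wrapper whose ordinary copy operations are removed, but which
// still supports copying under a loud, greppable name. This makes
// "no accidental copies" a property of the type, not of every function
// the data is passed to.
class DNA {
 public:
  explicit DNA(std::string bases) : bases_(std::move(bases)) {}
  DNA(const DNA&) = delete;             // no accidental copies
  DNA& operator=(const DNA&) = delete;
  DNA(DNA&&) = default;                 // moves stay cheap and implicit
  DNA& operator=(DNA&&) = default;
  DNA Clone() const { return DNA(bases_); }  // deliberate copy, by name
  const std::string& bases() const { return bases_; }

 private:
  std::string bases_;
};
```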
I've sort of snuck in here a pass-by-value sink design.
Here DNAScan is accepting a string, the DNA,
which is presumably a very large string,
by value, ooohh.
Other things like vector push_back from earlier
do this as an overload set.
Which one is right?
That is to say, is vector's push_back
a well-designed overload set?
Should everyone always be doing that
when you're accepting a value to sink?
Or is just accepting a value fine?
And there's been a lot of discussion on this.
In fact, I think one of Herb's keynotes
the first year or two of this conference had a long section
that was touching on a lot of the same things.
Spent a lot of discussion and partial guidance on this,
and a lot of that guidance does not agree, right?
And to some extent that is because
there are a lot of possible questions,
a lot of different scenarios
that you might be optimizing for.
So really we should be asking some questions
before we try to come up with perfect,
all-encompassing guidance.
And the questions that you might think
that you might need to be asking,
is this a generic or am I sinking a particular type, right?
In the DNAScan example, I'm sinking exactly string.
That gives me some knowledge.
Or I might be sinking exactly DNA strand, right?
And that gives me knowledge about probably
the relative costs of copying and moving
versus what the function I'm about to do is.
Is it a question?
Although it might be good to,
I can try to repeat, but either way.
- [Man] Can you clarify the sink, what you mean by sink?
- Can I clarify what I mean by sink?
So there are a lot of functions
where you pass a value in and it's read and returned to you
and nothing else, right?
That's sort of normal.
Then there's things like vector push_back,
which is a sink.
It's passed in and copied and stored, right?
So for any function that you are accepting the input
to then copy either for storage in the very common case,
usually like vector push_back,
or even in some cases I'm accepting it
in order to make a copy that I'm gonna mutate.
So, you could imagine a silly function
that is print everything capitalized, right?
Which might accept a string and need to make a copy of it
so that it can capitalize it before it prints it, right?
So sometimes you might have either storage
or I just need a copy.
Does that capture it?
Yeah.
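The two flavors of sink from that answer, in code (function names are mine, not from the talk):

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Sink-for-storage: the argument is kept, like vector::push_back.
std::vector<std::string> log_lines;
void AddLogLine(std::string line) {      // by-value sink
  log_lines.push_back(std::move(line));  // exactly one move into storage
}

// Sink-for-mutation: we need our own copy just to modify it, like the
// "print everything capitalized" example.
std::string Capitalized(std::string s) {  // the parameter IS the copy
  for (char& c : s)
    c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
  return s;
}
```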
Yeah, and there's also the question of
relating to whether it's generic or not.
How expensive is the function compared
to a copy or a move of that type or those types, right?
And if it's generic, like vector push_back, right,
all you're actually doing is
making a copy or doing a move, right?
There's basically no overhead above and beyond
the cost of copy or move.
For something like DNAScan, right,
if I need to sink it I'm probably also going to do
a whole bunch of work on it.
And that whole bunch of work, it's probably
much more expensive than a move on a string
or a move on a DNA snippet, right?
So you need to maybe weigh those things a little bit.
But there are more questions.
There's the question of are there multiple parameters
that are being sunk, right?
If I need to sink two or three or four parameters
then the cross product of const ref and ref ref
for all of those parameters means I have
a combinatorial explosion of elements in my overload set.
And that just might not turn out to be fun.
Certainly not fun to maintain.
There's a question of over time as I maintain this library,
as I maintain this code base,
do I know that this is always going to be
a sink of exactly T or do I just want T-ish things, right?
In the case of accepting strings,
you might if you have a lot of not actually strings
in your code base but things that convert to string_view,
you might make your sink in terms of string_view instead
so that there's one clear conversion point.
And then there's the question which I think Herb raised
in his keynote a couple years ago of
can allocation reuse dominate?
And this is a case where if I have a type
who has a member variable like a log or something,
then it could be the case that as I append data to that log,
maybe it's a string, it may have to resize and reallocate
as I append more data to it.
And if I sink a new log, sink a new string into place,
the allocation of the old one is lost
when I move the new one in, right?
And if I continue growing again
then I'm gonna have to do all of that
reallocation over and over again.
That seems like a fairly rare case,
but is not by any means unheard of.
So it is actually a thing
that you might actually have to consider,
like when you're deciding how
to accept your sink parameters.
There may even be other questions
above and beyond these five,
but I think that's a reasonably complete set
and already very complicated.
But I will throw out the following
sort of very general guidance.
I would probably personally provide this as the guidelines.
You probably want the overload set of const ref and ref ref
if the implementation of your function is small
compared to move constructing a T.
It is a little bit more complex.
It is worse error messages,
it is worse compilation performance,
and it is probably too much of a pain
if you have multiple parameters.
Right, there's that combinatorial explosion.
You could sink by value if the implementation
is a larger cost than move constructing a T.
Like in the DNAScan example,
I'm about to walk through everything
in it, DNA snippet, right?
That is bonkers more expensive than moving a string.
But it also does constrain you a little bit.
You want that to continue to be a T
and exactly a T for all time.
You don't want conversions in there.
And then there's const T ref is actually
never a terrible choice if you don't know the answers
to these questions because it's simple
and it gives you flexibility.
It's well understood, right?
It's hard to get wrong.
That's how I would simplify that.
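The first two options side by side (`Buffer`/`Add` are my names; `DNAScan` echoes the talk's running example):

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// (a) The const& / && overload set: worth the duplication when the body
//     is tiny compared to a move, exactly like vector::push_back.
struct Buffer {
  std::vector<std::string> items;
  void Add(const std::string& s) { items.push_back(s); }        // copy in
  void Add(std::string&& s) { items.push_back(std::move(s)); }  // move in
};

// (b) Sink by value: fine when the work dwarfs one extra move, but it
//     pins the parameter to exactly this type for all time.
long DNAScan(std::string dna) {  // walking the whole string costs far
  long gc = 0;                   // more than moving it in
  for (char c : dna) gc += (c == 'G' || c == 'C');
  return gc;
}

// (c) Plain `const std::string&` is never a terrible default when you
//     don't know the answers yet: simple, flexible, hard to get wrong.
```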
It's also worth noting that this gets
a little more complicated if you're dealing
with strong exception safety and sinks that may throw.
If DNAScan may throw and DNA needs
to be strong exception safe,
then you have an additional set of constraints.
Practically speaking sinks don't usually
throw except for allocation.
If exception safety is your primary concern
you may have to reevaluate this a little bit.
Mostly don't pass by value
for types that are strong exception safe.
When I'm talking about non-sink overloads
historically I find that we're talking about const char*
and const string&, I mentioned that a little bit,
these tend to have a similar look.
In modern code we tend to replace that overload set
with string_view.
And once we start talking about string_view
as the string like parameter type,
then we start looking at other common
non-owning parameter types, like span,
these have unusual designs, there are sharp edges.
There was a whole talk on that already this morning.
Span even leads us to a bigger can of worms,
because unlike string view,
like string view does one thing, it is character data
span tries to be a general, any contiguous range of type T,
but there are lots of contiguous ranges of almost T
that you're reasonably likely to work with.
For instance, there's pointers versus smart pointers.
We can easily publish guidance to say,
don't pass smart pointers by const ref in general.
If you wanna pass a pointer, pass the pointer,
not the ownership-wrapping information.
So we get it ingrained, don't do this,
like identify this in code review.
Suggest const T* or even const T ref.
But types don't actually decompose, right?
A vector of unique pointer T is not convertible
to a vector of T*.
And if you've got a vector of owned pointers
and need to invoke a function with a vector of T*,
there just isn't a good way to do that.
So modeling based on span, it is not hard
to imagine producing a more generic span of T-ish things.
I've seen this in my code base as AnySpan,
which I don't love the name,
but I do increasingly like the type.
It effectively type erases a contiguous container
of things that can be converted to T* or T ref
in a fairly clear fashion.
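The real AnySpan is an internal type whose interface isn't shown in the talk; the following is only my guess at the minimal shape of the idea, a non-owning view over any contiguous container of "pointer-ish to T" elements, erased down to plain `T*`:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Tiny stand-in for C++20 std::to_address: raw pointers pass through,
// smart pointers yield .get().
template <typename T>
T* ToPtr(T* p) { return p; }
template <typename P>
auto ToPtr(const P& p) -> decltype(p.get()) { return p.get(); }

template <typename T>
class AnySpan {
 public:
  template <typename Container>  // e.g. vector<T*> or vector<unique_ptr<T>>
  AnySpan(const Container& c)
      : data_(c.data()),
        size_(c.size()),
        stride_(sizeof(typename Container::value_type)),
        get_(+[](const void* p) -> T* {
          using Elem = typename Container::value_type;
          return ToPtr(*static_cast<const Elem*>(p));
        }) {}
  std::size_t size() const { return size_; }
  T* operator[](std::size_t i) const {
    return get_(static_cast<const char*>(data_) + i * stride_);
  }

 private:
  const void* data_;        // first element of the borrowed container
  std::size_t size_;
  std::size_t stride_;      // bytes between elements
  T* (*get_)(const void*);  // per-element "convert to T*" thunk
};
```

This is exactly the vector-of-unique_ptr-to-vector-of-T* problem from above: the types don't decompose, so the view does the conversion element by element instead.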
And we can go further and further down that rabbit hole.
Maybe it doesn't need to be contiguous.
Maybe it's just some form of range.
Maybe we can do this for associative containers
and we have a map view or a set view.
Stepping back a little bit, C++ is a language
that is all about types,
more so than basically anything else.
Overloads for non-owning reference parameters
like string_view and span and AnySpan,
are about getting closer to duck typing,
in terms of what types are accepted,
which, give me anything that looks like a duck
and quacks like a duck, and I will use it as a duck.
Bjarne was talking about this
in the keynote this morning with concepts.
It's a language approach to a very similar problem.
And those are the two main conventions
that are emerging in this space, right?
We can, in the library,
build non-owning reference parameter types like these
or we move to more generic code and use concepts.
And when it comes to that question
of which of these will emerge,
I don't think the community has enough experience
to provide particularly deep guidance yet.
My suspicion is that this will come down
to whether the library of types like string_view and span
are found to be sufficiently expressive.
If the library providers of the world
build a rich set of such types,
we'll probably go that way.
This approach has a head start, after all.
If we invest a similar effort in concepts
and, important, we find only a comparable set
of sharp edges for concept usage versus view usage,
then it may be that concepts comes to dominate.
That is a pretty significant shift
and with a lot of unknowns.
And it's unclear yet whether everyday programmers
can write in a generic and duck typed fashion
efficiently and safely.
We will see how that turns out,
but interestingly we already have a trial
of that happening right now without concepts
in the form of callables, std function.
Even without concepts we can write fairly reasonably,
something that takes in a callable
in either a library or a language fashion.
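The two fashions side by side (function names are mine):

```cpp
#include <cassert>
#include <functional>

// Library form: type erased via std::function, storable, with a concrete
// signature readable right in the API.
int ApplyTwiceErased(const std::function<int(int)>& f, int x) {
  return f(f(x));
}

// Language form: a template parameter, duck typed at compile time, with
// no erasure overhead but no concrete signature in the declaration.
template <typename F>
int ApplyTwiceGeneric(F&& f, int x) {
  return f(f(x));
}
```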
Both of these have their uses,
but I think when we're writing everyday code,
most of us are going to reach for the library form.
And that seems telling.
Until we have erasure and storage for concepts,
I think we're probably going to reach
for the library solution.
If I had to guess about the future,
I'm gonna guess that we'll devote a fair amount of energy
to both approaches and we'll wind up with a powerful,
very useful set of concepts and then those will be
type erased and provided as library,
like with library types that wrap them.
And most user code will deal in those library types.
It's just a guess.
Even still, std function is a little unusual in this class
of type erased parameter type.
'Cause when we compare to string_view or other view types,
std function is simpler in a couple very important ways.
First, it's only erasing one thing, right?
If I accept the std function
I'm accepting one callable thing.
Nearly every other commonly discussed type erased type
is erasing a collection of things.
String_view and span erase contiguous sequences.
AnySpan erases a contiguous sequence of not-quite-T.
Map view or set view erase the ordering details
of some associative container, et cetera, et cetera.
When you're type erasing a single thing
it is much easier as std function does
to make that an owning type.
A type where you can copy it and not have any requirement
that the original outlive the copy.
When we do type erasure over a container,
on the other hand, over a collection,
then we generally don't want to actually copy
all of the things in that container.
And we rapidly wind up with types
that are very, very easy to make dangle.
And then we get two big schools of thought.
We can have non-owning reference parameter types
only as parameters, right?
Have string_view only as a parameter type,
never use it anywhere else.
And this school of thought will say,
non-owning reference parameter types are okay
as long as they're only function parameters.
And then there's the use with caution school
of use non-owning reference parameter types just carefully.
Like, yep, there's sharp edges there.
Just stay away from the pointy bits.
Always question storage of any such type.
There is also a third school of thought
that these types are all completely garbage
and too hard to use and we should throw it away entirely.
I don't see that happening,
but I have been surprised before.
Even with just these two options we have,
as a community, a difficult choice to make,
especially in a language with such lofty goals
as do not pay for what you do not use.
Because there are absolutely use cases
for non-owning types like string_view
above and beyond just as a parameter.
Consider your file name processing.
You could imagine a path processing function
that takes a string view for the path
and returns a view into it for the suffix
or the file name or the directory.
But note that we're looking at this.
Using string_view on input here
means that we don't have to overload on char*, string references,
and whatever user-provided types
might be contiguous and useful.
string_view does all of that overload work for us.
That's the point of vocabulary types.
That non-owning parameter type as a replacement
for an overload set is very powerful
and it is why we are talking about this right now.
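A minimal sketch of such a path helper (the name Suffix and the exact semantics are mine, not from the slide): one string_view parameter stands in for the whole overload set, and the return is a view into the caller's data.

```cpp
#include <string_view>

// Hypothetical path helper in the style described: char*, std::string,
// and other contiguous string-like types all convert to the parameter,
// and the result views into the caller's buffer with no allocation.
std::string_view Suffix(std::string_view path) {
  // Simplified: everything after the last '.', empty if there is none.
  auto pos = path.rfind('.');
  return pos == std::string_view::npos ? std::string_view{}
                                       : path.substr(pos + 1);
}
```

The caller must keep the original buffer alive as long as the returned view is in use, which is exactly the sharp edge under discussion.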
We could make this design
a little bit more palatable to some people
by changing that return value
to string instead of string_view,
but forcing a copy there and changing that return type
feels a little awkward,
especially if this wasn't suffix but was directory, right?
If you deal in very long file names,
those might start to actually be large copies.
That might start to add up.
Don't pay for what you don't use, that's C++, all right?
If you can use this style of design safely,
that sounds like a very C++ thing.
But it is awfully easy to misuse.
Take a glance at this slide.
Half of these are bugs,
and they are awfully close neighbors
to code that works just fine.
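The slide isn't reproduced here, but the flavor of those close neighbors is easy to show. This sketch (names mine) keeps the safe case executable and leaves its buggy twin as a comment:

```cpp
#include <string>
#include <string_view>

std::string MakeName() { return "report.txt"; }

// Fine: the named string outlives the view into it.
bool SafeUse() {
  std::string owner = MakeName();
  std::string_view v = owner;
  return v == "report.txt";
}

// Its close neighbor is a bug (left as a comment so this stays safe):
//   std::string_view v = MakeName();  // temporary dies at the ';'
//   Use(v);                           // dangling view: undefined behavior
```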
All of which is to say if we continue to build views
and other non-owning reference parameters,
there's going to be a tension here.
I think that the basic language,
like design and evolution principles,
are gonna say yes, it's fine to use these carefully.
If a user hurts themselves on that sharp edge,
that's on them.
But we're definitely going to see
a lot of very caring people offering guidance
like never use these except as a parameter
or even never use these at all.
And that is a hard tension.
These are going to be the most efficient ways
to express that overload set, for instance.
Personally I've been using string_view
for quite a while and I find it pretty easy to spot
questionable use in code review.
Anytime that it is used as anything other than a parameter
I ask: why do we know that the underlying data
will live longer than this view?
That does not work so great if you
are an almost always auto person, sorry.
But this is all sort of a long tangent
on doing type erasure for parameter types
and duck typing and a library form.
There's open questions here.
We'll see how this all plays out.
But we need to pop back the stack a little bit.
We're done looking at overloading on parameters
or producing parameter types
that hide that overload set for you,
and instead we're going to look
at the other important dimension for overload sets,
which is method qualifiers.
This is a really important variation on overloads.
You can overload member functions
based on method qualifiers,
either ref qualified or const qualified.
Overloads that vary in const qualification
tend to be of the form access this underlying data
in a const appropriate fashion, right?
You see this in vector's operator[], right?
If you have a const vector you get a const T&.
If you have a non-const vector you get a non-const T&.
All right?
Simple, easy, obvious overload set.
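That vector-style pair can be sketched in a few lines (IntBuffer is a made-up type for illustration). One comment genuinely covers both members:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

class IntBuffer {
 public:
  explicit IntBuffer(std::vector<int> data) : data_(std::move(data)) {}

  // Return the element at i, preserving the container's const-ness.
  int& operator[](std::size_t i) { return data_[i]; }
  const int& operator[](std::size_t i) const { return data_[i]; }

 private:
  std::vector<int> data_;
};
```

Which overload is called depends only on the const-ness of the object, and the caller never needs to think about it.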
Overloads that vary on ref qualification
tend to be about optimization.
You can do one thing safely, the Lvalue qualified version,
and if we know that we are operating on a temporary,
or operating on an Rvalue,
we can more aggressively optimize
by leaving the object as a whole
in that dreaded valid-but-unspecified state.
So for example, in C++20 the std::stringbuf type
will gain an overload for str(),
a ref-qualified overload for str().
Here a ref-qualified overload means steal.
So you can change your code that is returning buf.str(),
which has to copy out of the buffer,
to return std::move(buf).str() to say,
I'm done with this buffer, and because I'm done
with this buffer you don't have to copy that string out,
you can move that string out.
When we use a pattern like this,
we don't need to worry about scary naming
for destructive member operations.
Just consistency with the higher level rule,
don't operate on moved-from objects,
does all of the warning that we need to do, right?
That's very handy: you rely
on existing user experience
and understanding. And forming
these perform-better-when-available overload sets
is also a nice way to be future compatible.
We can all start writing this return statement right now.
It doesn't hurt anything
and it expresses a reasonable intent,
I'm done with this buffer.
When the underlying standard library catches up,
it'll just optimize a little better, right?
So a future compatible design, that's always nice to see.
When we combine const and reference qualifier overloads,
we can keep const correctness and provide good optimization
like in the case of optional's value().
These types of overload sets still meet
our general definition for good overload set.
A user does not need to know which one is called.
A single comment can describe probably more clearly,
the behavior of the whole overload set
without having comments for each member individually.
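A sketch in the style of optional::value() (Holder is an invented type): four qualifier overloads, one comment for the whole set.

```cpp
#include <string>
#include <utility>

class Holder {
 public:
  explicit Holder(std::string v) : v_(std::move(v)) {}

  // Access the value, preserving const-ness and moving out of rvalues.
  std::string& value() & { return v_; }
  const std::string& value() const& { return v_; }
  std::string&& value() && { return std::move(v_); }
  const std::string&& value() const&& { return std::move(v_); }

 private:
  std::string v_;
};
```

A caller writes h.value() or std::move(h).value() and gets the const-correct, optimally-moving member without knowing which overload fired.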
While we're here we should talk a little bit
about method qualifiers on their own without the aid
of an overload set.
So what do ref qualified methods mean when not part
of an overload set and what do const methods mean?
If you've got nothing but an rvalue-ref-qualified function,
that means "do this once."
This is a great design for destructive operations
and things like call this function at most once.
It should only be used, however, when the Lvalue equivalent
semantic would be buggy or break the design of your type,
not just because of inefficiency, right?
This goes back to: don't =delete things
just because you're being judgy.
It's perfectly reasonable for me to provide
only the Rvalue version here
because the whole point of the type is that it is a one-time callable.
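A minimal one-time-callable sketch (OnceCallback is a made-up name, loosely inspired by callback types in the wild): operator() exists only &&-qualified, so every call site must spell std::move(cb)(), which reads as "consume this."

```cpp
#include <functional>
#include <utility>

class OnceCallback {
 public:
  explicit OnceCallback(std::function<int()> f) : f_(std::move(f)) {}

  // &&-qualified only: callable on rvalues, enforcing at-most-once use.
  int operator()() && {
    auto local = std::move(f_);  // leaves f_ moved-from, as intended
    return local();
  }

 private:
  std::function<int()> f_;
};
```

Calling it on an lvalue without std::move is a compile error, which is exactly the design intent.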
On the flip side, Lvalue qualifying a function
says don't do this on temporaries.
This comes up almost never, outside of overload sets,
but it does have one case that I have been seeing,
which is we should maybe be Lvalue qualifying
our assignment operators in general.
Like you can currently assign to a temporary
of most user-defined types.
You currently cannot assign to a temporary of an int.
Like we are not doing as ints do.
But if you ref qualify it like this,
then the compiler will catch that that assignment
is probably nonsense and not what you meant.
In practice, I don't think I've ever
actually encountered that bug in real code,
I don't actually care that much,
but from a design consistency perspective
that's maybe an actual use case and it sort of expresses
what the intent is.
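Here is what that looks like (Widget is an invented example type): the trailing & on operator= makes assignment to a temporary a compile error, matching the built-in behavior for int.

```cpp
class Widget {
 public:
  Widget() = default;
  explicit Widget(int v) : v_(v) {}

  // The trailing & is the point: no assignment to rvalues.
  Widget& operator=(const Widget& other) & {
    v_ = other.v_;
    return *this;
  }

  int value() const { return v_; }

 private:
  int v_ = 0;
};

// Widget(1) = Widget(2);  // would not compile: operator= is &-qualified
```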
Moving away from references,
what do we really mean when we const qualify a thing?
Hypothetically if we marked every method as const
and every member as mutable,
this class builds just fine.
But this is going to be an absolutely
rotten type to work with.
Const should mean const.
But there are types that have mutable members,
and those aren't actually a problem.
But there's some question, there's some connection there.
How do we use const and mutable well in design?
And I suspect that there are a couple ways to view this,
but the one that has given me the most mileage
is the tie between const methods, mutable members,
and thread safety.
The standard has some things to say about this.
It says it in a very obtuse fashion.
I'm 95% sure that's the right citation.
If you squint it talks about read access,
write access, modification, and const arguments.
According to the person that claims responsibility
for that wording, it's horrible wording,
but the intent is roughly this.
Const accesses to standard types do not cause data races.
Standard types are thread compatible
unless otherwise specified.
Here we have to define thread compatible
as a very hand wavy definition, concurrent invocation
of const methods on this type do not cause data races.
Any mutation of an instance of this type
means that all accesses require external synchronization,
as opposed to thread safe, where concurrent invocations
of const or non-const methods do not cause data races.
That's mostly things like mutex.
There is, of course, also a thread unsafe classification,
but you should just not do that.
More on that in the next talk.
It's interesting to note that if you build your types
out of thread compatible or thread safe types,
and you don't use the mutable keyword
for your member variables,
then you're probably thread compatible right outta the box.
There are some scenarios where pointers are shared around
and that isn't quite true.
But more on that in the next talk.
In this model of things const is less about
I am changing the internal values
and more about it is safe to call this method concurrently.
And with that model of things we can see at a glance
that this class is thread unsafe unless response
is inherently thread safe.
And usually what such a design requires is a Mutex.
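A sketch of that shape (RequestLog is an invented type): under the "const means safe to call concurrently" model, the mutex is mutable precisely so that const methods can still lock it.

```cpp
#include <mutex>
#include <string>
#include <utility>

class RequestLog {
 public:
  void Record(std::string msg) {
    std::lock_guard<std::mutex> lock(mu_);
    last_ = std::move(msg);
    ++count_;
  }

  // const lines up with "safe to call concurrently," not "cheap."
  int count() const {
    std::lock_guard<std::mutex> lock(mu_);  // legal because mu_ is mutable
    return count_;
  }

 private:
  mutable std::mutex mu_;
  std::string last_;
  int count_ = 0;
};
```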
But what just happened?
We started talking about properties of types,
which means that we're finally ready to move on
from low-level API design and talk about higher level stuff.
But it is also important to note that this is a bridge.
There is a bridge between these domains.
Const is both a promise about your values
and a promise about the ways that it is safe
for your type to interact with the rest of the program.
And that makes that a topic for the next talk.
And we have lots of time for questions.
I will leave this up to jog your memory.
There are microphones in both places.
- [Man] So you were talking about the qualifiers on methods.
I'm not sure I understand the meaning of a const Rvalue
reference type method.
- Yes, the optional value
overload set has a const && in its overload set.
And I am 95% sure that that is only there
so that it works nicely in generic contexts,
but like semantically it doesn't mean anything.
- [Man] Okay, so I'm not crazy
that it sounds meaningless. - Yeah, you're not crazy.
Yeah, it is, the first time that I took
a good hard look at it I'm like, wait, wha?
Huh?
Yeah, you know you're well spotted.
Yeah.
- [Man] Kind of in the same vein with ref qualified members,
you talked about the str member of stringbuf.
And you talked about how it interacts
with the guidance that we not use moved-from objects.
Now imagine that we had a type that was like stringbuf,
but it had separate buffers for input and output,
and had ref qualified members
that allowed us to retrieve either of those.
If we follow the guidance to not use moved-from objects
and ref qualified them both, we could extract one
or the other but not both in a destructive manner.
What are your thoughts on that kind of API design,
where a type is safe to be used after it's moved from
so that you can extract other members from it destructively?
- I think it would be really hard to express the,
I think it would be really difficult
for that to actually play out in practice
because the move constructor,
no that's not quite right.
I would be deeply skeptical to start with
because the very, very high level principle is
don't touch it after you've called std move on it.
Right?
Except in very, very unusual circumstances
that you do not want to get into.
And so I think you would probably be better off
with some other naming for those types of things,
and I haven't actually seen a whole lot of value types
where there's multiple logical parts to it
that you would want to be consuming.
I think, perhaps, a more accurate thing would be
that you wanted
you wanted an accessor for the
input and the output individually
that you could steal from.
And then it would be a std move on that member,
but you'd have to sort of make that member public,
and I don't know, it's gonna be kind of a weird type.
- [Man] Thank you. - Yeah.
- [Audience Member] Actually, in the same vein
of moved-from types, I guess you're saying that the
advice is to never touch a moved-from object.
There are cases definitely with the standard library objects
where you potentially could reuse them
with certain constraints.
Like if you build up a vector that's a member,
and you, once it's built up to a certain point,
you can move out those values but then start building up
your vector fresh again,
as opposed to having a unique pointer to it.
I mean, do you see the standard keeping that
kind of generic advice or do you see
certain standard types providing slightly
stronger guarantees about what you can do
with move from objects?
- I mean, you will always be able to,
in the next talk we will talk a lot about the precondition,
like preconditions expressed on the APIs of a type.
And you will always be able to call any function
that has no precondition after it has been moved from.
Whether you should is an entirely different story, right?
It is definitely well understood that when you move
from a unique pointer now it is definitely null,
and so you could call reset on it
and when you move from a vector
you don't know what's in it anymore and we all-
- [Audience Member] Or assign to it or something.
- Right, you could assign to it, you could call clear on it.
You could ask it its size, right?
But you should not make any assumptions
about data being there or not.
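For concreteness, here is the legal-but-discouraged pattern being discussed (function name mine): after the move, only precondition-free operations like clear and assignment are relied upon before reuse.

```cpp
#include <utility>
#include <vector>

// Legal, but the talk's advice is "don't poke that bear": after the
// move, v's contents are unspecified, so re-establish a known state
// with a precondition-free operation before reusing it.
std::vector<int> StealAndReuse(std::vector<int>& v) {
  std::vector<int> out = std::move(v);
  v.clear();        // now definitely empty, not merely unspecified
  v.push_back(99);  // safe to build it up fresh again
  return out;
}
```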
But practically speaking, the likelihood of encountering
a scenario where the clearest way to write your code
actually has you reusing that zombie husk seems rare, right?
And you're probably better off not causing the "wait, wha?"
of your reader seeing it reused after a move.
Like, just don't poke that bear.
Like, yeah-- - [Man] I have a
personal example which I'll talk about later.
- Yeah, I mean, yes, technically speaking, it will work.
But there is a higher level requirement on everyone of
don't produce code that makes your reader go wha?
'Cause confusion costs more than CPU cycles.
All right?
Over here.
- [Man] Would you mind showing again the slide
with the results of function return in string_view?
Taking the string_view and returning part of it.
Yeah, the one with like red and,
yeah, it's beautiful.
I'm afraid it's not very safe.
I mean you marked option four as good,
and I believe it's undefined behavior.
- No. - [Man] Your argument of
destroyed--
- Not, no.
- [Man] Your string actually-- - Temporaries are destroyed
at semicolons.
By the time the string's copy instructor runs,
actually by the time the string's, yes,
copy constructor, move constructor?
Copy constructor, by the time the copy constructor runs
the temporary is still there
because the temporary doesn't go away until the semicolon.
Like, I guarantee this is fine.
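The lifetime argument being made here can be demonstrated directly. This sketch (names mine, standing in for the slide's code) passes a temporary string through a view-returning function and copies the result within the same full expression:

```cpp
#include <string>
#include <string_view>

// Hypothetical suffix function like the one on the slide.
std::string_view Suffix(std::string_view path) {
  auto pos = path.rfind('.');
  return pos == std::string_view::npos ? std::string_view{}
                                       : path.substr(pos + 1);
}

std::string CopyOfSuffix() {
  // The std::string temporary lives until the end of the full
  // expression (the semicolon), so the copy into s runs while the
  // viewed buffer is still alive. No dangling, as stated in the talk.
  std::string s(Suffix(std::string("dir/file.txt")));
  return s;
}
```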
John. - [John] Hey Titus.
- What's up?
- [John] So you were talking in the beginning
about how overload sets should define a group of functions
that are all semantically basically the same.
- Yep. - [John] And you were also
talking about five minutes ago about
how it's really important for const to be meaningful
and especially in thread safety situations,
and there are pretty commonly used overload sets
like operator brackets is like this,
which often will overload on const-ness
even though giving a read only reference
and giving a mutable reference are really, really different,
especially in a thread safety context,
but I don't think anyone in the room
would argue that that's a,
that like operator brackets is somehow completely
a broken design like on a vector or something.
So how would that fit into the advice that you're giving us?
- Well so the advice is like, at a very high level
the advice is it is probably a good overload set
if you can have a single comment for it, right?
And a comment for that const non-const overload set
on vector is give me the specified T, right?
Like give me that object, maintaining as much const-ness
as you can if you wanna be really wordy about it, right?
But like, that is a reasonable definition.
- [John] Awesome. - Yeah.
- [John] Thanks. - Yeah.
(audience member yelling)
And yes, and I will pitch Geoff Romer's talk
on thread compatibility and thread safety on Thursday.
That's actually in my script
in the next part of the talk as well,
so everyone that sees both of these
will get that pitch twice, but yes,
go to Romer's talk, it'll be great.
- [Man] I'm not gonna try to start a debate, I suppose,
on edge cases, but I do have some curiosities
regarding perhaps some guidance you might offer
on how access modifiers, when used with different types
of constructors and, more importantly, non-const references
when passed to functions, how would you recommend
this mechanism as a tool to prevent implicit conversion
from types, particular in my example,
I suppose const char* to standard strings,
but plenty of times where that has come up
with other situations.
- I think that actually is the third bullet point here,
make explicit any of your constructors
that aren't an obvious easy overload.
Like, I think knowing what we know today,
we probably would have made the const char* constructor
for string explicit so that you can spot
the fact that oh that is an expensive copy.
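That hypothetical, sketched (the type Name is invented for illustration; std::string's actual const char* constructor is, of course, implicit):

```cpp
#include <string>

class Name {
 public:
  // explicit: the potentially expensive copy from the C string must
  // be spelled out at the call site instead of happening silently.
  explicit Name(const char* s) : s_(s) {}

  const std::string& str() const { return s_; }

 private:
  std::string s_;
};

// Name n = "alice";  // would not compile: the conversion is explicit
// Name n("alice");   // OK: the copy is visible where it happens
```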
- [Man] Sure, you would encounter the same scenario
with assignment operations as well
when you're not dealing with a constructor
at that particular point as well,
but you'd still end up having
to encounter implicit conversion for the type provided.
- I think I lost you, sorry.
- [Man] I may just be blowing hot air I suppose.
- No, like, it is, code is very hard,
and it is much easier with examples
instead of verbally, so come find me afterwards
if you wanna talk.
Yeah, I just can't quite do that one live.
Yeah.
- [Man] You had a slide about taking sinks,
about data as sinks, and about taking them as value
versus ref versus rvalue ref,
and the guidance was based on sort of
relatively complicated evaluation of whether
one operation's gonna be more expensive than another.
Is there a fundamental reason why that's a decision
that I have to be making as an API designer
and the compiler can't decide for me.
- The,
in this language the compiler can't decide for you.
We have too much legacy stuff,
like we can't change these behaviors.
I think in theory it is the sort of thing
that might be amenable to optimization,
or to automation, but that would be a mad science project
first off in order to figure that out a little bit,
because among other things, like,
it's going to change wildly if you take a mutex.
It's going to change wildly if you call an RPC, right?
Like when you're sinking a thing,
like you need to know the cost of those things,
and not every line of code is equivalently costly,
and trying to teach the compiler, like,
which of these things is expensive?
That would be a neat trick.
So like in the presence of magic,
yes in theory that would be cool.
And until then it's gonna be a little complicated,
and I don't know.
Use your best judgment.
It's hard.
- [Man] Hello, in a very early slide
you had really the general idea of whether an overload set,
you know, if you're taking in a std string
and then it's a lightweight wrapper
around something that goes to a const char,
pointer to a const char,
that was a good thing.
And then I think at a later point,
if I understood you correctly, you started saying that
when you identified people using const references
to std unique pointers, in code reviews you see that
as like an issue. - Oh yeah.
- [Man] Is it kind of like an issue in it's just
a nice wrapper around passing a pointer
down to some?
- It's not a wrapper around a pointer, right?
- [Man] No, no, no, I mean like when you make a,
add something to your overload set
just to make it easier for people already using
unique pointers to pass down the raw pointer?
- I,
no, because the operation to actually extract a raw pointer
from a unique pointer is a one-liner,
whereas if you only had a const char* overload,
well, no.
Yeah, now I see your point.
There is a logical inconsistency there.
I think it is that it is very common
for us in legacy code bases
to have char*'s floating around and strings floating around,
and like it's nice if you don't have to know
which one it is and which one to care about.
Whereas, passing a unique pointer by reference,
especially by const reference,
is just fundamentally a little silly
because you're saying I can only invoke this
if I already have ownership of the thing,
but I'm not transferring ownership, right?
It'd be like, that would be an okay function by itself
if you had to prove ownership of an object,
which is a weird semantic.
I guess strictly speaking, if it is an overload
of T* and const unique pointer ref,
I guess strictly speaking
that might be okay, but I don't know.
That's a weird, like,
I feel like that's the wrong result,
but I think you might be right.
(audience laughing)
So, yeah, I don't know, I'll have to think about it.
But interesting, yeah.
We are strictly out of time, but I will take Eric.
- [Eric] Hi. - What's up?
- [Eric] Well Titus, I think I heard you suggest
that you recommend explicit on constructors
of more than one argument.
- If those constructors aren't logically the thing.
Like, for any constructor that is accepting
a bag of parameters from which you can construct,
as opposed to the parameters it has
are platonically like the same notion
as what you are constructing.
So maybe in a ranges form two iterators is a range, right?
But in a vector, a T and a size is not actually a vector.
Did I head you off?
- [Eric] No I mean it's a question I've had,
because I mean, C++ has this language feature
and I've never known what to do with it.
- Yeah, I think by default,
by default we should be tagging
all the constructors explicit until you think about it
and we have the default wrong as is often the case.
But yeah, like I really think
that explicit should be way more common.
There's a tip of the week on that.
- [Eric] Okay, thanks. - So yep and we're outta time.
Thank you all very much and-- (audience clapping)