Cookies   I display ads to cover the expenses. See the privacy policy for more information. You can keep or reject the ads.

Video thumbnail
Welcome, everyone. To — ah, what is this... — day 3 of CppCon!
My name is Arthur O'Dwyer.
I'm going to be talking today about container adapters, exception guarantees, and the STL.
The abstract to this talk made it sound like I was going to talk a whole lot about `flat_map`.
Given that `flat_map` is now dropped from C++20, and I found lots of interesting examples that are already in C++...
We're actually going to be walking through a lot of examples that are not `flat_map`. But they are `flat_map`-related.
So prepare yourself for some `priority_queue` fun.
By the way, a little bit about me...
I've been, now, to three standards meetings; but I am not an ISO member.
I'm an "insider outsider" in that respect.
Yeah. What else about me? I do C++ training! If you're looking for someone to come train your new hires, let me know.
All right. "Mostly invalid." What do I mean by this?
Well, this talk grew out of a blog post that I wrote during the Kona standardization meeting in February 2019... https://quuxplusone.github.io/blog/2019/02/24/container-adaptor-invariants/
...where, during the LWG discussion of `flat_map`, I was looking at how existing library implementations handle class invariants.
And so we're going to start by looking at three fun examples. One of which has now been fixed.
And I'm going to relate that to what's been proposed for the `flat_set` and `flat_map` proposals.
We're going to look at something coming up in the committee that is relevant to this area —
Not a complete fix, but definitely indicating the committee has interest in fixing this sort of issue.
I'm going to speculate on some possible problems and solutions — well, I should say, "problems, and possible solutions" —
and then we will probably have a lot of time for questions.
I have a lot of slides, but a lot of them are walking through these examples. So I expect that to go quickly.
We'll see what happens.
All right.
By the way, for each of these examples, I have a Godbolt link in the upper right-hand corner of the slide.
So, if you get bored with my walkthrough, you can go to Godbolt Compiler Explorer and type in that shortlink, and you will see this happening live. On both libc++ and libstdc++.
Let's make a priority queue.
I'm assuming that if you're coming to learn about `flat_map`, you already have some inkling of what a `priority_queue` is.
A `priority_queue` has inside itself a container — usually a `vector` — of its element type. In this case we're going to be using `vector`.
And it also has a comparator. It keeps the vector arranged in a max-heap order, based on that comparator.
So here I have my comparator type. Normally it would be `std::less`, but I wanted to do something more interesting; so I'm actually going to hold a `std::function`.
The comparator takes two elements — my element type is `int` — and it returns a `bool`,
We'll get to what that function is going to do in a minute.
Here's my `priority_queue`. Its element type is `int`; container type is `vector`; and it takes one of these comparators.
So what does the comparator do? The comparator is this argument that I am passing to the `priority_queue` constructor.
So what I'm doing in this case is, I wrote a lambda that simply returns `a < b`; but...
let's say that your comparator does something complicated.
I don't know what it's doing here. It shouldn't be doing this. But let's just say that it does this. This is obviously a pathological, malicious user, saying...
...when `a == 2 && b == 3`, throw "oops".
You can tell they are pathological and malicious because they are throwing `const char *`s. Please don't do this.
And then I've got some code at the bottom just to show you how to print the elements of a `priority_queue`.
We just pop off the maximum element — the top element — over and over, and print them out in order.
So I'm not going to show that code on the future slides, but that is what's happening when we're printing the elements of the queue.
So here's my code.
I make my `priority_queue` with my throwing comparator. I push 2. I push 2 again. I push 1.
Then I push 3. Now, I claim this is going to throw. So I'm going to wrap it in a try-catch.
This is— I'm expecting my comparator to throw an exception.
And I'm going to do it again. My comparator's going to throw again. I'm going to catch and ignore THAT exception.
Now, what state is my `priority_queue` going to be in? We're going to walk through and see that. I'm going to push 4 and I'm going to print out the elements.
Let's walk through this. What happens?
Here's my priority queue. This is my vector with capacity 7.
I've drawn it a little staggered so that you can see the tree structure. Right? The top element at the front. Its two children. Those two children's two children.
So let's start inserting things. We push 2. 2 goes in the front. That was easy.
We push 2 again.
We compare 2 to 2. 2 is not less than 2. They're in the correct order. We keep going, having preserved our heap invariant.
We push 1. Uh, 2 is in fact greater than 1. We're preserving our max-heap invariant — the max element is still at the front. We're all good.
We push 3. We compare 2 with 3... and that throws.
Now, since this throws, the rest of the function — the heap sifting — does not happen.
We do not go on and swap them. We don't know which one is less than the other one.
So we got an exception. We ignore that exception. And we go and we push 3 again.
And we compare 3 to 2 and again it throws. And we ignore that exception, and we go on.
Now our priority queue is in this bad state.
Okay. Well, we'll push 4 anyway. We'll compare 1 to 4; 4 is greater than 1; we swap it upward...
4 is greater than 2; we swap it upward.
And then we start printing off our elements.
We pop off 4. The way we pop from a priority queue is, we swap the max element to the bottom and then we pull it off.
And then we re-heapify. We sift down. So we swap... and we swap...
And now we've preserved our heap invariant. Except of course our heap invariant was broken to begin with!
Our heap invariant was broken at the time that we did the pop. It remains broken after we do the pop.
[AUDIENCE] Can you comment quickly on heap invariant? I'm not familiar with that term.
[ARTHUR] The heap invariant is, as we are putting things into the heap here, every parent is greater than both of its children.
In terms of the comparator. Which in this case is integer "less-than," except that it sometimes throws.
So when we put in the 3, we really wanted it to get swapped all the way up to the top of the heap
But, because one of those comparisons threw, we didn't actually finish re-heapifying. We didn't finish sifting it up.
All right. So we we pull off the 4 and we re-heapify.
But we still don't have a heap, because, you know — garbage in, garbage out.
We swap the 2 to the end and pull it off.
We swap the 3 to the end and pull it off. We re-heapify.
Now we're back. We've got a nice heap. We pull the rest off in order.
So this is the output that you get when you print the elements, from highest to lowest, of this priority queue, with libc++.
Now, I hand-picked an example that libstdc++ also gets wrong... in a different way!
They actually produce different output for the exact same input. I just thought that was cute.
Okay, so, if we have a comparator that throws...
We put that inside a `priority_queue`... we break our class invariant...
then bad stuff happens.
We end up with a `priority_queue` that is in what the title of this talk calls a "mostly invalid state." This is worse than a moved-from state!
This is NOT a "valid but unspecified state" for this class to be in. This is an object that has broken its container invariants.
`priority_queue` has a class invariant that its vector is always sorted by its comparator.
The whole point of having a class invariant is that should invariably be satisfied. It is an invariant.
The only place you're allowed to break your invariant is maybe briefly inside a member function.
By the time that member function exits, your class invariant should be restored.
But a throwing comparator can force `priority_queue::push()` to exit after doing SOME work — breaking the invariant — but before restoring the invariant.
So this means that then the outside user of the `priority_queue` gets to see that broken invariant; gets to see the class in an invalid state.
All right. So this puts the `priority_queue` into a very bad state.
There are also other ways to break `priority_queue`'s invariants.
If we have a comparator, such as this `struct X` here, whose copy assignment operator throws...
This is also trouble for the standard `priority_queue`.
So here I have `struct X`, and it has a flag that allows it to sort up or sort down.
And its `operator()` just does sort up or sort down — less-than or greater-than.
And it has a copy-assignment operator that throws.
We put one of these into a `priority_queue` with the initialization \{5, 4, 3, 2, 1\}.
By the way, when you give an `initializer_list` to a `priority_queue`, it will call `std::make_heap` on that. Right?
We can't force bad input into a priority queue by giving it an out-of-order initializer list.
It will take the initializers it's given by us and then it will call `std::make_heap` to make sure they're in sorted order
But in this case, we're giving it something that already does satisfy the max-heap ordering.
5 is greater than 4 and 3; and 4 is greater than 2 and 1.
And we create `asc`, the ascending `priority_queue`, with a comparator that wants ascending order.
And 1 is indeed less than 3 and 2; 3 is less than 5 and 4.
So we make `desc` and we make `asc`. And then we assign `asc` to `desc`.
When we do this, it copies the container contents, and it copies the comparator.
In some order.
It turns out it does them in that order. That is kind of a logical order to do things in. [Because in the real world, copying the container is the operation more likely to throw.]
So we copy the container contents...
...and we copy the comparator. Which throws.
Now `desc` is in a bad state. Again, it's broken its class invariants.
Now we do the popping and printing... code not shown, but...
We swap 1 to the end and pop it off.
We re-heapify... and when we re-heapify we look at that comparator —
— that's the ordinary less-than comparator, so we're looking at a max-heap invariant —
4 is greater than 3 and 2. The max-heap invariant is preserved. We're done. But of course, garbage in, garbage out.
We pop off 4. We re-heapify.
We pop off 5. We re-heapify. We need to swap those guys.
We pop off 3... and re-heapify...
And finally we pop off 2.
Again, this is libc++'s output.
libstdc++, again, produces DIFFERENT wrong outputs.
And I have not looked to see exactly what libstdc++ is doing that doesn't match my little CS-101 diagrams.
It seems like it should be doing exactly the same thing, but somehow it gets different answers.
So I was talking with Billy O'Neal, here at the conference.
He helped me greatly with coming up with these examples for this presentation.
So, hat tip to him for this.
This upcoming example is due to Billy. I would totally not have known that this was a thing.
This is fixed in the latest MSVC STL — which was just released on https://github.com/microsoft/STL , by the way!
This upcoming example deals with Visual Studio's `unordered_set`.
So `unordered_set` is a HashSet — what Java would call a `HashSet`.
It's a hash table; you have sort of a vector of buckets. Each bucket has a chain of elements inside itself. Everything whose hash function collides goes in the same chain.
But MSVC does something a little interesting here. Its `unordered_set` is actually implemented by composition of two different STL containers.
When they made `unordered_set`, they decided they were going to do it by having a `vector` of buckets, and...
...instead of having a separate linked list for each bucket, they'll just have one big linked list. And the vector will have iterators into that linked list.
We're going to see a diagram.
So here's my `std::unordered_set`, `s`.
And it has two more primitive members inside itself. By the way, it also has at least two more members —
It's got a member representing the hasher and it's got a member representing the `key_equal` function that it uses for key comparison.
In this particular example, my hasher is actually an empty struct. There's nothing special going on with my hasher.
And my `key_equal` is the default `std::equal_to`. So I didn't bother to show them here.
But I am showing you the linked list, `elts`. That's that first member there.
That's a linked list of all the elements in the hash table.
And the `bkts` vector — the `bkts` vector contains iterators into the linked list.
Actually pairs of iterators, I think. I'm not showing all the details here.
In this case the buckets came out, like, exactly in order. I don't think they can, like, cross over each other. There's lots of implementation details that I'm not showing.
But I make my `unordered_set` and I put this data into it. 1, 2, 5, 6, 7, 10, 15, 66.
MSVC's `unordered_set` will have 8 buckets by default. So I'm showing that accurately here. We have eight hash buckets.
And I have 8 elements in my initializer list, which means that the load factor is 1.
So looking up an element here involves— if I were to look up, let's say, `10` —
step one would be to hash it, giving `size_t(10)`.
Step two, mod the hash value by the number of buckets. So we would expect to find it in the second bucket — that is, the 2th bucket, which is actually the third bucket.
I look in the third bucket, which— Sorry, I should use my mouse cursor here.
0, 1, 2. Follow that down. This is now bucket number two.
And then I start `key_equal`ing the elements... 66 is not 10... Oh, there it is.
And I could keep walking until I hit the next bucket.
[INAUDIBLE FROM AUDIENCE]
Yes, they know when they traverse the list where the next bucket is. And again, the details are going to be handwaved for that.
But you could imagine they just look forward to find the next non-null pointer and then compare with that.
Something like that. I believe it is more involved than that.
Oh! In fact, I believe what it is, is that they have two iterators per bucket, and there's a `begin` and an `end`. So they don't even have to do anything special.
Okay
So that is how we do lookup.
Now what's the other operation that hash tables need to be able to do? Well, when the load factor gets above one, we're going to resize the `bkts` vector.
And we're going to have to rehash all the existing elements.
So in this particular example when we insert a new element, let's say with value `9`...
That brings our load factor above 1. That means we have to rehash from our existing 8 buckets up to 64 buckets.
All 64 buckets are not going to fit on my slide.
During the rehash we're going to update the iterators in the `bkts` vector to point to the first element of each new bucket.
We may need to shuffle the list a little bit.
I have no idea how that MSVC decides it needs to do that or how that's accomplished...
...other than that it's accomplished by shuffling pointers on the linked list and not by literally moving the elements — that wouldn't be allowed.
But for example, in this case, 66 and 2 would end up in the same bucket mod 64. 10 is actually going to end up in a different bucket.
And so I am going to have to swizzle those pointers around. But that is not relevant to our example.
What's relevant to our example is that we're going to have to rehash all the existing elements. How can I screw this up?
Well, I can have my `Hasher::operator()` throw an exception during the rehash operation!
So here's what I'm going to do. I'm going to make my `Hasher`... I am going to initialize my `unordered_set` exactly as we saw in the diagram...
Then I'm going to set global variable `throw_on` to `5`.
Then I'm going to try to emplace 9. That's going to trigger a rehash of all the elements in the hash table.
When it does, it's going to call my hash function. When my hash function reaches element number 5, it is going to throw.
I'm going to catch that exception and ignore it.
What state is `s` going to be in?
Here's my `s`... and I'm going to do the rehash now. We allocate a new `bkts` array of 64 buckets. (I think there's 32 there. Imagine there are 32 more.)
And we're going to begin the rehash. We're going to start with the first element. We're going to hash 1; we're going to put it in bucket number 1.
We're going to hash 66, 10, and 2. We're going to swizzle those pointers around through magic that is handwaved away here.
We're going to put them in bucket 2 and also in bucket 10.
And then we're going to hash 5. And that's going to throw.
So we propagate that upward — we do stack unwinding — and the user catches the exception and ignores it.
So now, `s.begin()` and `s.end()` have not changed. They iterate the entire linked list.
I can certainly use linear `std::find` to find element number `6` in my now "mostly invalid" hash table.
But! If I call `s`'s actual `find` method, that does the hash lookup...
...it will hash 6, look in bucket number 6, and then — because the `bkts` array was incomplete — because we threw in the middle of populating it —
— the bucket appears empty! `std::find` and `s.find()` produce different answers.
`s.find(6)` says "no, 6 does not appear here."
If you try to count how many sixes there are in my `unordered_set`, it will say there are none.
But if you use the linear search, you'll find it just fine.
We've managed to break our class invariants and put our container into a terrible, terrible state.
A state where it is not even really, like, recoverable, at this point. I mean, there's no sense in which this even is a valid `unordered_set`.
It sometimes claims to have an element, sometimes doesn't. Right? This is, I would say, WORSE than the `priority_queue` example.
However, Billy also mentioned this to me because he has fixed it.
"Fixed in master."
Now when this happens... when the rehash happens, the exception is thrown, the rehash is aborted, the container is about to enter an invalid state...
What does the STL do?
Well, it restores the invariant!
Still surprising? Like, "better," but also, not "good"?
So how do I relate this to the ostensible topic of this talk, `flat_set` and `flat_map`?
Well, `flat_set` has the same kind of problem; because it is built the same way as a `priority_queue`. Right? It's a container adaptor.
And it puts together two different components that are both user-provided.
It puts together a user-provided container and a user-provided comparator...
And it has what I'm calling a "cross-component invariant," that the container has to be sorted by the comparator.
It's very easy to break that invariant, when you don't control either the container or the comparator.
Various ways to break that invariant include...
If we are doing copy assignment of `flat_set`s, and container assignment succeeds, but comparator assignment throws; or vice versa...
We saw that in our second `priority_queue` example.
If comparison throws during an operation that temporarily needs to break the invariant, such as insertion or deletion...
That was our first `priority_queue` example.
Also, if the element type itself throws during move assignment, or during swap, that would also be bad.
The element type itself can actually do pathological things that break our container invariant.
So `flat_set` has this kind of problem due to its unusual number of what I call cross-component invariants...
Such as, the container is sorted by the comparator. And also, by the way, the container can't contain duplicates.
So even if I find where I want to insert it, and I put the duplicate in there and then remove it again, or something like that... that would still temporarily break my invariant.
`flat_map` has the same issues plus more. It has one MORE cross-component invariant.
Namely it has a `keys` container and a `values` container.
By the way, this is the proposed `flat_map` that has two containers. This is called the split-storage, uh, paradigm.
The Boost version of flat_map, the prior art for `flat_map`, uses a single container of pairs — key-value pairs — in the same way that a a `std::map` and a `std::unordered_map` are containers of pairs.
The proposed flat map design in https://wg21.link/P0429R7 uses split storage.
There is one vector of keys, and one vector of values; and when you iterate over it,
you use proxy iterators where the first and second members are references to the elements in each one.
So maintaining these two vectors means that you have one more component,
and therefore one more cross-component invariant: that the key container and the value container have to be in sync.
So if key insertion succeeds, but value insertion throws — right? maybe "out of memory," something like that? — I have enough memory to resize the keys container, but not to resize the values container...
...that would also be a problem. That would also put us into an "invalid" state.
So another pernicious case...
Oh, I should have mentioned, by the way... as I'm doing these examples... I should have warned you to watch for this...
Not all of these examples have the same failure mode.
They all look superficially similar because they have the same cause: user-provided code throws. And they all have the same symptom, which is that terrible bad stuff happens.
But the mechanisms for fixing them in the standard may not all be the same.
The root cause here is slightly different in each case.
Okay, so a pernicious case for `flat_map` is the `extract()` member function.
P0429 `flat_map` is trying to be a little more user-friendly than `priority_queue`.
`priority_queue` has the container member and the comparator member, and they are both data members, and they are both `protected`.
How many people here have ever derived from `std::priority_queue` just so you can get out the container?
[NO HANDS RAISED] Ah. Well, I am legit raising my hand, as I have totally done that.
Right. It's a protected member, which means you have to derive from it if you want to get the container out.
`flat_set` and `flat_map`, as proposed, both have public `extract()` members that allow you to move out the container when you want to get at it,
and then `replace()` members that allow you to put the container back.
Which again can allow you to deliberately break the invariant by messing with the container and then putting it back; but—
that would be— We could just say that's undefined behavior.
I have no qualms about saying, if you deliberately replace your container with a messed-up container... right?
That would be on the order of instantiating `flat_set` with a container that's not really a container, and when you `clear` the container it puts 42 in it or something like that.
That's a ridiculous container.
So `extract()` moves out the containers.
It's rvalue-ref-qualified. You say `std::move(my_flat_map).extract()`,
and you get a struct containing a `keys` member and a `values` member, which has the two containers in it.
This allows you to quickly get out the heapified [OOPS: sorted] containers.
So it returns `std::move` of that struct containing the two containers.
Now that leaves your `keys` and `values` containers inside the `flat_map` in their moved-from state, right?
I've moved out of them. They are now in their moved-from state.
The STL has no idea what your container's moved-from state is.
And when you have TWO containers, it has no idea what EITHER of their moved-from states are.
And so it has no idea if those states are compatible.
If they're both empty, that's awesome. I still have a `flat_map`, which is empty.
But if one of them is empty and the other one contains moved-from elements, or something like that, then the overall `flat_map` is in an "invalid" state.
And at that point I can't even ask, you know, what is the `.size()` of my `flat_map`? Because it doesn't even have a size!
It's like, half of it— like, the keys container has size 0, and the values container has size 5. Like, what does that even mean? It doesn't mean anything.
So, P0429 mandates that the extract member function, after moving out of the containers...
...will explicitly call `.clear()` on both containers!
Which is mandated by the container requirements to make both containers empty.
So we're moving-out because we want efficiency. I am clearly done with my `flat_map`, right? It's an rvalue, I'm moving-out-of it and I'm taking over these containers...
And yet, merely in order to preserve these standard requirement that a moved-from `flat_map` is in a valid but unspecified state —
— Not an invalid state, but a valid state — we have to clear it!
Ben?
[INAUDIBLE FROM BEN DEANE]
Can we say that the result of an extracted `flat_map` is still a valid state for the `flat_map`,
but the only things you can do with that particular valid state are over-assign it or destroy it,
but not, say, get its `.size()`?
I believe— I'm not sure, but I believe— that this is what people in the Committee mean when they talk about a "radioactive state."
It puts it into something like a radioactive state, where
it's "valid" in the philosophical sense, and yet you can't even do operations on it that have no preconditions, such as `.size()`.
Right? Normally when we say some container is in a "valid" state, what we mean is that you can do container operations on it that have no precondition.
If I have a vector in an unspecified state, I know I can ask, "How big is it?"
A vector has a size.
In fact, I can `push_back` onto a vector that's in an unspecified state.
The result will be that same vector, in whatever state it was in, with one more element.
Right? That's fine. `push_back` has no precondition.
The only things I can't do with a vector in an unspecified state are, let's say, you know, get its 42nd element — because it might not have one.
There's a precondition on `operator[]`.
But in `flat_map`'s case, a radioactive state, such as we would be in after a move if we didn't clear ourselves,
that would be something where you're saying you can't even ask how big it is,
Because it doesn't have a size. It's still "valid," but it doesn't have a size.
That's a very weird state to be in.
So that's more of a semantic game. And I think LWG is reluctant to play that kind of game.
We do not have a lot of options... which I think is the title of my next slide.
But we will come back, maybe, there.
[INAUDIBLE FROM AUDIENCE]
"The only thing we have to guarantee is that it's destroyable."
Well, I mean, TECHNICALLY you don't even have to guarantee THAT! You could say, "you have to heap-allocate it and drop it on the floor," you know,
"when it gets in the radioactive state, you're not allowed to delete it." Something like that.
Now, that would obviously also be terrible! These are all terrible options!
Over here. [INAUDIBLE FROM AUDIENCE]
"What's the advantage to using the two separate containers?"
I have not benchmarked it. So this is not "the answer"; this is merely the received wisdom.
But the received wisdom is that by putting all the keys very close together, you get better cache locality on lookups.
Because you're looking directly at the keys; there's no values interspersed with them in the cache.
I've seen numbers both ways though.
I've also seen lookup being terrible regardless! Because what are we doing — we're doing a binary search in the keys array.
Binary search, by definition, starts in the middle; hops way over here; hops way over here; hops way over here...
Like, that's not cache-friendly no matter how you localize the data.
There was a question over here somewhere.
[AUDIENCE] So `.extract()` extracts both—
Keys and values, yes. `.extract()` extracts both keys and values.
[AUDIENCE] So you could just state that after you `.extract()`, it's in a moved-from state.
"It could just state that after you extract both keys and values, the `flat_map` is in a moved-from state." [AUDIENCE] Yeah.
But the `flat_map` is NOT in a moved-from state!
Its two sub-components, which are user-provided, are both individually in moved-from states.
The `flat_map` itself has a container invariant: these two sub-components have to have the same size.
[AUDIENCE] Yeah, but once you extract all the meat out of the container, it's like it's in its moved-from state.
"Once you extract the meat out of the container, it's LIKE its moved-from state." It is LIKE a moved-from state...
[AUDIENCE] It might have some rice and beans...
[ARTHUR] But it is not in a valid state, right?
If I can't call `.size()` on it, it's not in a valid state.
[AUDIENCE] You cannot call `.size()` on a moved-from state, right?
Yes, you CAN call `.size()` on a moved-from state. I just finished saying this!
When an object — in the standard library, at least — has been moved-from,
all of these standard objects do have this guarantee, that after you move from it, it is still in a valid state.
You don't necessarily know what state it's in — it's unspecified — but it's in a valid state, where you can still do all the operations that you could do on a container in any unspecified state.
Just like a function parameter coming in. I don't know what state it's in. But I know I can ask for its size, I know I can `push_back` onto it.
This is a different kind of thing.
[INAUDIBLE FROM AUDIENCE]
Yes. Class invariants should hold even for an unspecified state.
It is a state. It's a state of the class. The invariants should hold for every state of the class.
[AUDIENCE] While developing, isn't the loosest requirement just that it is destructible [INAUDIBLE] and basically assignable-to? So that you have some loose requirement [INAUDIBLE].
[AUDIENCE] Also the other thing, about hashing: if I write a hash function, it's sort of common sense that it shouldn't throw, because the world will obviously blow up.
So even if the standard doesn't guarantee or require that the hash function doesn't throw, common sense is that if you write a hash function that throws, you are basically effed.
[ARTHUR] Yes. So there were two comments there.
One was, could we just say that there are no container invariants other than destructibility, and everything else is sort of... "quality of implementation" would be the standardese way of saying it.
You know, like maybe when you call `.top()` on a `priority_queue` you get the top element and sometimes you don't, but at least it's destructible... [LAUGHTER]
Yeah. Errnngh.
Now, is it stupid for the user to write a throwing hash function? Absolutely yes, and I will get to that. Yes.
All right.
So another pernicious case for `flat_map` specifically would be `insert`.
This is the currently proposed wording for `insert`.
"Adds elements to `c` as if by..." this loop where we insert them one at a time.
This is very slow.
It would be nicer if we could insert them all as a batch and then re-sort the entire `keys` vector and the entire `values` vector.
But by mixing in a whole lot of new information at once...
we're increasing the chances that something in there — maybe the comparison, maybe the move assignment — might throw.
and if it does, then we've lost more of the user's data. And so there's this... uh...
Again, terrible, terrible choices we have to make. What do we value?
Do we value trying to be a little user-friendly and preserve more of their data, in this very rare case where we got an exception?
I'll note in passing that the wording there where it calls `ranges::unique` is wrong wrong wrong. `ranges::unique` does not use the comparator...
Oh, it uses `key_equiv(comparator)`! Well, maybe that's right, now.
`key_equiv` would be something that would have to return `==`, not `<`.
So typical LWG response to this is, uh, the way Billy fixed MSVC's `unordered_set`.
"Take off and nuke it from orbit. It's the only way to make sure."
So...
Billy O'Neal — who helped greatly with this talk, by the way, but all the mistakes are mine —
wrote P1843. I think this was presented in Cologne. It might have been in the post-meeting mailing. I'm not sure.
It's called "Comparison and hasher requirements." And it's a response to an LWG issue titled "Throwing swap breaks unordered containers' state,"
which is specifically about that `unordered_set` issue that we saw.
As we see, the issue is much larger. It affects `priority_queue`, and it may affect other types as well.
In preparing this talk, I looked at `std::vector`.
You might think inserting into the middle of a `std::vector` would also have this problem because you have to move everything down and insert in the middle,
and then if that throws, maybe you have to move things back; and if that throws, maybe you have to throw some stuff out.
It turns out that's not a problem.
Many of the containers are already designed — either intentionally, or unintentionally based on the tools available in the 1980s —
but for example what `vector` will do is copy everything down one at a time, with move assignment,
And then we have a moved-from element here which I can then move-assign into.
And if that throws, I still have a vector! It might have a moved-from element in the middle, but I still have a vector.
So P1843 proposes, in part, that in the `priority_queue::swap` method...
It requires, of course, that the container and comparator both be swappable. Not necessarily nothrow-swappable.
But if swapping the containers throws an exception, "either there are no effects on the containers or—"
—and I've informed that the intent of this wording is,
"after the swap happens and before we return, we will clear the container."
Nuke it from orbit.
If we get into one of these "invalid" states, we will restore the container invariant by whatever means necessary.
If that means trashing your user's data, that's what we're going to do.
Now, this seems, to me, user-hostile.
I'm not saying this isn't the best option! But I am saying it seems user-hostile.
Because the current buggy behavior of `priority_queue` — the behavior I demonstrated at the beginning of this talk in those two examples —
— that's EXACTLY what any working programmer would expect, based on a sort of "Computer Science 101" explanation of how `priority_queue` works.
It has a container; it has a comparator; and what it does is, it calls, you know, `push_heap` and `pop_heap`.
...And they throw, and when they throw, they leave the container in an interesting state, you know.
And they could just assume that all of these special members were defaulted.
In practice, they are NOT defaulted; at least on libc++. But it would be an ABI break to make them defaulted.
But you can pretty much understand the existing behavior of `priority_queue` — this "buggy" behavior, this behavior that leads to these invalid states that are not really priority queues.
At least the current behavior is understandable based on a very simplistic understanding of how a computer works.
It makes perfect sense that this would happen.
The standard library would have to actually cruft up their implementation with extra codepaths...
codepaths that would never be hit in any non-pathological case...
And these extra codepaths destroy trivial copyability...
It adds all of this dead try-catch code...
So it would preserve the `priority_queue`'s invariant,
but it would also destroy the programmer's data, and have all of these bad-for-maintainability effects,
to try to deal with this in the way that it seems that we are trying to deal with it.
The "nuke it from orbit" approach.
Unfortunately, that seems to be the only tool at our disposal!
That is, the two options seem to be, "nuke the user's data" or "don't nuke the user's data and then we're in this invalid state" — and we don't want that, either.
So when I was discussing `flat_map` with committee people, the discussion would inevitably go something like this.
Someone would say, "Ah, well, when the values container reallocation fails after the keys container's allocation has succeeded..."
or "When comparison fails..." or something like that...
"...we must either break the container invariant or nuke the entire container, i.e., clear it."
And then someone else would say, "Ooh, well, those are terrible."
"Like, maybe we don't have to nuke ALL the user's data. Maybe we can figure out some clever algorithmic hack that might prevent that bad thing from happening in the first place."
Like, "Maybe, well, what if we swapped the comparators first and then the containers second?"
Or, "Well, what if we put in an `if constexpr` so we could detect whether swapping would fail... and then we could do the fast thing, and then only if it COULD fail do we do the slower thing," you know.
And we'd spend ten minutes talking about that particular thing. And then we'd move on to the next member function and do it all over again, because that one ALSO had issues.
So this took a lot of time, and I don't think any conclusion was really reached, at least not when I was there in Kona.
So unfortunately, again, we don't have many tools at our disposal.
Revision 6 of the `flat_map` paper, which has now been superseded by an R7, had even tried...
Screenshot of the paper here, the proposed wording.
It proposes that there be a non-member `swap` function which is noexcept, but which is constrained.
This template participates in overload resolution only if every single sub-component of the `flat_map` is nothrow-swappable.
If any component provided by the user has a possibly throwing swap, then the `flat_map` will just not be swappable!
Problem solved. No more swapping of `flat_map`s.
Now, as it turned out, that didn't work anyway. Because `std::swap` will totally swap your `flat_map`s! It's fine. There's an unconstrained template to swap things.
But this wording has thankfully already vanished.
The problem is we're not entirely sure what to replace it with,
other than perhaps constraints on the entire `flat_map` .
That for example your comparator had better be nothrow-swappable, or you won't be able to instantiate the type. That's a much saner approach, I think.
[AUDIENCE] Could you drop requirements from the container if you have these behaviors, uh, operators, swappers, whatever, so that you can have a fallback implementation that is associated with those cases?
[ARTHUR] "Could you drop requirements from the container..."? By requirements do you mean things like "it's sorted," or do you mean things like "it has a swap function"?
[AUDIENCE] No. I mean, insertion is O(1), insertion is...
[ARTHUR] Oh. Like algorithmic complexity requirements. [AUDIENCE] Yeah.
[ARTHUR] That is already kind of done for a lot of algorithms.
So yeah, I don't think there would be much resistance to saying things like, "If the comparator is potentially throwing, then we will do linear search instead of binary search."
I don't think there would be resistance necessarily to such an idea if it solved the problem. I don't immediately see how it solves any of these problems, though, to just change the algorithmic requirements.
[AUDIENCE] ...a much fatter implementation of the search, of the mutators [INAUDIBLE] that actually saves all the states before, and if anything goes wrong, you just roll back to the...
[ARTHUR] "You can have a fatter.." Yeah, sort of like when you do copy-and-swap. You can have copy-and-find. Something like that. Yeah, yeah.
I suppose you could. There might be some resistance to that. I'm not sure.
[AUDIENCE] But that's the naïve implementation. You CAN do it.
Yes. Something like copy-and-swap can save us in some of these cases.
All right, so, at first glance, I think the first tool that we would use here would be undefined behavior.
As you said, a user should not be throwing from their hasher. They usually should not be throwing from their comparison function.
If an exception is thrown from one of these things we could just say, well, that's undefined behavior.
And then you don't need to be in a "valid but unspecified state." Your program has undefined behavior; it can do whatever it wants regardless of anything, at that point.
So we should probably just say explicitly in the standard library clauses
that any type which is used as a Hasher — any type used as a KeyEqual — any type used as a Compare — must not, shall not, throw.
If it does throw, you violated a "shall" cause, so you have undefined behavior.
Now, I would explicitly caution (not that the committee needs this reminder, but)
that does NOT mean that the `is_nothrow_invocable` trait should return true!
Because many, many codebases out there have `operator()`s that are not noexcept.
I mean, most codebases, I would say, still have some non-const `operator()`s left in them, let alone `noexcept`!
People generally are not marking their `operator()` noexcept; and I don't think that they should have to.
On the other hand if their `operator()` actually does throw, then I'm perfectly happy saying that they have undefined behavior.
On the other hand! `operator()` of the Hasher, I think we all agree that shouldn't throw. On the other hand, something like copy assignment...
Making a copy of a state? That could totally throw.
[INAUDIBLE FROM AUDIENCE]
"Platonically a Hasher is a function..." okay, but a function can still throw?
[INAUDIBLE FROM AUDIENCE]
Oh, you mean the `operator=` of the hasher shouldn't be allowed to throw, because platonically it's just a function.
Unfortunately, `std::function` has a throwing copy-assignment operator!
So.
So I think that the solution — like, any formal solution to this — is going to have to talk about exception guarantees.
For those unfamiliar, there are basically three or four (depending on how you count) exception guarantees in the literature, that people would be aware of.
When someone says that my function provides the "strong exception guarantee," what they mean is that either the intended operation succeeds,
or else when the exception is thrown, there is no effect.
If an exception is thrown, you roll back; or you had no effect in the first place, using something like copy-and-swap.
The strong guarantee is a reliable building block.
Even when an exception is thrown, I know the state of my program. This is a reliable building block for larger algorithms.
Unfortunately, the STL has no notion of the strong guarantee.
It has no way for users to indicate that they provide it; and in general STL components themselves do not provide the strong guarantee, except as explicitly marked.
And there are a few, particularly around `vector`. `std::vector` is a very strong exception-safe kind of type.
[INAUDIBLE FROM AUDIENCE]
Explicitly marked in the paper standard.
There will be a line that says, "the effects of this function are...", and then, "if it throws then there are no effects."
By the way, "there are no effects" is of course a misnomer. There are always effects. Time is always passing.
But that's the standardese for it.
The basic guarantee says, either the operation succeeds or the component enters a valid but otherwise unspecified state.
The basic exception guarantee is the basics of what we should all strive for.
Even if you do nothing else, If you claim to be writing code that uses exceptions — as opposed to just dicking around with code, right? —
If you're using exceptions, you are providing the basic exception guarantee.
When you throw an exception you are still in a valid state. You may not know what that state is, but at least it is valid and you can recover.
If you can't recover, you are not USING exceptions. Because when you catch an exception, you don't know what you can do.
Nico.
[INAUDIBLE FROM NICOLAI JOSUTTIS]
Yes. [INAUDIBLE FROM NICOLAI]
[ARTHUR] I mean... [NICOLAI] In wording you can.
Right. In wording, the STL can require it of users, of user types, in some cases. Yes.
[INAUDIBLE FROM NICOLAI]
And the STL can also advertise it in writing for their own types [OOPS: operations], such as `push_back`.
But I meant programmatically...
the library vendor cannot actually verify that a provided type has the strong guarantee...
and then metaprogram against that.
"If the user provided `operator()` has the strong guarantee then I can do this; otherwise, I will fall back to that."
There's no way to metaprogram that today, because there's no way to test to say "does it have the strong guarantee or not?"
All we can do is require on paper and advertise on paper.
[NICOLAI] We'd need some indication in the signature that said—?
We need some indication in the signature, or a specialized type trait, or an attribute... I don't know. But we have nothing at the moment. Yes.
Finally there is "no guarantee." This is what `priority_queue` has.
Either the operation succeeds, or the component enters an unspecified and possibly broken state.
If you do this, you are not using exceptions. Because when an exception is thrown, you have zero way to recover from it because all your things are in some absolutely invalid state.
You can't even call `.size()` on them anymore.
And we can get into this state pretty easily through generic programming. Through composing user-defined components.
If any of my user-defined components give only the basic guarantee, such that my higher level algorithm (such as `priority_queue`) has two different components, both of which provide the basic guarantee...
and some cross-component invariant that must be held between them... I'm in trouble. I may end up here.
If an exception is thrown from a user-provided component through an STL object with cross-component invariants, then...
The STL doesn't know the user-provided components' state—
Right? The component provided the basic guarantee. We know it's in some state, but we don't know what that state is.
And we have an invariant that needs preservation between that component and some other components...
But, when we don't know this guy's state, then the invariant may be broken, and we have no way of checking that.
And that means OUR invariant is broken. And that means we are NOT in a valid state.
And that means we can provide no guarantee at all. We cannot even provide the basic guarantee.
...Other than by eliminating container invariants. Such as, `priority_queue` maybe sometimes isn't a priority queue. Right? Quality of implementation!
So the STL object likely ends up in an invalid state; and the only way that we know to recover from this at runtime is to nuke it from orbit.
Clear the container. Or in some other way drastically restore the container invariant by going back to a known state, even if that means losing data.
So, is there a way forward?
As I've heavily indicated here, I suspect the answer will involve somehow codifying the notion of strong exception guarantee into the standard library clauses.
At least to say on paper that certain things "shall" provide the strong exception guarantee.
Again, we can also say that things shall not throw.
We already require that of allocators.
That is, actual memory allocation of course can and is expected to throw;
however, if I have an allocator object and I move-assign it or I copy-assign it or I swap two allocators, THAT is never allowed to throw.
The standard says allocators shall not throw in these cases. For these operations they have the nothrow guarantee.
So we do have precedent for doing that. And I think we could also apply the nothrow guarantee to a hasher.
When you hash— um, when you call the hasher.
I don't think we would require it of the hasher's copy assignment, for example.
Well, maybe we could; but, because `std::function` has a throwing copy assignment, I am reluctant to do that.
And also it could be a lambda that captures a `std::string` — that would have throwing copy assignment.
However, I do think providing the strong exception guarantee would be reasonable. And requiring it.
[NICOLAI JOSUTTIS] I'm not sure, but we have this concept of "enabled hash functions."
[NICOLAI] We say something about that. It should be guaranteed that we don't throw [INAUDIBLE]
[ARTHUR] So, enabled specializations of `std::hash` are not allowed to throw from their `operator()`? Ah. I was not aware of that.
So it is going in that direction. Now, that doesn't cover user-provided hashers at all. [NICOLAI] Yes.
[ARTHUR] But it covers `std::hash` specializations. [INAUDIBLE FROM NICOLAI]
So I suspect that there would not be resistance to requiring the nothrow guarantee on hashers' and comparators' `operator()` in general.
I'm less sure about things like copy-construction...
Yeah.
Things like copy-construction, though, for stateful comparators... because they could be lambdas that capture `std::string`...
Like, you have to allocate, and that might throw.
So I don't think that that will be a short discussion, for THOSE member functions.
For `operator()`, I think that could be a very short discussion.
Right. So that also means that when the STL provides a class that is likely to be used as a hasher, a comparator, et cetera, such as `std::function`,
I think that the STL should go out of its way to provide the strong exception guarantee.
Because it knows that we are going to be using that class as a building block. And the only way to use any class — or any operation, I should say — as a building block, as a reliable building block,
is to make sure it has the strong exception guarantee.
If it has the basic guarantee, you can't use it as a building block for larger things.
[INAUDIBLE FROM AUDIENCE]
[ARTHUR] I'm not sure I heard a question there.
[INAUDIBLE FROM AUDIENCE]
That we may not require non-throwing copy assignment...
[INAUDIBLE FROM AUDIENCE]
[ARTHUR] Okay. ...Maybe?
However, I think this is just about the last slide!
So, as the way forward, strong exception guarantee... The standard should provide strong exception guarantee on some of its types that are going to be used as building blocks...
Probably we should require the nothrow guarantee of `operator()` in some cases; that would be a short discussion, I would hope.
And with that: Thank you!
[APPLAUSE]
[INAUDIBLE FROM AUDIENCE]
So you're asking, "At the moment there is no way to detect a broken invariant— that a specific object has broken its invariant. You would like to detect that in a debug mode. How would you go about doing that?"
In the specific case— Well, actually the first thing that comes to mind, in the specific case of `std::variant`...
`std::variant` sort of has that. `variant` has a sort of invariant... ("Variant has an invariant." Ironic.)
...that it's not valueless by exception. And yet, it has a member function `.valueless_by_exception()` that allows you to test for that state.
So could we put something similar into something like `priority_queue`, or `unordered_set`? Could that be a solution?
You'd have a `.valueless_by_exception()` on just, like, every container, right?
Like when you use an `unordered_set` you can generally assume that it's not in this state; but if you wanted to test, after an exception, you could say, "did this cause a broken invariant in my `unordered_set`?"
That is actually an interesting idea.
But that would be adding state to the container. So it would be an ABI break, so no one would do it.
And also you'd be paying for that extra state. People might not want to pay for it, even if they took the ABI break.
And you'd have to write the runtime code to, you know, someone would have to imagine all the cases that could cause that valueless state, and then insert code...
... just like that code. I was complaining about in `priority_queue`'s copy-assignment operator.
If you're going to insert code there, I already consider that as having lost at least half the battle.
And then it's just, well, WHAT code do we put there. Do we set the `valueless_by_exception` flag, or do we just clear the container and we're done?
So, I think that's very interesting; I haven't heard that before; but I don't think that it is likely to be a good idea.
Other questions?
[INAUDIBLE FROM AUDIENCE]
Oh. Could you use the microphone? Sorry, I totally forgot about the mic.
[AUDIENCE MEMBER AT MIC] Actually, never mind.
All right. Any other questions? comments? and please use the mic if so.
[AUDIENCE MEMBER AT MIC] Is there any way to query if a function or method is "can throw"?
[ARTHUR] Is there any way to query if an operation can throw? [AUDIENCE] Yeah.
Well, the basic answer is yes, and the expert answer is no.
The basic answer: you can certainly ask if it's `noexcept`.
That would be if the user has annotated it with `noexcept`. You can query for that at compile time.
[AUDIENCE] But even `noexcept` things can throw, right?
[ARTHUR] Yes, well, other way around. `noexcept` functions will never throw. They may abort, they may deadlock, they may run forever, but they will never throw. Noexcept functions will never throw, period.
However. Non-noexcept functions, such as `Hasher::operator()`, you know, like nobody marks that `noexcept`, right?
It's just noise at the end of the line. Nobody does that.
But it still doesn't throw. No sane person would write a throwing comparator.
So, unfortunately, the standard library cannot ever, you know, "bondage and discipline," say "you shall put noexcept."
Any more than it could say— I mean, it SHOULD say "thou shalt put `const` on your `operator()`"...
[AUDIENCE] But there could be a trait that tests if the compiler can prove that this will not throw.
[ARTHUR] Ah... could we introspect into the code of the comparator, to see whether it might throw? [AUDIENCE] Yeah.
Even then... Number one, no. The compiler writers are not going to like that, so no.
Also, that would make the implementation of the function part of the interface, which is being resisted on things like `noexcept(auto)`.
So there's precedent for saying "no, we're not going to do that for philosophical reasons."
As well as the technical reasons that it's very hard to— you know. And then different compilers— one would be smarter than the other, they'd give different answers. You don't want that.
And you'd have to somehow do it recursively; because this function might call some other function that's not marked `noexcept`, but also never throws.
So you'd have to go look at that one and you'd look at a huge tree of code.
So, no, I don't think that's feasible at all.
[ELSEWHERE IN AUDIENCE] Other translation unit. [ARTHUR] Yeah, and it might be in some other translation unit, you may not have visibility into it. [AUDIENCE] Yeah, I was thinking that it would be...
...like you said, it would be one compiler could say something is "can throw," and the other one would say "I don't know."
[ARTHUR] Yeah. I mean, users who care about advertising that "I don't throw" WILL use `noexcept`.
That is the standard solution.
The practical problem is the large number of real codebases in industry that don't care about advertising that, but still need to work.
We have to make them compile, even if they— you know, in this case that never happens, you have undefined behavior.
They don't care about that part; it never happens! They would care about their code not compiling.
[AUDIENCE] My reasoning is, if you can provide a stronger guarantee for exceptions when you can prove that this is no-throw...
and a slower implementation that the user pay, if he can throw...
[ARTHUR] Could we have a fallback in the case that things are not noexcept.
This is what I referred to as the vector pessimization. I think it was one of the big mistakes, the original sin of C++11.
Yeah, vector reallocation, in C++03, it would do the only thing you could do in C++03. It would copy each element down from the old buffer to the new buffer,
when you reallocate a vector, and then it would destroy the old buffer.
Now, in C++11, we got move semantics. Move semantics allow us to move, move, move, move, move; and then destroy. That's much much faster.
However, people on the committee were concerned that move construction could throw.
Obviously it never should. Right?
Move construction is one of these things, like hashing, where no sane person would throw. Unless you were MSVC. [NICO] It's not that easy!
And so what happens instead is, they— well, first of all, they said, move construction might throw, so we might get halfway through moving and an exception happens.
And now we have— like we can't provide the strong exception guarantee.
Whereas in C++03, we could provide the strong exception guarantee here.
And so rather than either saying "thou shalt never throw from a move constructor" — which Nico [Josuttis] said, yeah, it's not that easy— Yeah, true—
Or saying, "vector reallocation provides the basic guarantee instead of the strong guarantee,"
which I think would have been quite feasible, in the presence of a throwing move-constructor,
instead they said, "we're going to add a new keyword: `noexcept`. People are going to put it on their move constructors. And if you don't put it on your move constructor...
Like, you're already upgrading to C++11. So you're writing a move constructor. But you've decided not to write `noexcept`.
"If you ever do that, then we will fall back to copying."
"And we will copy everything, and then destroy the originals, just like we did before. You won't get the benefits of move semantics."
This means that a lot of people who got the first part of the memo about C++11 — the move semantics part —
but they missed the later memo about having to mark everything `noexcept` —
And by "everything," I mean just your move constructor. Nothing else in the entire standard library cares about `noexcept`. Just the move constructor, and just because of vector reallocation.
But if you missed that memo you're quietly getting pessimization on every vector reallocation. You're getting copies of things.
[NICOLAI JOSUTTIS] So how many people implement a move constructor? I mean, the default behavior is fine. The default behavior is intact. Whether we are guaranteed to throw or not to throw, so if this is only knowledge people have to know who implement move constructors themselves. And that's not the majority of people. That's those who write special classes.
[ARTHUR] Yes, this is only something you have to know if you write a move constructor yourself.
Or if you follow the Rule of Zero and one of your members as a throwing move constructor.
[NICO] No, no, no! Then you don't have to do that... oh.
[ARTHUR] If you follow the Rule of Zero, and you have a `std::list` member on MSVC,
it will quietly copy the entire list, which is a linear-time operation, every time you resize a vector.
If you want to eliminate that behavior, you write a defaulted move constructor, and you mark it `noexcept` to override the default of non-noexcept.
This is rare. But when people see it — who may not be experts, as some of the people in this room are, you know — they're quite surprised.
I think they would be much less surprised if `vector` got the basic exception guarantee. And all of this stuff just went away.
[NICO] We can't remove it. That's something we introduced in C++98. In '98 we gave the strong guarantee... [ARTHUR] Right. [NICO] ...and we can't silently remove it. End of discussion.
[NICO] Because it would break existing programs. If they say, "we don't have to create a backup of the vector," and suddenly this is a security risk, we can't do that. That was no option at all.
[NICO] So we can argue that, yes, we never should have given the strong exception guarantee to `vector`... but that ship has passed.
[ARTHUR] To repeat for the video: Nico says we gave the strong exception guarantee— you know, possibly unintentionally; you know, that it just fell out that we have the strong exception guarantee—
[NICO] No, no, no, that was a very late paper, just before we finished C++11. It was written by two guys, and we discussed it. That was not an unintentional— [https://akrzemi1.wordpress.com/2011/06/10/using-noexcept/ has much background information on this.]
[ARTHUR] No, I mean in C++98. [NICO] In '98, yeah.
[NICO] It was very important. We had a big discussion on that.
So `vector` provided the strong exception guarantee in C++98; therefore, in C++11, it was not considered feasible to roll it back to the basic guarantee, even for performance.
[NICO] If you decide to break this guarantee, that's not an option. If life depends on that, you don't want to have that. Just recompile and suddenly you can't use your vector anymore, for safety — no.
[ARTHUR] Okay. Ben. [BEN DEANE] So I may be wrong, but it seems to me that many people already, outside of the Committee...
...already think of moved-from objects as what the Committee says is "radioactive." That is, you should only destroy or assign them.
Why is there such distaste in the Committee for adding— for narrowing function preconditions, like `.size()`?
Is it something that the library needs? Because I almost never see that in codebases I've worked in.
I mean, at least part of the distaste would certainly be philosophical.
I mean the idea that if I have a priority queue object — an object of type `priority_queue` — it should BE a priority queue.
And that means it follows the invariants.
Right? That IS type-based design. That is the purpose of C++.
If I can have a `priority_queue` object that doesn't hold a priority queue... what am I doing?
Now, I don't know if there are other technical reasons for it, but that's my sort of visceral philosophical assumption.
But you can say it's still a priority queue; but I can't call any functions on it.
Like, "valid but unspecified" could be interpreted to mean, "I may only destroy or assign, and I'm not allowed to call `.size()`," right?
It's just an engineering choice to allow us to call `.size()`.
But as far as I know — and I may be wrong — I've never seen people calling `.size()` on a moved-from object.
Does this happen in the library? Is there a deeper reason why the Committee is unwilling to narrow the preconditions?
There are certainly some generic algorithms... like, you might want to do this with `std::rotate`, for example...
...that would move from or copy from a moved-from object.
If I have a range where I've already moved from some of it, and now I want to shuffle the elements so that the moved-from elements are at the back... that involves assigning out of a moved-from object.
So that would probably also have to go in your set of things I can do with a radioactive object.
And you might say that's not a problem. But it makes me suspect that there might also be other operations...
Maybe they're all special members! Right? We have to be able to swap moved-from objects...
But it's a lot easier to say "anything with no precondition." That's very clear what it means. And we're unlikely to need to add things to it later for technical reasons. [BEN] Okay.
[ARTHUR] Nico. Can you use the mic, by the way?
[NICO] So we have the intention, in standardization, that we say a moved-from object is always in a valid but unspecified state. End of discussion.
Because if we violate that that means we are in big trouble regarding the guarantees — the basic guarantee and other guarantees we have.
And it's common to use that. If you have a moved-from `string`, you can assign a new value.
If you have a moved-from `string` you can call `.size()`. And ask "is there still something?" And even sometimes people do that.
I mean, if you pass a `unique_ptr` to a function, taking it by rvalue reference,
you can afterward find out, "was the value moved away from the `unique_ptr` or not?"
So the basic guarantee — and that's our real guarantee throughout the whole system, otherwise it breaks down — is,
after move-from, we are in a valid but unspecified state.
This is a different state than when we add exceptions. When we add exceptions, we are no longer in a guaranteed valid state.
So then we have this— if a `vector` has an exception and we don't have the strong guarantee,
all we guarantee is we have the basic guarantee. So there will be no leaks, there will be no memory leaks, et cetera.
So what you should no longer use a vector at all. No. Just destroy it and it's good.
And the basic guarantee only gives us the opportunity to say, "Well now, even if there is an exception, you can continue to use it."
And those are the two things we want to have consistent throughout the whole library because otherwise we are creating some confusion, I think.
So I think we have to discuss a few things here. I don't think we want to violate this policy.
If we find a solution for `flat_map`, et cetera, to have a more expensive implementation that gives us better guarantees in case some people tell us they guarantee not to throw, great! And then...
[ARTHUR] If I might interject on `flat_map` specifically...
...Because it came out of SG14, and it's called a "flat map," there is also strong NON-philosophical objection to having it fall back to inefficient algorithms in ANY case.
[NICO] So the other option is, okay then, to say let's not fall back. Let's, if we run into the case that we have an exception, then no longer use it, end of discussion.
So make sure, as a user, that you only use it for types and comparators and hashers that don't throw. End of discussion. That's all.
[AUDIENCE] If you don't want the pessimization when it throws, mark your stuff as nothrow.
[ARTHUR] I think the target users of this, in industry, are exactly the sort of people who forget to write `noexcept` on some of their functions sometimes... [INAUDIBLE]
[NICO] Yeah, yeah! I agree. But undefined behavior is not a bad thing, and in that area... unless you get it! [ARTHUR] I would agree; undefined behavior is a good tool here.
[NICO] I mean, as long as you don't have an exception, everything is fine. [AUDIENCE] The compiler should warn, too.
[ARTHUR] Now, it does occur to me, by the way, and I should have put this in the talk.. and I hope we're still being recorded...
If not, I'll make a blog post.
...that the "no guarantee, basic guarantee, strong guarantee" — those words could also be used to describe what we require of moved-from objects.
Even though you just said they're different things. [NICO] They're different. [ARTHUR] But a "valid but unspecified state" is exactly the state that is entered when you throw from an operation with a basic guarantee
The strong guarantee would be something like what we require for allocator types when they are moved from.
An allocator type, when you move from it must have the exact same value it had before you moved from it. Their move is tantamount to copy. This is required.
And that's sort of analogous to a strong guarantee. The moved-from object has not changed.
Whereas with other objects, we have the moved-from object DOES change, or MAY change, to some other state; but at least it's still valid.
And then of course you have things like `unique_ptr`. When you move from them, they enter a very well-known state, and there's no problem.
[NICO] I'm not sure that the basic guarantee is the same as valid but unspecified state. We should discuss it.
[ARTHUR] That's the ONE part of what I just said that I AM sure of! So we will discuss that.
But anyway, we are 15 minutes into our half-hour break, and I think we are definitely done at this point. Thank you.
[APPLAUSE]