- So let me tell you a story why I started
to care about state machines and try to
fix some misconceptions about them and that's where
we'll start here today.
So, a few years ago, I was an intern at a company.
It was the Christmas period.
I was ready to go back home
and I noticed that a lot of developers were
really hard working
and I asked them, it's like Christmas period,
why do you work so hard right now?
And they replied that they had a contract with a country
which doesn't have holidays at Christmas,
so they had to be on call
and had to deal with any problems
they might encounter.
So, I came back from Christmas — I was an intern,
I wasn't working, right?
So, when I go back, I ask them, it's like, how did it go?
They said, well, Kris, we had a lot of issues,
and specifically one which took us all of Christmas
to sort out.
So they didn't go back home, they were really annoyed
and it was bad and the problem was that,
Let me just explain the system they were working on.
The system was very simple: it just took some inputs,
manipulated them, and produced some outputs.
So not rocket science, but the problem was that
they didn't have
proper practices, and
they found that
they hadn't reset a Boolean variable
in some switch, in some loop, in some form,
and that was introduced by a bad merge request
and the conflict resolution.
So, I learned quite a few things out of that.
The first one: well, I didn't want to work for that company.
They worked during Christmas time.
The second one, I learned that,
well, they didn't have good practices
because they didn't test it.
The way they found the bug was by reviewing the code,
so they didn't have any good practices.
So, again, I didn't want to work for that company,
but the main problem I learned is that
they didn't have a declarative way of expressing
the application flow, and so they had to compare
different versions using code review —
the diff between working and staging —
instead of being able to figure out upfront
what was going on.
So, I asked them: why didn't you use
state machines, for example?
Well, they cited performance issues and other issues
which they didn't want to deal with.
So, this talk is about how we can actually
clarify those misconceptions
and spread the word about state machines.
So, let's begin.
But first, let's introduce a problem.
So, if you have a healthy company — not one
like the one I was working at —
we will get the requirements from our product owner
or, for example, a client.
Let's do that in a TDD, BDD style.
So we have a feature connection.
We have a scenario, establish connection,
very simple, TCP approach.
So: given I don't have a connection,
when I receive a request to connect,
then I should try to establish the connection,
and when I get an acknowledgement,
I should be connected.
A very simple and easy way of defining the requirements.
We obviously have more requirements than that.
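Written down as a Gherkin-style feature file, that scenario might look something like this (a sketch; the exact wording on the slides may differ):

```gherkin
Feature: Connection

  Scenario: Establish connection
    Given I don't have a connection
    When I receive a request to connect
    Then I should try to establish the connection
    When I get an acknowledgement
    Then I should be connected
```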
How can we actually implement that guy?
Well, let's take a look into state machines.
Obviously, there are different ways of doing that,
but we won't consider them here.
We will talk about Unified Modeling Language.
Is everyone familiar with that guy?
Yeah, more or less everyone.
So, it's a standard; it's specified at OMG.org.
You can read it —
I think it's even worse to read than the C++ standard,
so I don't advise you to do that.
But there are a lot of features specified in it, so that's useful.
So we can actually translate our features
and scenarios into a state machine.
And how does it look?
So, I guess everyone has seen pictures like that.
We have an initial state, disconnected;
we have a connect event,
we call the establish action,
and we transition to the connecting state.
And if, for example, we are in connected
and we get the ping event:
if the guard, which is in the square brackets, is valid,
we call reset timeout and we stay in that state.
Otherwise, we don't call reset timeout
and we don't do the transition.
Easy to follow.
Everyone knows that.
So, today, we'll try to implement that guy
in many different ways.
We'll take a look into naive solutions —
by naive, I mean C++98 features:
if/else, switch/enum, inheritance, the state pattern.
We'll take a look at how the STL and std::variant
might be used to implement state machines,
how coroutines might be used to implement state machines,
and we'll also take a look into the Boost libraries.
Let's first introduce some common implementation.
The common across all the solutions
so that we don't have to repeat ourselves all the time.
So, the events will be just simple structs.
We can start from that; they may have data as well,
but we don't really care about it in our case.
We have a guard, is_valid,
which returns true by default.
It takes an event.
Nothing special about it.
We have an action which will just print to the output.
We'll use puts, just in order to see in Godbolt
whether it's inlined or not;
otherwise, we would have a lot of code
from std::cout and the iostreams,
which we don't really care about.
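The common pieces might look roughly like this (a sketch; the names are assumptions based on the description, not the talk's exact code):

```cpp
#include <cstdio>

// events: plain structs, possibly carrying data later
struct connect {};
struct established {};
struct ping {};
struct disconnect {};

// guard: takes an event, returns true by default
constexpr auto is_valid = [](const auto& /*event*/) { return true; };

// actions: puts instead of iostreams to keep the Godbolt output readable
constexpr auto establish = [] { std::puts("establish"); };
constexpr auto close = [] { std::puts("close"); };
constexpr auto reset_timeout = [] { std::puts("reset timeout"); };
```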
Sorry about that.
So, let's take a look into the naive solutions.
The first one will be the if/else.
We'll start by implementing a connection class.
We'll obviously have three Booleans to represent our state.
The states will be implicit because they'll be
encoded in the code.
By default, the disconnected state is true,
other ones are false.
And when we process the connect event,
we verify whether we are in the disconnected state.
If we are,
we call establish, which is the action,
and after that, we have to reset all the states —
because, as you learned from the story at the beginning,
you always have to reset all the states,
just in case, because
if you have a really nested version,
you might actually forget about it — and the transition
just sets the connecting state to true.
What about processing the ping?
We get the ping event.
We verify whether we're in the connected state,
we check whether the guard is valid, and we reset the
timeout, and we stay in the current state.
Makes sense, right?
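A minimal sketch of that if/else version (names are assumptions): the state is implicit, encoded in three Booleans.

```cpp
#include <cstdio>

struct connect {};
struct ping { bool valid = true; };

class connection {
    bool disconnected = true, connecting = false, connected = false;

public:
    void process_event(const connect&) {
        if (disconnected) {                                 // are we disconnected?
            std::puts("establish");                         // the action
            disconnected = connecting = connected = false;  // reset everything, just in case
            connecting = true;                              // the transition
        }
    }

    void process_event(const ping& event) {
        if (connected && event.valid) {    // state check + guard
            std::puts("reset timeout");    // stay in the same state
        }
    }

    bool is_connecting() const { return connecting; }
};
```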
So what about the Godbolt?
As you notice on the bottom left,
there are 64 lines of code.
Not bad, not a lot of boilerplate,
and everything is very well inlined
and optimized,
as you can see on the top.
So, that's good.
And why is that optimized?
Well, if/else is really easy for compilers
to see through.
So that's a great thing about if/else, right?
So it's inlined in GCC and in Clang —
I just show Clang, but GCC produced the same output.
There's no heap usage.
That's great: we don't have to care about
shared pointers or unique pointers or anything like that.
There's a smallish memory footprint,
however, it's not ideal because we have three Booleans.
As a state machine, we can only be in one state
at the given time so that's a waste.
It's hard to reuse, and to illustrate that:
it's so hard to reuse that,
I guess, everyone has seen the code where you have
multiple ifs and switches and everything is nested —
it's like, how deep can you go?
I don't know.
I guess it depends how many spaces you've used
for your tabs, I don't know.
We can go quite far.
But, yeah, obviously, it's hard to reuse
because it's hard to implement those guys and maintain them.
So it's not the best solution.
Let's take a look into a different one
which is still very easy and everyone probably has seen:
switch and enum.
Everyone loves it, I guess, because it's so popular —
you can see it in any legacy code base.
It's good for state machines
from the perspective that only one state might be active
at a given time, so we use an enum.
So, it's pretty much the same as the previous solution.
Instead of Boolean variables, we use an enum state
with the three states: disconnected, connecting, and connected.
Disconnected is the default one.
When we process the event, instead of ifs, we do a switch
to handle the current state.
By default, we break —
and if you don't know, default is just another
label, so it might be put anywhere in the switch.
In case of disconnected, we call establish,
we change the state, we break.
Nothing really spectacular here.
And the same in case of the ping:
if we are in connected, we verify
the event is valid, we reset the timeout, and we break.
So far so good.
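A sketch of the switch/enum version (names are assumptions): one enum replaces the three Booleans, and default sits first as just another label.

```cpp
#include <cstdio>

struct connect {};
struct established {};
struct ping { bool valid = true; };

class connection {
    enum class state : unsigned char { disconnected, connecting, connected };
    state current = state::disconnected;

public:
    void process_event(const connect&) {
        switch (current) {
            default: break;                // default is just a label
            case state::disconnected:
                std::puts("establish");    // action
                current = state::connecting;
                break;
        }
    }

    void process_event(const established&) {
        switch (current) {
            default: break;
            case state::connecting:
                current = state::connected;
                break;
        }
    }

    void process_event(const ping& event) {
        switch (current) {
            default: break;
            case state::connected:
                if (event.valid) std::puts("reset timeout");  // guard + action
                break;                                        // same state
        }
    }

    bool is_connected() const { return current == state::connected; }
};
```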
What about the performance of that guy?
Again, compilers are already good at optimizing simple things,
and the switch is one of the main constructs of C++
which compilers are very good at.
So we have 65 lines of code, pretty similar to if/else,
and everything is optimized very well.
So that's great.
But what about the switch, what's bad about it?
Well, it's inlined, that's good.
We fix the issue of the memory footprint.
It's one byte, that's fantastic.
There's no heap usage again.
It's still hard to reuse.
We still have a lot of nested switches and ifs and fors
and everything like that.
It would be hard to add a new state and to maintain that.
So, in a sense like, you code for six minutes
and debug for six hours.
Even with the good test coverage,
it might be difficult to find those bugs.
So, yeah, maybe that's not the best solution either.
Let's try a different one.
We all know that object-oriented design
is great, right?
We can use shared pointers everywhere, throw them around.
So yeah, let's do it —
because we care about the performance,
we'll see how that goes.
So we have our state, which is an interface
with a process_event overload
for all the events we can handle.
And after that, we implement it
on a per-state basis.
That's the thing which actually gives us a lot of
goodness, because we can implement a state
and add transitions to it
without changing the connection itself.
So that's valuable.
It means that a lot of teams can work on
different transitions and combine them together.
So that's positive.
So how do we do that?
We have a disconnected state; we inherit from the state,
we have the constructor —
well, we have the connection we have to pass through —
and in case of connect (notice that I use final here,
because that'll be important for devirtualization),
we change the state to connecting.
It's pretty much the same implementation
as we've seen before, just done a bit differently.
The same with the ping:
we have to implement the ping here,
and probably also other events,
to satisfy our requirements from the UML state machine
which we described before,
but all in all, it's very simple.
It's plain and simple.
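A sketch of the state pattern version (names are assumptions): an interface with one process_event per handled event, implemented on a per-state basis, with the connection forwarding to the current state.

```cpp
#include <cstdio>
#include <memory>

struct connect {};
struct ping { bool valid = true; };

class connection;

struct state {
    virtual ~state() = default;
    virtual void process_event(connection&, const connect&) {}  // ignore by default
    virtual void process_event(connection&, const ping&) {}
};

class connection {
    std::unique_ptr<state> current;
    std::unique_ptr<state> next;

public:
    connection();

    template <class TEvent>
    void process_event(const TEvent& event) {
        current->process_event(*this, event);
        if (next) current = std::move(next);  // defer the switch so we don't
    }                                         // destroy the state mid-call

    void change(std::unique_ptr<state> s) { next = std::move(s); }
    const state* current_state() const { return current.get(); }
};

struct connecting final : state {};  // its transitions are omitted in this sketch

struct disconnected final : state {
    void process_event(connection& c, const connect&) override {
        std::puts("establish");                    // the action
        c.change(std::make_unique<connecting>());  // the transition
    }
};

connection::connection() : current{std::make_unique<disconnected>()} {}
```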
What about the performance?
Well, we have a bit more lines of code
because, well, it's object-oriented design:
we write a lot of code,
boilerplate code, just to satisfy the design.
And unfortunately, we have the vtable.
It wasn't devirtualized, although we had final.
For Clang and GCC,
we have around 220 lines.
And I'm using the newest Clang and the newest GCC here,
and all the examples are in Godbolt —
you can check them out yourself and experiment if you want.
So, what's good about the state pattern?
As I said, it's easy to reuse and extend,
because we can add states quite easily
by just adding a state inheriting from the interface
with the transitions you want.
That's what we want from it.
It has a highish memory footprint,
because we have to allocate everything on the heap,
and we have dynamic allocations
all over the place —
probably shared pointers or unique pointers,
all that goodness.
And because of that, it's not inlined;
it's not even devirtualized in our case.
So that's bad, because we care about the performance.
So although this guy is pretty good
from the developer perspective,
it's not really good from the production perspective
because it doesn't give us
what we really need.
So, okay, we've gone through the naive solutions.
Nothing really new about them.
Let's take a look into something else.
If I could pick one thing
which C++ is really good for,
I'd say it's writing libraries.
And one of the best libraries available is the STL.
I hope you agree with that.
And since C++17 and C++20,
we have at least two new ways
of implementing state machines.
One of them is std::variant.
Is everyone using std::variant?
Who likes it?
Yeah, it seems like half of the crowd.
I think it's a very powerful idea.
Maybe we'll see how powerful
it might be for state machines.
So here, let's take a look into the connection.
We go back to one connection class
instead of having it per state.
And right now, the states
are implemented using just types — structs —
which is very powerful for us because they can have data,
which we couldn't have before when we used an enum, for example,
or a Boolean variable.
So keep in mind that although we won't have an example here,
we can have data for the states,
which may have different stuff in it.
So disconnected may not have anything,
but when we're connected, we can have a specific IP,
which will be active and available for us to use
only when we are in that state, because the variant
might only be in one given state at a time as well.
So that might be very handy for us.
And we declare the variant as the state,
with disconnected by default.
How do we actually handle the transitions?
Well, we have to do a visit on it.
Overload is just an idea, and it's proposed for C++20,
to choose which lambda to call,
so we don't have to implement a lot of boilerplate
to satisfy that.
We could use a generic lambda and if constexpr,
for example, if you want,
but overload seems cleaner in this case.
So if we are in the disconnected state
and we get the connect event,
we do the same as before.
I guess all of you can see
the similarity to the previous solutions:
we call establish and we change the state to connecting.
The next case is just to satisfy the compiler,
because the visit is generated at compile time,
so we have to handle all
the states possible.
So in case of connecting and connected,
we just discard those guys,
because we don't care about them in that transition.
So, what about the ping?
For the ping, we actually have a very similar
but even easier way of doing it.
We can just verify whether we're in the connected state using get_if,
which is part of the interface of std::variant,
verify whether the guard is valid and satisfied,
and reset the timeout afterwards.
We stay in the same state, so there is no issue here.
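A sketch of the std::variant version (names are assumptions): states are types, transitions use std::visit with a hand-rolled overload helper, and ping uses get_if.

```cpp
#include <cstdio>
#include <variant>

struct connect {};
struct established {};
struct ping { bool valid = true; };

// states as types: they could carry data, e.g. an IP in Connected
struct Disconnected {};
struct Connecting {};
struct Connected {};

// the classic overload idiom: pick which lambda handles the active state
template <class... Ts> struct overload : Ts... { using Ts::operator()...; };
template <class... Ts> overload(Ts...) -> overload<Ts...>;

class connection {
    std::variant<Disconnected, Connecting, Connected> state{Disconnected{}};

public:
    void process_event(const connect&) {
        std::visit(overload{
            [&](Disconnected) { std::puts("establish"); state = Connecting{}; },
            [](auto) {}  // other states: discard, no transition
        }, state);
    }

    void process_event(const established&) {
        std::visit(overload{
            [&](Connecting) { state = Connected{}; },
            [](auto) {}
        }, state);
    }

    void process_event(const ping& event) {
        // even easier: get_if checks the active alternative directly
        if (std::get_if<Connected>(&state) && event.valid) {
            std::puts("reset timeout");  // guard passed; stay in Connected
        }
    }

    bool is_connected() const { return std::holds_alternative<Connected>(state); }
};
```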
What about the performance?
From Godbolt, with the newest and greatest standard library
and Clang 7, we actually get it inlined.
So that's fantastic.
It's not as good for GCC.
GCC doesn't inline it at all:
it's 140 lines of generated assembly
with a mysterious vtable in there.
We have 70 lines of code to implement that,
and the implementation is very similar
to the previous revisions.
As you could probably see, it's very similar
to switch and enum;
instead of enums, we just use types,
which gives us RAII and a lot of benefits.
So, to sum up the variant:
it has quite a small memory footprint,
because we use structs,
and a struct is one byte.
So if we store the variant,
we store the max size of all the states —
which by default would be, let's say, one byte in our case,
because we just have empty structs —
plus we have to store
which one is actually active,
so that depends on the compiler implementation.
That might be one byte, eight bytes, whatever.
But the footprint will be quite small,
and it'll be really efficient, because we can use
those structs to add data to them.
So, for example, we can have an ID or a last ping
or whatever we need for those guys.
So that's handy.
Also, variant integrates very well with the expected
and static exceptions ideas,
because we can actually return an error,
which is just part of the variant or expected type
with some error case, which we couldn't do before.
Here we have exceptions
disabled in our case, just for performance reasons,
but error handling is quite important,
and handling errors with an enum or switch is not ideal —
we would probably end up with exceptions
or something like that, or using return codes
or global variables or something like that.
With variant, we actually don't have to do that;
we can return something unexpected,
which is nice, which is good.
And it's inlined on Clang.
It's not inlined on GCC, I guess that's bad
but yeah, there is still room for improvement
and it's quite a decent solution
for the simple state machines.
It doesn't offer a lot of capabilities
besides simple transitions but it's still very useful
and having the efficient memory footprint is quite neat.
It's quite hard to reuse, though,
because it's very similar to the switch-and-enum approach,
so, you know,
if you have nested state machines,
like hierarchical or composite state machines,
we'll have to have a visit inside a visit
inside a visit.
You see the pattern.
It's not much better than if else, if else, if else, right?
So let's get to the good part.
And the newest part.
We can actually use, in C++20,
coroutines — a new way of implementing
state machines as well.
Has anyone experimented with coroutines so far?
Tried them out?
Yeah, quite a few hands.
Not a lot of hands, actually.
So yeah, coroutines.
We'll try to dig into them a bit more in a second.
So, notice that in our case,
the connection won't be a struct anymore.
It's not a type, it's a function.
So that's the first difference
when using coroutines as a state machine,
because coroutines are functions.
They're called resumable functions, and we can use them
basically as functions.
So notice, on the second line,
we have an infinite for loop.
That's odd, right?
Why would we have an infinite for loop?
Well, we need that guy because we can actually
use the coroutines to resume and suspend —
that's what the co_await is for —
and if we don't get an event
which we want, we want to come back,
do the loop, and suspend again in the co_await.
It's a different way of thinking but, you know,
you can think of it like the state
is the position in the function —
where we are at in the function code, in a sense.
So here, on the third line,
we have the co_await; we wait on the event.
If we get it, and the event is the connect,
we do establish and we go further,
which means that we'll be further along in the function.
If we don't get an event which satisfies us,
we do the loop, and co_await will suspend again
and wait to be resumed
by a process event call.
Notice that I'm using a single-threaded application here.
Coroutines are often associated
with asynchronous ideas, but you can use them
in a synchronous way very efficiently as well.
So there's nothing here
about a multi-threaded environment or anything like that.
It could be the case, but it's not in our case.
So that's very important:
you can use coroutines in a single-threaded way.
So right now, we do a second loop
because we want to be a bit further in the function.
We never come back to the first loop —
again, in our case, for example —
because we transit a bit further, and right now, we stay here,
because when we are resumed,
we are resumed from the co_await
which is on the second line of the second part.
We won't be resumed before that guy.
So I hope —
I'm trying to make it clear, but it's really hard
to think of it if you don't know what coroutines are —
we are going to be resumed from the co_await call
in that place.
We won't be resumed anywhere else.
So we do the loop because we have to verify
whether we actually got an established event or not.
I hope that makes sense.
There's a question?
The question is: what is the end label for?
What you asked, we'll figure out in a sec,
because we need it for that guy.
So, as you see, we have a lot of for loops,
and the best way to get out of nested for loops
is goto, right?
So I hope that answers the question:
it's to get to the end.
It's a different way of thinking,
but you have to think of it like: we have this function,
we are at some position
in that function,
we are resumed from the point the co_await was called,
and after that, we react to the event,
and all this hassle with the continue, break, and goto
is just to go to the different for-loop parts of the function,
because we always stay in that function.
So yeah, it is possible with the coroutines.
That specific condition is quite difficult,
but, as was mentioned, we use goto, so maybe.
States are really implicit here, by the way.
As you notice, I pointed at them with arrows,
because we don't really need to know the state:
the state is represented
by where we are at in the function,
which is intriguing, right?
It's not really how we usually think of states, but maybe
we would like to take a look into that a bit further.
So what about the performance?
We have more code.
There's more boilerplate code to implement by default,
because we don't have all the facilities:
coroutines are implemented in the core language,
but we have quite a bit of hassle
to satisfy the interfaces.
You can check it out at Godbolt and try it yourself;
the interface is quite difficult to deal with,
and therefore there are different proposals to deal with it.
However, it wasn't optimized very well.
Although coroutines can be optimized —
especially co_yield in generators might be optimized very well —
the co_await seems not to be,
and we pay for that.
So to sum up the coroutines:
what's good about them?
The structured code is actually really, really neat, right?
We can use ifs, elses, and for loops —
not exactly in the way we would like to,
but we can still use them.
So everyone who is new to the language doesn't have to
learn libraries or anything like that.
They may just use the language;
they just have to understand that co_await
will resume you at the point you were suspended.
As I pointed out,
you can use them in a synchronous environment,
but you can also use them in an asynchronous environment
very easily, which is not really the case
with the other solutions presented here.
There is a learning curve to it, definitely.
It's a different way of thinking, so yeah,
you have to get used to it.
It might be worth it.
It requires a heap.
That might not be the case
in the future, I don't know.
I just checked the Clang implementation,
because GCC doesn't support coroutines yet.
I'm sorry, I didn't bother with Visual Studio,
but I wouldn't expect it to be better —
even Gor said that the coroutines TS implementation
in Clang is the most mature one.
So, it wasn't inlined.
We have these implicit states,
which is good and bad,
depending on how you look at it.
Depending on where we are in the function,
we know whether we are in the given state or not.
Is that good or bad?
I don't know, it depends.
And co_await returns a type,
so it has to be a common type, sort of, right?
Because we cannot return different types,
unless we use a variant or something like that,
so that's not ideal either.
And I have to admit that these for loops are quite weird.
Okay, so let's just do the goto, just for the fun of it,
because we can. It's pretty much the same example,
but instead of having the for loops —
we need the first for loop because the state machine
has to run forever —
besides that one, we don't actually need them.
Instead, we can have labels,
if you really want.
The disconnected label means the disconnected state,
the connecting label is the connecting state,
and so on for connected, yeah.
And after that, we go to the connected state;
everything is just described by the labels.
And after that, we can do goto,
which seems quite
neat when, you know, that's the way we
exit the loops if we have really nested ones
and we don't want to do the checking of Booleans.
Is that a good solution? I don't know,
but it's definitely a viable solution with coroutines.
It's a function, and
changing the position in the function
might be done with a goto.
Please don't quote me on that —
I'm not saying the goto is the solution here.
I'm just saying it's an option.
You can explore it yourself, yeah.
So if it comes to performance:
still the same amount of code, basically.
We still have allocations; goto didn't help with that.
We didn't expect the goto to help with allocations, though.
So what about the summary?
We don't have these infinite loops anymore —
pretty cool, we didn't like them.
We have explicit states, which we can treat
as a positive or a negative, depending on
whether we liked them before or not.
And we have the goto,
and the story with goto is
that you never know where you'll end up.
So maybe goto is not the best idea.
There's one more variant we can use:
we can use coroutines with functions and variant.
So if you think about the disconnected
state as a function,
we actually come back to our infinite for loops,
sorry about that,
and just have each state represented as a
separate function, and use co_return
to go to the different state,
which is quite handy because we can separate the functions.
It's pretty much the same for any other event or
state transition we want.
So we have a connected state
which uses the for loop for the same reasons as before.
Right, so the question is: where is the in?
Those guys are part of the connection class,
which has the in.
Sorry about it.
So yeah, it's actually in a bit higher scope.
We could possibly pass the in into
our functions, but
basically we have a connection
class which has the in as a member variable,
and we can use those guys.
So yeah, that's very neat.
Performance: it didn't help, it actually made it worse,
because of the variant —
we used the variant to have different types
for the return of co_await, because we couldn't
deal with the common type, so we wanted
something more
flexible, but it didn't help.
So, the functions: we can use them quite easily.
We can add new state behavior.
We have types for events,
and it's fancy.
It's a bit like Apple's new headphones:
they can easily get lost,
and you can easily get lost with coroutines, I guess.
So what about Boost?
We talked about libraries.
The STL is great; Boost is maybe even better sometimes.
Sometimes, it's not.
So let's compare a few other solutions.
We'll take a look into Statechart and MSM,
which are available in Boost —
you can download them and use them right now — and SML.
SML is the library I'm the author of,
and we'll just take a look.
It's not a Boost library;
it's just called Boost to confuse everyone.
No — the reason is that it is aimed to be a Boost library.
Okay, sorry about that.
So, let's take a look into Statechart.
Has anyone used Statechart?
Not a lot of you guys.
So let's take a look how we can actually implement
a state machine in Statechart.
So we have to change our events
to inherit from the Statechart event,
which is a CRTP thing.
Well, not the best start.
We have to write some more boilerplate code,
but you can deal with that.
Statechart is more like the state pattern,
an object-oriented kind of design,
so we have to inherit from the state machine.
The first template parameter here is the CRTP thing
and the other one is the initial state,
so you have to have a bit of knowledge of how to use that guy,
and all the actions have to be put into the connection
implementation just in order
to be usable by Statechart.
So, as in the state pattern, we implement it
on a per-state basis.
We have a simple state which we have to inherit from,
we pass the type we have —
the connection we use — and we have the reactions,
and the reactions are basically the list of transitions.
So you have to be able to read it:
this is a transition from
the disconnected state, on a connect event,
to the connecting state.
Connection is the class, and we have the establish action,
so it's basically like before.
We've seen the other cases; this is just more verbose,
less easy to follow, I'd say.
We can also have a custom reaction.
So at the bottom, on the connected state,
we have the ping, which is a custom reaction,
which allows us to check the guard and call the action —
it allows us to do anything we want with Statechart
and the transitions.
And we have to discard the event because we don't transition;
otherwise, we would just call transit on it.
60 lines of code and
almost 3,000 lines of generated assembly.
Well, not great.
So to sum up: it has a lot of UML-related features
which we don't even cover here, so that's good.
There's a learning curve;
that's not great.
It has dynamic allocations, dynamic dispatch, and a high memory footprint.
So they introduced MSM into Boost,
which is much faster.
However, it's really based on macros.
There are different front-ends for MSM;
I'm using this one because I find it
the most expressive,
and that's what I really care about:
the expressiveness and the performance.
So you have the event, which you have to define
using a macro.
There's another macro to define the state.
There's another macro to define actions.
We have to pass all these parameters around for guards;
a guard is like an action, it just returns a Boolean.
It's quite a lot of boilerplate, but we accept it,
because all of that boilerplate is just for this guy.
And this guy is awesome.
It's a transition table, in which we can actually read
and reason about the flow of our program.
So if you compare all the solutions,
this one, in my opinion, is so much better.
You can easily read it.
You read it the way that you say:
I'm in the disconnected state;
when I get a connect event,
I call establish and I go to connecting.
That's really powerful to reason about and look at.
So all those macros are worth it,
just in order to have that guy, in my opinion,
because it gives us a
really easy way to reason about the application flow.
Obviously, we have more macros —
yet another macro thing.
What about the performance of that guy?
Well, it's not a lot of code: 80 lines.
And for the performance, you can see that there's quite a bit
of assembly generated,
but it's not bad assembly.
It's not always about the size when it comes to assembly.
You can see we have a jump table generated at compile time,
so all the transitions are just jumps, which is not bad
when it comes to performance —
and we'll take a look into that as well.
So just keep in mind that
having longer assembly is not always bad,
unless we want the inlined version and it's not inlined.
The summary: so it's declarative and expressive,
as you noticed, I hope, in the state machine
transition table — I love it.
I think that's the way to go.
The dispatch is O(1):
a jump table generated at compile time.
It has UML features.
Small memory footprint.
There's a learning curve.
It's DSL based —
not everyone likes DSLs.
It compiles very slowly; that's terrible, we don't want that.
So if you run it on Godbolt,
you'll get a timeout;
in order to get the results,
you have to run it locally,
but you can still click through it.
I don't know, maybe Matt will extend the time limit,
I don't know.
It's very slow to compile.
And the error messages —
well, you don't want to look at them.
It's C++98 MPL.
So, that's the reason —
that was the main reason why I actually implemented
a slightly different version of MSM
using modern C++ features.
At first, I started with extending MSM,
but that was just quite painful because it's C++98.
And in SML, we can do that.
I think it's even more powerful than MSM
because, well, we don't have macros.
And we don't have gotos; that's even better.
Everything is easy to follow.
We have the transition table,
really easy to read and follow.
Do you find that easy to follow?
Easier to follow than nested if/else and switch?
Yeah, I see the thumbs up, I like it.
So yeah, that's great.
But, you know,
we don't want it just for the sake
of being able to write expressive code in C++,
because C++ is all about
not paying for what you don't use, right?
We care about the performance.
We want to have the best performance possible.
We want to be able to pick how we optimize that guy.
But the interface is pretty neat,
so let's keep the interface
and give ourselves the ability
to change the strategy of how we actually dispatch.
So SML has different policies at compile time.
We don't pay at runtime for anything in SML,
but at compile time we can choose which version
of the dispatching strategy we want.
We can have a jump table,
generated at compile time
like with MSM.
We can have a nested switch.
We'll take a look into those guys.
If/else, fold expressions.
We don't have a coroutine policy yet,
but we could use coroutines at the back end;
that's an option as well if you really want.
So I think it's actually a very
neat idea to be able to switch the back-end policy
and have a really nice common interface,
because when it comes to performance,
you always have to measure, right?
It's not a given that the jump table will be the best,
and from my experience, it's not always the best.
So let's take a look into the dispatching policies.
So assuming that we have some mappings and states
and we have just a dispatch call
which takes the current state and the event,
so that'll be the backend interface
for dispatching the policy.
What can we do with that?
Well, we can actually call it recursively
for as long as it takes to find whether we are in the current state.
If we are in the current state,
we call the execute, which is just the transition.
If we're not, we call ourselves again
with the next iteration and the next state,
so that we'll eventually find it.
If we run out of states, well,
we don't call any transition,
and it's probably just unexpected.
And that is very well optimized;
this is in Clang.
It gives us exactly the same code as MSM before,
so that's fantastic.
Switch: we have the same case and we do
yet another trick, the nested switch.
In the default branch, we call dispatch again
with the next ID, which runs the switch again,
so we have this nested switch generated at
compile time for us, and if we hit the case
we care about, we call the execute.
So is that optimized?
Well, yes it is.
It's optimized in Clang and GCC,
and it gives us the same result as the switch did.
Notice that on the left side,
I highlighted how to switch the policy.
By default we don't have to pass it at all,
and the policy will be
the best one for the given compiler,
in a way that the compiler can actually optimize.
We go to the jump table.
The jump table is not straightforward,
but it's not difficult either.
We have the states, we have the dispatch-table
type holding the function pointers,
and we just generate an array for all the events;
after that, we just jump according to the current state
with the event.
So, that will give us
more code, less than MSM actually,
but it won't be fully inlined,
neither in Clang nor in GCC.
From the performance perspective, though,
sometimes it's much better than the inlined versions.
We have to measure those guys.
And yet another solution is fold expressions,
an addition in C++17,
and what we can do here: we fold over
all the available events with the available IDs,
and we just execute when one matches.
It's pretty much the same as the recursive version,
just not recursive.
So that's pretty handy,
and fold expressions are actually
very well optimized as well.
So, it gives us the same output as the if/else or the switch on an enum.
So, yeah, that's handy.
Nice to have.
So, the summary.
The main benefit of using libraries like SML
is being able to build an expressive, declarative flow;
we have the UML transitions
visible quite well for us.
Being declarative is always better than not.
We can customize it at compile time.
Awesome, that's what C++ is all about, right?
Customizable at compile time.
That part is fantastic.
It has fast compilation times;
you'll see that in the benchmark results in a sec.
It has additional features which we won't talk about,
but it's obviously a library,
so it's much more powerful than just the transitions.
Compare that to the variant,
which is just a way of handling the transitions
but doesn't have anything else.
There's a minimal memory footprint;
the size of the state machine is just one byte.
There's a learning curve.
Obviously, it's a library,
and it's DSL-based, which is both good and bad,
depending on how you look at it.
So, if we sum up the solutions,
we can quickly classify them
by how the state is represented,
how the transition table is represented,
and how the transition is represented.
The state might be represented by a Boolean variable
in an if/else, an enum, maybe a class, maybe a union,
maybe a function, or maybe even a type like in SML.
The transition table might be per state,
which means an object-oriented design with one table per state,
or global, where you see all of the transitions in one place.
And the transition is either implicit or explicit.
Implicit means that you have to look through the code
somewhere to find where you actually change the state;
explicit, which is usually better than implicit,
means that we can take a look
at one simple
source file, a few lines of code,
and we see everything that is happening.
There are no surprises.
Okay, so, sorry.
Let's take a look into benchmarks then.
How do they compare?
Yeah, I've been talking for almost 50 minutes
about those solutions;
we compared the Godbolt results, but we have to measure
to be sure which one is the best
for the measurements we're going to take.
So in order to do that, we change our actions
and guards to access memory
so that they won't be totally inlined,
and the idea is that we randomize the events
and run the process event on them,
with GCC and Clang.
The results shown are for Clang, but with GCC
it's basically the same, besides the variant.
And we have to use Clang
because we have coroutines in Clang;
we don't have them anywhere else for now.
So, lines of code.
SML is really good here because it's optimized
for being expressive.
The state design pattern is not the best one;
we have to write a lot of code, as usual
with object-oriented design, it's boilerplate.
MSM, although it's really expressive,
has a lot of boilerplate code to write
with those macros.
But besides that, it's quite even.
So, that's what we compared before on Godbolt.
We see that the
naive switch, the if/else, and SML,
since it's using a switch or ifs underneath
in those cases, are very well optimized and inlined.
If you care about performance, don't use that one;
it's not the best.
Coroutines and variant:
yeah, they're not terrible.
Not ideal either.
It depends, obviously, on your case.
If you care about the performance,
you probably don't want to use them,
but if you're not that bothered, then you can.
So that's the main benchmark,
which shows the run-time performance.
As I said, having longer assembly
doesn't mean it will be faster or slower.
So you can see MSM had really long assembly lines
because it generated jump table compile time
but actually it's quite fast.
Variant is quite fast as well on Clang,
not necessarily in GCC.
Statechart is slow but we didn't expect it to be fast
since it generated so much code.
SML, switch, and if/else are very well optimized
because they are really easy
for the compiler to follow and optimize.
State pattern, well, if you care about performance,
you don't really want to use virtual dispatch, right?
Although it might be devirtualized sometimes,
most likely it won't be, and you will have the heap
and shared pointers and all that badness to deal with.
So, you don't wanna do that
if you care about the performance.
Coroutines, coroutines are quite decent.
They could probably be better;
it's just the first stage of them being introduced,
so maybe at some point they can actually be improved.
Instructions per cycle.
Well, the more instructions we can handle per cycle, the better;
it just shows how well we actually utilize the CPU.
With fewer instructions per cycle,
we may not get as good
results here, but it shows that
all the solutions are more or less the same.
Coroutines are a bit lower than the other ones,
but having a value around two is really good to have,
or to keep in mind.
So, it's interesting how many branches MSM actually has,
because of the jump table.
Besides that, the inlined versions have more branches as well
in comparison to all the others,
but that doesn't mean they will be slower,
because it's all about how many misses you get.
And as you see,
with the state pattern and virtual dispatch,
well, we don't really predict it very well;
CPUs are not really good at predicting where we are going.
In the other cases, it's much better.
Statechart, since it's using virtual dispatch as well,
does have quite a few misses, but
it's still not as bad as the state pattern.
Compilation time, that's what I was referring to earlier.
MSM compiles quite slowly
in comparison to any other solution.
So that's not good.
If you really have to be productive,
you can be productive with MSM
thanks to the declarative way of using it,
but you pay a lot in compilation times.
Really, a lot.
SML compiles quite fast.
And in release, it's pretty much the same:
MSM is very slow to compile.
And the size, because we care about the size as well.
Size is really important for us;
if you care about performance,
you probably care about the size,
and we have some restrictions on that.
MSM produces quite huge binaries in debug mode
because of the debug symbols.
As I told you before, it's all MPL based,
so it produces a lot of code
in comparison to the other solutions.
But the executables in release mode
are really small for MSM
in comparison to the debug mode.
So that's really good,
which means that everything is just thrown away.
So, yeah, we have like 10 seconds left
so the mission is to embrace
zero-cost state machine libraries
and with that, we have actually zero seconds for questions
so thank you.