6 Commits

Author SHA1 Message Date
Neels Hofmeyr b1bbe24c09 fsm: refuse state chg and events after term
Refuse state changes and event dispatch for FSM instances that are already
terminating.

It is assumed that refusing state changes and events after FSM termination is
seen as the sane expected behavior, hence this change in behavior is merged
without being configurable.

There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a
changed expected output, since it is explicitly creating complex FSM structures
that terminate. Currently no other C test in Osmocom code needs adjusting.

Rationale:

Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc),
a terminating FSM instance often causes events to be dispatched back to itself,
or causes state changes in FSM instances that are already terminating. That is
hard to avoid, since each FSM instance could be a cause of failure, and wants
to notify all the others of that, which in turn often choose to terminate.

Another use case: any function that dispatches events or state changes to more
than one FSM instance must be sure that after the first event dispatch, the
second FSM instance is in fact still allocated. Furthermore, if the second FSM
instance *has* terminated from the first dispatch, this often means that no
more actions should be taken. That could be done by an explicit check for
fsm->proc.terminating, but a more general solution is to do this check
internally in fsm.c.
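
The internal check described above can be sketched with a toy model. This is
only an illustration, not the fsm.c code: struct toy_fsm_inst, toy_dispatch()
and toy_state_chg() are hypothetical names, and the -1 return value is an
assumption made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an FSM instance; NOT the libosmocore struct. */
struct toy_fsm_inst {
	bool terminating;   /* set once termination has begun */
	int state;          /* current state number */
	int events_handled; /* events that actually reached the handler */
};

/* Refuse the event if the instance is terminating.
 * Returns 0 when handled, -1 when refused (assumed convention). */
int toy_dispatch(struct toy_fsm_inst *fi, int event)
{
	assert(fi != NULL);
	(void)event; /* the toy handler does not look at the event number */
	if (fi->terminating)
		return -1;
	fi->events_handled++;
	return 0;
}

/* Refuse the state change if the instance is terminating.
 * Returns 0 when the state was changed, -1 when refused. */
int toy_state_chg(struct toy_fsm_inst *fi, int new_state)
{
	assert(fi != NULL);
	if (fi->terminating)
		return -1;
	fi->state = new_state;
	return 0;
}
```

With the check placed inside the dispatch and state change functions, callers
do not need their own test of the terminating flag before each call.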

In practice, I need this to avoid a crash in libosmo-mgcp-client, when an
on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The
earlier dealloc-in-main-loop patch fixed part of it, but not all.

Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-29 17:28:30 +01:00
Neels Hofmeyr 988f6d72c5 add osmo_fsm_set_dealloc_ctx(), to help with use-after-free
This is a simpler and more general solution to the problem so far solved by
osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary
functions, not only FSM instances during termination.

The aim is to defer talloc_free() until back in the main loop.

Rationale: I discovered an osmo-msc use-after-free crash from an invalid
message, caused by this pattern:

void event_action()
{
       osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL);
       osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL);
}

Usually, FOO_EVENT takes successful action, and afterwards we also notify bar.
However, in this particular case, FOO_EVENT caused failure, and the immediate
error handling directly terminated and deallocated bar. In such a case,
dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector
just from sending messages that cause *any* failure during the first event
dispatch.

Instead, when this is enabled, we do not deallocate 'bar' until event_action()
has returned back to the main loop.
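
A minimal sketch of the deferral idea, using plain malloc/free instead of
talloc: termination only queues the object, and the memory is released when
the main loop calls a flush function. All toy_* names are hypothetical and the
fixed-size queue is an assumption of this sketch, not how fsm.c does it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define TOY_MAX_PENDING 8

/* Toy object; stands in for an FSM instance. */
struct toy_obj {
	bool terminated;
	int events_seen;
};

/* Objects that are terminated but not yet freed. */
static struct toy_obj *toy_pending[TOY_MAX_PENDING];
static size_t toy_pending_count;

/* Mark the object terminated and queue it; do NOT free it here. */
void toy_term(struct toy_obj *obj)
{
	if (obj->terminated)
		return;
	assert(toy_pending_count < TOY_MAX_PENDING);
	obj->terminated = true;
	toy_pending[toy_pending_count++] = obj;
}

/* Called from the main loop: free everything queued so far.
 * Returns the number of objects freed. */
size_t toy_main_loop_flush(void)
{
	size_t n = toy_pending_count;
	for (size_t i = 0; i < n; i++) {
		free(toy_pending[i]);
		toy_pending[i] = NULL;
	}
	toy_pending_count = 0;
	return n;
}

/* Mirrors the event_action() pattern: handling the first event fails and
 * terminates 'bar'; the second step still reads 'bar'. Because the free is
 * deferred, that read hits valid memory.
 * Returns 0 if the second event was delivered, -1 if 'bar' had terminated. */
int toy_event_action(struct toy_obj *foo, struct toy_obj *bar)
{
	foo->events_seen++;
	toy_term(bar); /* error handling of the first event */
	if (bar->terminated)
		return -1;
	bar->events_seen++;
	return 0;
}
```

The caller would invoke toy_main_loop_flush() once per main loop iteration,
after all event handlers of that iteration have returned.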

Test: duplicate fsm_dealloc_test.c using this, and print the number of items
deallocated in each test loop, to ensure the feature works. We also verify that
the deallocation safety works simply by fsm_dealloc_test.c not crashing.

We should probably follow up by refusing event dispatch and state transitions
for FSM instances that are terminating or already terminated:
see I0adc13a1a998e953b6c850efa2761350dd07e03a.

Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-29 16:46:04 +01:00
Neels Hofmeyr d28aa0c2f1 fsm_dealloc_test: no need for ST_DESTROYING
A separate ST_DESTROYING state originally helped with certain deallocation
scenarios. But now that fsm.c avoids entering osmo_fsm_inst_term() twice and
gracefully handles FSM instance deallocations for termination cascades, it is
actually just as safe without a separate ST_DESTROYING state. ST_DESTROYING was
used to flag deallocation and prevent entering osmo_fsm_inst_term() twice,
which works only in a very limited range of scenarios.

Remove ST_DESTROYING from fsm_dealloc_test.c to show that all tested scenarios
still clean up gracefully.

Change-Id: I05354e6cad9b82ba474fa50ffd41d481b3c697b4
2019-04-11 05:36:36 +00:00
Neels Hofmeyr 1f9cc01861 fsm: support graceful osmo_fsm_inst_term() cascades
Add global flag osmo_fsm_term_safely() -- if set to true, enable the following
behavior:

Detect osmo_fsm_inst_term() occurring within osmo_fsm_inst_term():
- collect deallocations until the outermost osmo_fsm_inst_term() is done.
- call osmo_fsm_inst_free() *after* dispatching the parent event.

If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already
within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent
it to a separate talloc context, to be deallocated with the outermost FSM inst.

The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term()
cascade will stay allocated until all osmo_fsm_inst_term() are complete and all
of them will be deallocated at the same time.
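
The collect-then-free behavior can be sketched with a nesting counter. This is
a toy model with plain malloc/free rather than talloc reparenting; the toy_*
names, the depth counter and the fixed-size array are assumptions made for
illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define TOY_MAX_COLLECTED 8

/* Toy instance; 'peer' is another instance that gets terminated as part of
 * this instance's cleanup (may be NULL). */
struct toy_inst {
	bool terminating;
	struct toy_inst *peer;
};

static int toy_term_depth;                               /* nesting level */
static struct toy_inst *toy_collected[TOY_MAX_COLLECTED]; /* awaiting free */
static size_t toy_collected_count;
static size_t toy_last_batch;                             /* size of last free */

void toy_inst_term(struct toy_inst *fi)
{
	if (fi->terminating)
		return; /* already on its way out */
	fi->terminating = true;

	toy_term_depth++;
	if (fi->peer)
		toy_inst_term(fi->peer); /* cleanup cascades to the peer */

	/* Collect instead of freeing right away. */
	assert(toy_collected_count < TOY_MAX_COLLECTED);
	toy_collected[toy_collected_count++] = fi;
	toy_term_depth--;

	/* Only the outermost call frees, and it frees the whole batch. */
	if (toy_term_depth == 0) {
		toy_last_batch = toy_collected_count;
		for (size_t i = 0; i < toy_collected_count; i++)
			free(toy_collected[i]);
		toy_collected_count = 0;
	}
}

/* Number of instances freed by the most recent outermost termination. */
size_t toy_last_batch_size(void)
{
	return toy_last_batch;
}
```

For two instances that reference each other as peers, terminating either one
keeps both allocated until the outermost toy_inst_term() finishes, then frees
both in one batch.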

Mark the deferred deallocation state as __thread in an attempt to make cascaded
deallocation handling threadsafe.  Keep the enable/disable flag separate, so
that it is global and not per-thread.

The feature is showcased by fsm_dealloc_test.c: with this feature, all of those
wild deallocation scenarios succeed.

Make fsm_dealloc_test a normal regression test in testsuite.at.

Rationale:

It is difficult to gracefully handle deallocations of groups of FSM instances
that reference each other. As soon as one child dispatching a cleanup event
causes its parent to deallocate before fsm.c was ready for it, deallocation
will hit a use-after-free. Before this patch, by using parent_term events and
distinct "terminating" FSM states, parent/child FSMs can be taught to wait for
all children to deallocate before deallocating the parent. But as soon as a
non-child / non-parent FSM instance is involved, or indeed any other cleanup()
action that causes parent FSMs or parent talloc contexts to become unused, it
is nearly impossible to anticipate every ricocheting deallocation event, and
to avoid freeing FSM instances that are still in the middle of
osmo_fsm_inst_term(), or having FSM instances enter osmo_fsm_inst_term() more
than once. This patch makes deallocation of "all
possible" setups of complex cross referencing FSM instances easy to handle
correctly, without running into use-after-free or double free situations, and,
notably, without changing calling code.

Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-04-11 05:36:36 +00:00
Neels Hofmeyr 3b414a4adc fsm: add flag to ensure osmo_fsm_inst_term() happens only once
To prevent entering osmo_fsm_inst_term() twice for the same osmo_fsm_inst,
add flag osmo_fsm_inst.proc.terminating. osmo_fsm_inst_term() sets this to
true, or exits if it already is true.
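
As a rough illustration of the flag, here is a toy model in which the cleanup
step re-enters the termination function for the same instance. The names
toy_inst and toy_term_once() are hypothetical; only the idea of a terminating
flag that is set on entry comes from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy instance; 'terminating' plays the role of proc.terminating. */
struct toy_inst {
	bool terminating;
	int cleanup_runs; /* how often the cleanup step actually ran */
};

/* Returns true if this call performed the termination, false if the
 * instance was already terminating and the call exited early. */
bool toy_term_once(struct toy_inst *fi)
{
	assert(fi != NULL);
	if (fi->terminating)
		return false;
	fi->terminating = true;

	fi->cleanup_runs++;
	/* Simulate a cleanup handler that triggers termination of the same
	 * instance again; the flag turns this nested call into a no-op. */
	(void)toy_term_once(fi);

	return true;
}
```

Setting the flag before running cleanup is what makes the nested call
harmless; setting it afterwards would recurse without end.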

Update fsm_dealloc_test.err for illustration. It is not relevant for unit
testing yet, just showing the difference.

Change-Id: I0c02d76a86f90c49e0eae2f85db64704c96a7674
2019-04-11 05:36:36 +00:00
Neels Hofmeyr 223d66a414 add fsm_dealloc_test.c
Despite efforts to properly handle "GONE" events and to enter ST_DESTROYING
only once, so far this test runs straight into a heap use-after-free. With
current fsm.c, it is hard to resolve the situation with the objects named
"other" also causing deallocations besides the FSM instance parent/child
relations.

For illustration, add an "expected" test output file fsm_dealloc_test.err;
making this pass will follow in a subsequent patch.

Change-Id: If801907c541bca9f524c9e5fd22ac280ca16979a
2019-04-11 05:36:36 +00:00