libosmocore/tests/fsm/fsm_dealloc_test.err

4957 lines
304 KiB
Plaintext
Raw Permalink Normal View History

add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
test_osmo_fsm_term_safely()
DLGLOBAL DEBUG scene_alloc()
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(other){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: _branch0.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[0] = _branch0
DLGLOBAL DEBUG test(__twig0a){alive}: __twig0a.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = __twig0a
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: _branch1.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = _branch1
DLGLOBAL DEBUG test(__twig1a){alive}: __twig1a.other[0] = root
DLGLOBAL DEBUG test(root){alive}: root.other[0] = __twig1a
DLGLOBAL DEBUG ------ before term cascade, got:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG ---
DLGLOBAL DEBUG --- term at root
DLGLOBAL DEBUG test(root){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(root){alive}: pre_term()
DLGLOBAL DEBUG test(_branch1){alive}: Terminating in cascade, depth 2 (cause = OSMO_FSM_TERM_PARENT, caused by: test(root))
DLGLOBAL DEBUG test(_branch1){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Terminating in cascade, depth 3 (cause = OSMO_FSM_TERM_PARENT, caused by: test(root))
DLGLOBAL DEBUG test(__twig1b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 1 (__twig1b.cleanup())
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1b){alive}: scene forgets __twig1b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(__twig1b){alive}: Deferring: will deallocate with test(root)
DLGLOBAL DEBUG test(__twig1a){alive}: Terminating in cascade, depth 3 (cause = OSMO_FSM_TERM_PARENT, caused by: test(root))
DLGLOBAL DEBUG test(__twig1a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1a){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 1 (__twig1a.cleanup())
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1a){alive}: scene forgets __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: removing reference __twig1a.other[0] -> root
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(__twig1a){alive}: Deferring: will deallocate with test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Removing from parent test(root)
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 1 (_branch1.cleanup())
DLGLOBAL DEBUG test(_branch1){alive}: cleanup()
DLGLOBAL DEBUG test(_branch1){alive}: scene forgets _branch1
DLGLOBAL DEBUG test(_branch1){alive}: removing reference _branch1.other[0] -> other
DLGLOBAL DEBUG test(other){alive}: Received Event EV_OTHER_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: alive(EV_OTHER_GONE)
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.other_gone())
DLGLOBAL DEBUG test(other){alive}: EV_OTHER_GONE: Dropped reference other.other[1] = _branch1
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Terminating in cascade, depth 3 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(root))
DLGLOBAL DEBUG test(other){alive}: pre_term()
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup()
DLGLOBAL DEBUG test(other){alive}: scene forgets other
DLGLOBAL DEBUG test(other){alive}: removing reference other.other[0] -> _branch0
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.other_gone())
DLGLOBAL DEBUG test(_branch0){alive}: EV_OTHER_GONE: Dropped reference _branch0.other[0] = other
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Terminating in cascade, depth 4 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(root))
DLGLOBAL DEBUG test(_branch0){alive}: pre_term()
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: Terminating in cascade, depth 5 (cause = OSMO_FSM_TERM_PARENT, caused by: test(root))
DLGLOBAL DEBUG test(__twig0b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0b.cleanup())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0b){alive}: scene forgets __twig0b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: Deferring: will deallocate with test(root)
DLGLOBAL DEBUG test(__twig0a){alive}: Terminating in cascade, depth 5 (cause = OSMO_FSM_TERM_PARENT, caused by: test(root))
DLGLOBAL DEBUG test(__twig0a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0a){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0a.cleanup())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0a){alive}: scene forgets __twig0a
DLGLOBAL DEBUG test(__twig0a){alive}: removing reference __twig0a.other[0] -> other
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(other){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: Deferring: will deallocate with test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.cleanup())
DLGLOBAL DEBUG test(_branch0){alive}: cleanup()
DLGLOBAL DEBUG test(_branch0){alive}: scene forgets _branch0
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch0){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch0){alive}: Deferring: will deallocate with test(root)
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup() done
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Deferring: will deallocate with test(root)
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 1 (_branch1.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch1){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(_branch1){alive}: Deferring: will deallocate with test(root)
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 1 (root.cleanup())
DLGLOBAL DEBUG test(root){alive}: cleanup()
DLGLOBAL DEBUG test(root){alive}: scene forgets root
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: removing reference root.other[0] -> __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(root){alive}: cleanup() done
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(root){alive}: Deallocated, including all deferred deallocations
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG --- after term cascade:
DLGLOBAL DEBUG --- all deallocated.
DLGLOBAL DEBUG scene_alloc()
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(other){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: _branch0.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[0] = _branch0
DLGLOBAL DEBUG test(__twig0a){alive}: __twig0a.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = __twig0a
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: _branch1.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = _branch1
DLGLOBAL DEBUG test(__twig1a){alive}: __twig1a.other[0] = root
DLGLOBAL DEBUG test(root){alive}: root.other[0] = __twig1a
DLGLOBAL DEBUG ------ before destroy-event cascade, got:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG ---
DLGLOBAL DEBUG --- destroy-event at root
DLGLOBAL DEBUG test(root){alive}: Received Event EV_DESTROY
DLGLOBAL DEBUG 1 (root.alive())
DLGLOBAL DEBUG test(root){alive}: alive(EV_DESTROY)
DLGLOBAL DEBUG test(root){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(root){alive}: pre_term()
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: Terminating in cascade, depth 2 (cause = OSMO_FSM_TERM_PARENT, caused by: test(root))
DLGLOBAL DEBUG test(_branch1){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Terminating in cascade, depth 3 (cause = OSMO_FSM_TERM_PARENT, caused by: test(root))
DLGLOBAL DEBUG test(__twig1b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 2 (root.alive(),__twig1b.cleanup())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1b){alive}: scene forgets __twig1b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup() done
DLGLOBAL DEBUG 1 (root.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: Deferring: will deallocate with test(root)
DLGLOBAL DEBUG test(__twig1a){alive}: Terminating in cascade, depth 3 (cause = OSMO_FSM_TERM_PARENT, caused by: test(root))
DLGLOBAL DEBUG test(__twig1a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1a){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 2 (root.alive(),__twig1a.cleanup())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1a){alive}: scene forgets __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: removing reference __twig1a.other[0] -> root
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup() done
DLGLOBAL DEBUG 1 (root.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: Deferring: will deallocate with test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 2 (root.alive(),_branch1.cleanup())
DLGLOBAL DEBUG test(_branch1){alive}: cleanup()
DLGLOBAL DEBUG test(_branch1){alive}: scene forgets _branch1
DLGLOBAL DEBUG test(_branch1){alive}: removing reference _branch1.other[0] -> other
DLGLOBAL DEBUG test(other){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 3 (root.alive(),_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 4 (root.alive(),_branch1.cleanup(),other.alive(),other.other_gone())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(other){alive}: EV_OTHER_GONE: Dropped reference other.other[1] = _branch1
DLGLOBAL DEBUG 3 (root.alive(),_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Terminating in cascade, depth 3 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(root))
DLGLOBAL DEBUG test(other){alive}: pre_term()
DLGLOBAL DEBUG 4 (root.alive(),_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup()
DLGLOBAL DEBUG test(other){alive}: scene forgets other
DLGLOBAL DEBUG test(other){alive}: removing reference other.other[0] -> _branch0
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 5 (root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 6 (root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.other_gone())
DLGLOBAL DEBUG test(_branch0){alive}: EV_OTHER_GONE: Dropped reference _branch0.other[0] = other
DLGLOBAL DEBUG 5 (root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Terminating in cascade, depth 4 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(root))
DLGLOBAL DEBUG test(_branch0){alive}: pre_term()
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: Terminating in cascade, depth 5 (cause = OSMO_FSM_TERM_PARENT, caused by: test(root))
DLGLOBAL DEBUG test(__twig0b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 6 (root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0b.cleanup())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0b){alive}: scene forgets __twig0b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup() done
DLGLOBAL DEBUG 5 (root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: Deferring: will deallocate with test(root)
DLGLOBAL DEBUG test(__twig0a){alive}: Terminating in cascade, depth 5 (cause = OSMO_FSM_TERM_PARENT, caused by: test(root))
DLGLOBAL DEBUG test(__twig0a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0a){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 6 (root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0a.cleanup())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0a){alive}: scene forgets __twig0a
DLGLOBAL DEBUG test(__twig0a){alive}: removing reference __twig0a.other[0] -> other
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(other){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup() done
DLGLOBAL DEBUG 5 (root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: Deferring: will deallocate with test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 6 (root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.cleanup())
DLGLOBAL DEBUG test(_branch0){alive}: cleanup()
DLGLOBAL DEBUG test(_branch0){alive}: scene forgets _branch0
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch0){alive}: cleanup() done
DLGLOBAL DEBUG 5 (root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch0){alive}: Deferring: will deallocate with test(root)
DLGLOBAL DEBUG 4 (root.alive(),_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup() done
DLGLOBAL DEBUG 3 (root.alive(),_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Deferring: will deallocate with test(root)
DLGLOBAL DEBUG 2 (root.alive(),_branch1.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch1){alive}: cleanup() done
DLGLOBAL DEBUG 1 (root.alive())
DLGLOBAL DEBUG test(_branch1){alive}: Deferring: will deallocate with test(root)
DLGLOBAL DEBUG 2 (root.alive(),root.cleanup())
DLGLOBAL DEBUG test(root){alive}: cleanup()
DLGLOBAL DEBUG test(root){alive}: scene forgets root
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: removing reference root.other[0] -> __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(root){alive}: cleanup() done
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 1 (root.alive())
DLGLOBAL DEBUG test(root){alive}: Deallocated, including all deferred deallocations
DLGLOBAL DEBUG 0 (-)
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG --- after destroy-event cascade:
DLGLOBAL DEBUG --- all deallocated.
DLGLOBAL DEBUG scene_alloc()
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(other){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: _branch0.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[0] = _branch0
DLGLOBAL DEBUG test(__twig0a){alive}: __twig0a.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = __twig0a
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: _branch1.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = _branch1
DLGLOBAL DEBUG test(__twig1a){alive}: __twig1a.other[0] = root
DLGLOBAL DEBUG test(root){alive}: root.other[0] = __twig1a
DLGLOBAL DEBUG ------ before term cascade, got:
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG ---
DLGLOBAL DEBUG --- term at _branch0
DLGLOBAL DEBUG test(_branch0){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(_branch0){alive}: pre_term()
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: Terminating in cascade, depth 2 (cause = OSMO_FSM_TERM_PARENT, caused by: test(_branch0))
DLGLOBAL DEBUG test(__twig0b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 1 (__twig0b.cleanup())
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0b){alive}: scene forgets __twig0b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(__twig0b){alive}: Deferring: will deallocate with test(_branch0)
DLGLOBAL DEBUG test(__twig0a){alive}: Terminating in cascade, depth 2 (cause = OSMO_FSM_TERM_PARENT, caused by: test(_branch0))
DLGLOBAL DEBUG test(__twig0a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0a){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 1 (__twig0a.cleanup())
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0a){alive}: scene forgets __twig0a
DLGLOBAL DEBUG test(__twig0a){alive}: removing reference __twig0a.other[0] -> other
DLGLOBAL DEBUG test(other){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 2 (__twig0a.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 3 (__twig0a.cleanup(),other.alive(),other.other_gone())
DLGLOBAL DEBUG 2 (__twig0a.cleanup(),other.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 1 (__twig0a.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: Deferring: will deallocate with test(_branch0)
DLGLOBAL DEBUG test(_branch0){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 1 (_branch0.cleanup())
DLGLOBAL DEBUG test(_branch0){alive}: cleanup()
DLGLOBAL DEBUG test(_branch0){alive}: scene forgets _branch0
DLGLOBAL DEBUG test(_branch0){alive}: removing reference _branch0.other[0] -> other
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(other){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 2 (_branch0.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 3 (_branch0.cleanup(),other.alive(),other.other_gone())
DLGLOBAL DEBUG test(other){alive}: EV_OTHER_GONE: Dropped reference other.other[0] = _branch0
DLGLOBAL DEBUG 2 (_branch0.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Terminating in cascade, depth 2 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(_branch0))
DLGLOBAL DEBUG test(other){alive}: pre_term()
DLGLOBAL DEBUG 3 (_branch0.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup()
DLGLOBAL DEBUG test(other){alive}: scene forgets other
DLGLOBAL DEBUG test(other){alive}: removing reference other.other[1] -> _branch1
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 4 (_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 5 (_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),_branch1.other_gone())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: EV_OTHER_GONE: Dropped reference _branch1.other[0] = other
DLGLOBAL DEBUG 4 (_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive())
DLGLOBAL DEBUG test(_branch1){alive}: Terminating in cascade, depth 3 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(_branch0))
DLGLOBAL DEBUG test(_branch1){alive}: pre_term()
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: Terminating in cascade, depth 4 (cause = OSMO_FSM_TERM_PARENT, caused by: test(_branch0))
DLGLOBAL DEBUG test(__twig1b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 5 (_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),__twig1b.cleanup())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1b){alive}: scene forgets __twig1b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: Deferring: will deallocate with test(_branch0)
DLGLOBAL DEBUG test(__twig1a){alive}: Terminating in cascade, depth 4 (cause = OSMO_FSM_TERM_PARENT, caused by: test(_branch0))
DLGLOBAL DEBUG test(__twig1a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1a){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 5 (_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1a){alive}: scene forgets __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: removing reference __twig1a.other[0] -> root
DLGLOBAL DEBUG test(root){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 6 (_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(root){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 7 (_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive(),root.other_gone())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(root){alive}: EV_OTHER_GONE: Dropped reference root.other[0] = __twig1a
DLGLOBAL DEBUG 6 (_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: Terminating in cascade, depth 5 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(_branch0))
DLGLOBAL DEBUG test(root){alive}: pre_term()
DLGLOBAL DEBUG test(_branch1){alive}: Ignoring trigger to terminate: already terminating
DLGLOBAL ERROR test(root){alive}: Internal error while terminating child FSMs: a child FSM is stuck
DLGLOBAL DEBUG 7 (_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive(),root.cleanup())
DLGLOBAL DEBUG test(root){alive}: cleanup()
DLGLOBAL DEBUG test(root){alive}: scene forgets root
DLGLOBAL DEBUG test(root){alive}: cleanup() done
DLGLOBAL DEBUG 6 (_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: Deferring: will deallocate with test(_branch0)
DLGLOBAL DEBUG 5 (_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: Deferring: will deallocate with test(_branch0)
DLGLOBAL DEBUG test(_branch1){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 5 (_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),_branch1.cleanup())
DLGLOBAL DEBUG test(_branch1){alive}: cleanup()
DLGLOBAL DEBUG test(_branch1){alive}: scene forgets _branch1
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch1){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch1){alive}: Deferring: will deallocate with test(_branch0)
DLGLOBAL DEBUG 3 (_branch0.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup() done
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 2 (_branch0.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Deferring: will deallocate with test(_branch0)
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 1 (_branch0.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch0){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch0){alive}: Deallocated, including all deferred deallocations
DLGLOBAL DEBUG --- after term cascade:
DLGLOBAL DEBUG --- all deallocated.
DLGLOBAL DEBUG scene_alloc()
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(other){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: _branch0.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[0] = _branch0
DLGLOBAL DEBUG test(__twig0a){alive}: __twig0a.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = __twig0a
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: _branch1.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = _branch1
DLGLOBAL DEBUG test(__twig1a){alive}: __twig1a.other[0] = root
DLGLOBAL DEBUG test(root){alive}: root.other[0] = __twig1a
DLGLOBAL DEBUG ------ before destroy-event cascade, got:
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG ---
DLGLOBAL DEBUG --- destroy-event at _branch0
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_DESTROY
DLGLOBAL DEBUG 1 (_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_DESTROY)
DLGLOBAL DEBUG test(_branch0){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(_branch0){alive}: pre_term()
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: Terminating in cascade, depth 2 (cause = OSMO_FSM_TERM_PARENT, caused by: test(_branch0))
DLGLOBAL DEBUG test(__twig0b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 2 (_branch0.alive(),__twig0b.cleanup())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0b){alive}: scene forgets __twig0b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup() done
DLGLOBAL DEBUG 1 (_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: Deferring: will deallocate with test(_branch0)
DLGLOBAL DEBUG test(__twig0a){alive}: Terminating in cascade, depth 2 (cause = OSMO_FSM_TERM_PARENT, caused by: test(_branch0))
DLGLOBAL DEBUG test(__twig0a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0a){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 2 (_branch0.alive(),__twig0a.cleanup())
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0a){alive}: scene forgets __twig0a
DLGLOBAL DEBUG test(__twig0a){alive}: removing reference __twig0a.other[0] -> other
DLGLOBAL DEBUG test(other){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 3 (_branch0.alive(),__twig0a.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 4 (_branch0.alive(),__twig0a.cleanup(),other.alive(),other.other_gone())
DLGLOBAL DEBUG 3 (_branch0.alive(),__twig0a.cleanup(),other.alive())
DLGLOBAL DEBUG 2 (_branch0.alive(),__twig0a.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup() done
DLGLOBAL DEBUG 1 (_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: Deferring: will deallocate with test(_branch0)
DLGLOBAL DEBUG test(_branch0){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 2 (_branch0.alive(),_branch0.cleanup())
DLGLOBAL DEBUG test(_branch0){alive}: cleanup()
DLGLOBAL DEBUG test(_branch0){alive}: scene forgets _branch0
DLGLOBAL DEBUG test(_branch0){alive}: removing reference _branch0.other[0] -> other
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(other){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 3 (_branch0.alive(),_branch0.cleanup(),other.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(other){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 4 (_branch0.alive(),_branch0.cleanup(),other.alive(),other.other_gone())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(other){alive}: EV_OTHER_GONE: Dropped reference other.other[0] = _branch0
DLGLOBAL DEBUG 3 (_branch0.alive(),_branch0.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Terminating in cascade, depth 2 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(_branch0))
DLGLOBAL DEBUG test(other){alive}: pre_term()
DLGLOBAL DEBUG 4 (_branch0.alive(),_branch0.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup()
DLGLOBAL DEBUG test(other){alive}: scene forgets other
DLGLOBAL DEBUG test(other){alive}: removing reference other.other[1] -> _branch1
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 5 (_branch0.alive(),_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 6 (_branch0.alive(),_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),_branch1.other_gone())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: EV_OTHER_GONE: Dropped reference _branch1.other[0] = other
DLGLOBAL DEBUG 5 (_branch0.alive(),_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive())
DLGLOBAL DEBUG test(_branch1){alive}: Terminating in cascade, depth 3 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(_branch0))
DLGLOBAL DEBUG test(_branch1){alive}: pre_term()
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: Terminating in cascade, depth 4 (cause = OSMO_FSM_TERM_PARENT, caused by: test(_branch0))
DLGLOBAL DEBUG test(__twig1b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 6 (_branch0.alive(),_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),__twig1b.cleanup())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1b){alive}: scene forgets __twig1b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup() done
DLGLOBAL DEBUG 5 (_branch0.alive(),_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: Deferring: will deallocate with test(_branch0)
DLGLOBAL DEBUG test(__twig1a){alive}: Terminating in cascade, depth 4 (cause = OSMO_FSM_TERM_PARENT, caused by: test(_branch0))
DLGLOBAL DEBUG test(__twig1a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1a){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 6 (_branch0.alive(),_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1a){alive}: scene forgets __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: removing reference __twig1a.other[0] -> root
DLGLOBAL DEBUG test(root){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 7 (_branch0.alive(),_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(root){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 8 (_branch0.alive(),_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive(),root.othe
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(root){alive}: EV_OTHER_GONE: Dropped reference root.other[0] = __twig1a
DLGLOBAL DEBUG 7 (_branch0.alive(),_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: Terminating in cascade, depth 5 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(_branch0))
DLGLOBAL DEBUG test(root){alive}: pre_term()
DLGLOBAL DEBUG test(_branch1){alive}: Ignoring trigger to terminate: already terminating
DLGLOBAL ERROR test(root){alive}: Internal error while terminating child FSMs: a child FSM is stuck
DLGLOBAL DEBUG 8 (_branch0.alive(),_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive(),root.clea
DLGLOBAL DEBUG test(root){alive}: cleanup()
DLGLOBAL DEBUG test(root){alive}: scene forgets root
DLGLOBAL DEBUG test(root){alive}: cleanup() done
DLGLOBAL DEBUG 7 (_branch0.alive(),_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: Deferring: will deallocate with test(_branch0)
DLGLOBAL DEBUG 6 (_branch0.alive(),_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup() done
DLGLOBAL DEBUG 5 (_branch0.alive(),_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: Deferring: will deallocate with test(_branch0)
DLGLOBAL DEBUG test(_branch1){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 6 (_branch0.alive(),_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),_branch1.cleanup())
DLGLOBAL DEBUG test(_branch1){alive}: cleanup()
DLGLOBAL DEBUG test(_branch1){alive}: scene forgets _branch1
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch1){alive}: cleanup() done
DLGLOBAL DEBUG 5 (_branch0.alive(),_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch1){alive}: Deferring: will deallocate with test(_branch0)
DLGLOBAL DEBUG 4 (_branch0.alive(),_branch0.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup() done
DLGLOBAL DEBUG 3 (_branch0.alive(),_branch0.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Deferring: will deallocate with test(_branch0)
DLGLOBAL DEBUG 2 (_branch0.alive(),_branch0.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch0){alive}: cleanup() done
DLGLOBAL DEBUG 1 (_branch0.alive())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch0){alive}: Deallocated, including all deferred deallocations
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG --- after destroy-event cascade:
DLGLOBAL DEBUG --- all deallocated.
DLGLOBAL DEBUG scene_alloc()
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(other){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: _branch0.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[0] = _branch0
DLGLOBAL DEBUG test(__twig0a){alive}: __twig0a.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = __twig0a
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: _branch1.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = _branch1
DLGLOBAL DEBUG test(__twig1a){alive}: __twig1a.other[0] = root
DLGLOBAL DEBUG test(root){alive}: root.other[0] = __twig1a
DLGLOBAL DEBUG ------ before term cascade, got:
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG ---
DLGLOBAL DEBUG --- term at __twig0a
DLGLOBAL DEBUG test(__twig0a){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(__twig0a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0a){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 1 (__twig0a.cleanup())
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0a){alive}: scene forgets __twig0a
DLGLOBAL DEBUG test(__twig0a){alive}: removing reference __twig0a.other[0] -> other
DLGLOBAL DEBUG test(other){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 2 (__twig0a.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 3 (__twig0a.cleanup(),other.alive(),other.other_gone())
DLGLOBAL DEBUG 2 (__twig0a.cleanup(),other.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 1 (__twig0a.cleanup())
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_CHILD_GONE
DLGLOBAL DEBUG 2 (__twig0a.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_CHILD_GONE)
DLGLOBAL DEBUG 3 (__twig0a.cleanup(),_branch0.alive(),_branch0.child_gone())
DLGLOBAL DEBUG test(_branch0){alive}: EV_CHILD_GONE: Dropped reference _branch0.child[0] = __twig0a
DLGLOBAL DEBUG test(_branch0){alive}: still exists: child[1]
DLGLOBAL DEBUG 2 (__twig0a.cleanup(),_branch0.alive())
DLGLOBAL DEBUG 1 (__twig0a.cleanup())
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_CHILD_GONE
DLGLOBAL DEBUG 1 (_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_CHILD_GONE)
DLGLOBAL DEBUG test(_branch0){alive}: EV_CHILD_GONE with NULL data, must be a parent_term event. Ignore.
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(__twig0a){alive}: Deallocated
DLGLOBAL DEBUG --- after term cascade:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG --- 7 objects remain. cleaning up
DLGLOBAL DEBUG test(root){alive}: Terminating (cause = OSMO_FSM_TERM_ERROR)
DLGLOBAL DEBUG test(root){alive}: pre_term()
DLGLOBAL DEBUG test(_branch1){alive}: Terminating in cascade, depth 2 (cause = OSMO_FSM_TERM_PARENT, caused by: test(root))
DLGLOBAL DEBUG test(_branch1){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Terminating in cascade, depth 3 (cause = OSMO_FSM_TERM_PARENT, caused by: test(root))
DLGLOBAL DEBUG test(__twig1b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 1 (__twig1b.cleanup())
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1b){alive}: scene forgets __twig1b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(__twig1b){alive}: Deferring: will deallocate with test(root)
DLGLOBAL DEBUG test(__twig1a){alive}: Terminating in cascade, depth 3 (cause = OSMO_FSM_TERM_PARENT, caused by: test(root))
DLGLOBAL DEBUG test(__twig1a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1a){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 1 (__twig1a.cleanup())
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1a){alive}: scene forgets __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: removing reference __twig1a.other[0] -> root
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(__twig1a){alive}: Deferring: will deallocate with test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Removing from parent test(root)
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 1 (_branch1.cleanup())
DLGLOBAL DEBUG test(_branch1){alive}: cleanup()
DLGLOBAL DEBUG test(_branch1){alive}: scene forgets _branch1
DLGLOBAL DEBUG test(_branch1){alive}: removing reference _branch1.other[0] -> other
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(other){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.other_gone())
DLGLOBAL DEBUG test(other){alive}: EV_OTHER_GONE: Dropped reference other.other[1] = _branch1
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Terminating in cascade, depth 3 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(root))
DLGLOBAL DEBUG test(other){alive}: pre_term()
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup()
DLGLOBAL DEBUG test(other){alive}: scene forgets other
DLGLOBAL DEBUG test(other){alive}: removing reference other.other[0] -> _branch0
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.other_gone())
DLGLOBAL DEBUG test(_branch0){alive}: EV_OTHER_GONE: Dropped reference _branch0.other[0] = other
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Terminating in cascade, depth 4 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(root))
DLGLOBAL DEBUG test(_branch0){alive}: pre_term()
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: Terminating in cascade, depth 5 (cause = OSMO_FSM_TERM_PARENT, caused by: test(root))
DLGLOBAL DEBUG test(__twig0b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0b.cleanup())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0b){alive}: scene forgets __twig0b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: Deferring: will deallocate with test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.cleanup())
DLGLOBAL DEBUG test(_branch0){alive}: cleanup()
DLGLOBAL DEBUG test(_branch0){alive}: scene forgets _branch0
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch0){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch0){alive}: Deferring: will deallocate with test(root)
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup() done
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Deferring: will deallocate with test(root)
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 1 (_branch1.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch1){alive}: cleanup() done
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(_branch1){alive}: Deferring: will deallocate with test(root)
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 1 (root.cleanup())
DLGLOBAL DEBUG test(root){alive}: cleanup()
DLGLOBAL DEBUG test(root){alive}: scene forgets root
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: removing reference root.other[0] -> __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(root){alive}: cleanup() done
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(root){alive}: Deallocated, including all deferred deallocations
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG scene_alloc()
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(other){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: _branch0.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[0] = _branch0
DLGLOBAL DEBUG test(__twig0a){alive}: __twig0a.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = __twig0a
DLGLOBAL DEBUG test(_branch1){alive}: _branch1.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = _branch1
DLGLOBAL DEBUG test(__twig1a){alive}: __twig1a.other[0] = root
DLGLOBAL DEBUG test(root){alive}: root.other[0] = __twig1a
DLGLOBAL DEBUG ------ before destroy-event cascade, got:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG ---
DLGLOBAL DEBUG --- destroy-event at __twig0a
DLGLOBAL DEBUG test(__twig0a){alive}: Received Event EV_DESTROY
DLGLOBAL DEBUG 1 (__twig0a.alive())
DLGLOBAL DEBUG test(__twig0a){alive}: alive(EV_DESTROY)
DLGLOBAL DEBUG test(__twig0a){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(__twig0a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0a){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 2 (__twig0a.alive(),__twig0a.cleanup())
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0a){alive}: scene forgets __twig0a
DLGLOBAL DEBUG test(__twig0a){alive}: removing reference __twig0a.other[0] -> other
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(other){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 3 (__twig0a.alive(),__twig0a.cleanup(),other.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(other){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 4 (__twig0a.alive(),__twig0a.cleanup(),other.alive(),other.other_gone())
DLGLOBAL DEBUG 3 (__twig0a.alive(),__twig0a.cleanup(),other.alive())
DLGLOBAL DEBUG 2 (__twig0a.alive(),__twig0a.cleanup())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_CHILD_GONE
DLGLOBAL DEBUG 3 (__twig0a.alive(),__twig0a.cleanup(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_CHILD_GONE)
DLGLOBAL DEBUG 4 (__twig0a.alive(),__twig0a.cleanup(),_branch0.alive(),_branch0.child_gone())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: EV_CHILD_GONE: Dropped reference _branch0.child[0] = __twig0a
DLGLOBAL DEBUG test(_branch0){alive}: still exists: child[1]
DLGLOBAL DEBUG 3 (__twig0a.alive(),__twig0a.cleanup(),_branch0.alive())
DLGLOBAL DEBUG 2 (__twig0a.alive(),__twig0a.cleanup())
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup() done
DLGLOBAL DEBUG 1 (__twig0a.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_CHILD_GONE
DLGLOBAL DEBUG 2 (__twig0a.alive(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_CHILD_GONE)
DLGLOBAL DEBUG test(_branch0){alive}: EV_CHILD_GONE with NULL data, must be a parent_term event. Ignore.
DLGLOBAL DEBUG 1 (__twig0a.alive())
DLGLOBAL DEBUG test(__twig0a){alive}: Deallocated
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG --- after destroy-event cascade:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG --- 7 objects remain. cleaning up
DLGLOBAL DEBUG test(root){alive}: Terminating (cause = OSMO_FSM_TERM_ERROR)
DLGLOBAL DEBUG test(root){alive}: pre_term()
DLGLOBAL DEBUG test(_branch1){alive}: Terminating in cascade, depth 2 (cause = OSMO_FSM_TERM_PARENT, caused by: test(root))
DLGLOBAL DEBUG test(_branch1){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Terminating in cascade, depth 3 (cause = OSMO_FSM_TERM_PARENT, caused by: test(root))
DLGLOBAL DEBUG test(__twig1b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 1 (__twig1b.cleanup())
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1b){alive}: scene forgets __twig1b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(__twig1b){alive}: Deferring: will deallocate with test(root)
DLGLOBAL DEBUG test(__twig1a){alive}: Terminating in cascade, depth 3 (cause = OSMO_FSM_TERM_PARENT, caused by: test(root))
DLGLOBAL DEBUG test(__twig1a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1a){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 1 (__twig1a.cleanup())
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1a){alive}: scene forgets __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: removing reference __twig1a.other[0] -> root
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(__twig1a){alive}: Deferring: will deallocate with test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Removing from parent test(root)
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 1 (_branch1.cleanup())
DLGLOBAL DEBUG test(_branch1){alive}: cleanup()
DLGLOBAL DEBUG test(_branch1){alive}: scene forgets _branch1
DLGLOBAL DEBUG test(_branch1){alive}: removing reference _branch1.other[0] -> other
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(other){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.other_gone())
DLGLOBAL DEBUG test(other){alive}: EV_OTHER_GONE: Dropped reference other.other[1] = _branch1
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Terminating in cascade, depth 3 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(root))
DLGLOBAL DEBUG test(other){alive}: pre_term()
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup()
DLGLOBAL DEBUG test(other){alive}: scene forgets other
DLGLOBAL DEBUG test(other){alive}: removing reference other.other[0] -> _branch0
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.other_gone())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: EV_OTHER_GONE: Dropped reference _branch0.other[0] = other
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Terminating in cascade, depth 4 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(root))
DLGLOBAL DEBUG test(_branch0){alive}: pre_term()
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: Terminating in cascade, depth 5 (cause = OSMO_FSM_TERM_PARENT, caused by: test(root))
DLGLOBAL DEBUG test(__twig0b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0b.cleanup())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0b){alive}: scene forgets __twig0b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: Deferring: will deallocate with test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.cleanup())
DLGLOBAL DEBUG test(_branch0){alive}: cleanup()
DLGLOBAL DEBUG test(_branch0){alive}: scene forgets _branch0
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch0){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch0){alive}: Deferring: will deallocate with test(root)
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup() done
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Deferring: will deallocate with test(root)
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 1 (_branch1.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch1){alive}: cleanup() done
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(_branch1){alive}: Deferring: will deallocate with test(root)
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 1 (root.cleanup())
DLGLOBAL DEBUG test(root){alive}: cleanup()
DLGLOBAL DEBUG test(root){alive}: scene forgets root
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: removing reference root.other[0] -> __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(root){alive}: cleanup() done
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(root){alive}: Deallocated, including all deferred deallocations
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG scene_alloc()
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(other){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: _branch0.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[0] = _branch0
DLGLOBAL DEBUG test(__twig0a){alive}: __twig0a.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = __twig0a
DLGLOBAL DEBUG test(_branch1){alive}: _branch1.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = _branch1
DLGLOBAL DEBUG test(__twig1a){alive}: __twig1a.other[0] = root
DLGLOBAL DEBUG test(root){alive}: root.other[0] = __twig1a
DLGLOBAL DEBUG ------ before term cascade, got:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG ---
DLGLOBAL DEBUG --- term at __twig0b
DLGLOBAL DEBUG test(__twig0b){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(__twig0b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 1 (__twig0b.cleanup())
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0b){alive}: scene forgets __twig0b
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_CHILD_GONE
DLGLOBAL DEBUG 2 (__twig0b.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_CHILD_GONE)
DLGLOBAL DEBUG 3 (__twig0b.cleanup(),_branch0.alive(),_branch0.child_gone())
DLGLOBAL DEBUG test(_branch0){alive}: EV_CHILD_GONE: Dropped reference _branch0.child[1] = __twig0b
DLGLOBAL DEBUG test(_branch0){alive}: still exists: child[0]
DLGLOBAL DEBUG 2 (__twig0b.cleanup(),_branch0.alive())
DLGLOBAL DEBUG 1 (__twig0b.cleanup())
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_CHILD_GONE
DLGLOBAL DEBUG 1 (_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_CHILD_GONE)
DLGLOBAL DEBUG test(_branch0){alive}: EV_CHILD_GONE with NULL data, must be a parent_term event. Ignore.
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(__twig0b){alive}: Deallocated
DLGLOBAL DEBUG --- after term cascade:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG --- 7 objects remain. cleaning up
DLGLOBAL DEBUG test(root){alive}: Terminating (cause = OSMO_FSM_TERM_ERROR)
DLGLOBAL DEBUG test(root){alive}: pre_term()
DLGLOBAL DEBUG test(_branch1){alive}: Terminating in cascade, depth 2 (cause = OSMO_FSM_TERM_PARENT, caused by: test(root))
DLGLOBAL DEBUG test(_branch1){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Terminating in cascade, depth 3 (cause = OSMO_FSM_TERM_PARENT, caused by: test(root))
DLGLOBAL DEBUG test(__twig1b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 1 (__twig1b.cleanup())
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1b){alive}: scene forgets __twig1b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(__twig1b){alive}: Deferring: will deallocate with test(root)
DLGLOBAL DEBUG test(__twig1a){alive}: Terminating in cascade, depth 3 (cause = OSMO_FSM_TERM_PARENT, caused by: test(root))
DLGLOBAL DEBUG test(__twig1a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1a){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 1 (__twig1a.cleanup())
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1a){alive}: scene forgets __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: removing reference __twig1a.other[0] -> root
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(__twig1a){alive}: Deferring: will deallocate with test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Removing from parent test(root)
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 1 (_branch1.cleanup())
DLGLOBAL DEBUG test(_branch1){alive}: cleanup()
DLGLOBAL DEBUG test(_branch1){alive}: scene forgets _branch1
DLGLOBAL DEBUG test(_branch1){alive}: removing reference _branch1.other[0] -> other
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(other){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.other_gone())
DLGLOBAL DEBUG test(other){alive}: EV_OTHER_GONE: Dropped reference other.other[1] = _branch1
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Terminating in cascade, depth 3 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(root))
DLGLOBAL DEBUG test(other){alive}: pre_term()
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup()
DLGLOBAL DEBUG test(other){alive}: scene forgets other
DLGLOBAL DEBUG test(other){alive}: removing reference other.other[0] -> _branch0
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.other_gone())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: EV_OTHER_GONE: Dropped reference _branch0.other[0] = other
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Terminating in cascade, depth 4 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(root))
DLGLOBAL DEBUG test(_branch0){alive}: pre_term()
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: Terminating in cascade, depth 5 (cause = OSMO_FSM_TERM_PARENT, caused by: test(root))
DLGLOBAL DEBUG test(__twig0a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0a){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0a.cleanup())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0a){alive}: scene forgets __twig0a
DLGLOBAL DEBUG test(__twig0a){alive}: removing reference __twig0a.other[0] -> other
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(other){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: Deferring: will deallocate with test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.cleanup())
DLGLOBAL DEBUG test(_branch0){alive}: cleanup()
DLGLOBAL DEBUG test(_branch0){alive}: scene forgets _branch0
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch0){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch0){alive}: Deferring: will deallocate with test(root)
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup() done
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Deferring: will deallocate with test(root)
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 1 (_branch1.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch1){alive}: cleanup() done
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(_branch1){alive}: Deferring: will deallocate with test(root)
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 1 (root.cleanup())
DLGLOBAL DEBUG test(root){alive}: cleanup()
DLGLOBAL DEBUG test(root){alive}: scene forgets root
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: removing reference root.other[0] -> __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(root){alive}: cleanup() done
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(root){alive}: Deallocated, including all deferred deallocations
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG scene_alloc()
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(other){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: _branch0.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[0] = _branch0
DLGLOBAL DEBUG test(__twig0a){alive}: __twig0a.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = __twig0a
DLGLOBAL DEBUG test(_branch1){alive}: _branch1.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = _branch1
DLGLOBAL DEBUG test(__twig1a){alive}: __twig1a.other[0] = root
DLGLOBAL DEBUG test(root){alive}: root.other[0] = __twig1a
DLGLOBAL DEBUG ------ before destroy-event cascade, got:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG ---
DLGLOBAL DEBUG --- destroy-event at __twig0b
DLGLOBAL DEBUG test(__twig0b){alive}: Received Event EV_DESTROY
DLGLOBAL DEBUG 1 (__twig0b.alive())
DLGLOBAL DEBUG test(__twig0b){alive}: alive(EV_DESTROY)
DLGLOBAL DEBUG test(__twig0b){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(__twig0b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 2 (__twig0b.alive(),__twig0b.cleanup())
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0b){alive}: scene forgets __twig0b
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_CHILD_GONE
DLGLOBAL DEBUG 3 (__twig0b.alive(),__twig0b.cleanup(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_CHILD_GONE)
DLGLOBAL DEBUG 4 (__twig0b.alive(),__twig0b.cleanup(),_branch0.alive(),_branch0.child_gone())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: EV_CHILD_GONE: Dropped reference _branch0.child[1] = __twig0b
DLGLOBAL DEBUG test(_branch0){alive}: still exists: child[0]
DLGLOBAL DEBUG 3 (__twig0b.alive(),__twig0b.cleanup(),_branch0.alive())
DLGLOBAL DEBUG 2 (__twig0b.alive(),__twig0b.cleanup())
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup() done
DLGLOBAL DEBUG 1 (__twig0b.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_CHILD_GONE
DLGLOBAL DEBUG 2 (__twig0b.alive(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_CHILD_GONE)
DLGLOBAL DEBUG test(_branch0){alive}: EV_CHILD_GONE with NULL data, must be a parent_term event. Ignore.
DLGLOBAL DEBUG 1 (__twig0b.alive())
DLGLOBAL DEBUG test(__twig0b){alive}: Deallocated
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG --- after destroy-event cascade:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG --- 7 objects remain. cleaning up
DLGLOBAL DEBUG test(root){alive}: Terminating (cause = OSMO_FSM_TERM_ERROR)
DLGLOBAL DEBUG test(root){alive}: pre_term()
DLGLOBAL DEBUG test(_branch1){alive}: Terminating in cascade, depth 2 (cause = OSMO_FSM_TERM_PARENT, caused by: test(root))
DLGLOBAL DEBUG test(_branch1){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Terminating in cascade, depth 3 (cause = OSMO_FSM_TERM_PARENT, caused by: test(root))
DLGLOBAL DEBUG test(__twig1b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 1 (__twig1b.cleanup())
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1b){alive}: scene forgets __twig1b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(__twig1b){alive}: Deferring: will deallocate with test(root)
DLGLOBAL DEBUG test(__twig1a){alive}: Terminating in cascade, depth 3 (cause = OSMO_FSM_TERM_PARENT, caused by: test(root))
DLGLOBAL DEBUG test(__twig1a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1a){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 1 (__twig1a.cleanup())
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1a){alive}: scene forgets __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: removing reference __twig1a.other[0] -> root
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(__twig1a){alive}: Deferring: will deallocate with test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Removing from parent test(root)
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 1 (_branch1.cleanup())
DLGLOBAL DEBUG test(_branch1){alive}: cleanup()
DLGLOBAL DEBUG test(_branch1){alive}: scene forgets _branch1
DLGLOBAL DEBUG test(_branch1){alive}: removing reference _branch1.other[0] -> other
DLGLOBAL DEBUG test(other){alive}: Received Event EV_OTHER_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.other_gone())
DLGLOBAL DEBUG test(other){alive}: EV_OTHER_GONE: Dropped reference other.other[1] = _branch1
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Terminating in cascade, depth 3 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(root))
DLGLOBAL DEBUG test(other){alive}: pre_term()
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup()
DLGLOBAL DEBUG test(other){alive}: scene forgets other
DLGLOBAL DEBUG test(other){alive}: removing reference other.other[0] -> _branch0
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.other_gone())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: EV_OTHER_GONE: Dropped reference _branch0.other[0] = other
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Terminating in cascade, depth 4 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(root))
DLGLOBAL DEBUG test(_branch0){alive}: pre_term()
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: Terminating in cascade, depth 5 (cause = OSMO_FSM_TERM_PARENT, caused by: test(root))
DLGLOBAL DEBUG test(__twig0a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0a){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0a.cleanup())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0a){alive}: scene forgets __twig0a
DLGLOBAL DEBUG test(__twig0a){alive}: removing reference __twig0a.other[0] -> other
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(other){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: Deferring: will deallocate with test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.cleanup())
DLGLOBAL DEBUG test(_branch0){alive}: cleanup()
DLGLOBAL DEBUG test(_branch0){alive}: scene forgets _branch0
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch0){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch0){alive}: Deferring: will deallocate with test(root)
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup() done
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Deferring: will deallocate with test(root)
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 1 (_branch1.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch1){alive}: cleanup() done
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(_branch1){alive}: Deferring: will deallocate with test(root)
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 1 (root.cleanup())
DLGLOBAL DEBUG test(root){alive}: cleanup()
DLGLOBAL DEBUG test(root){alive}: scene forgets root
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: removing reference root.other[0] -> __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(root){alive}: cleanup() done
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(root){alive}: Deallocated, including all deferred deallocations
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG scene_alloc()
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(other){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: _branch0.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[0] = _branch0
DLGLOBAL DEBUG test(__twig0a){alive}: __twig0a.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = __twig0a
DLGLOBAL DEBUG test(_branch1){alive}: _branch1.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = _branch1
DLGLOBAL DEBUG test(__twig1a){alive}: __twig1a.other[0] = root
DLGLOBAL DEBUG test(root){alive}: root.other[0] = __twig1a
DLGLOBAL DEBUG ------ before term cascade, got:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG ---
DLGLOBAL DEBUG --- term at _branch1
DLGLOBAL DEBUG test(_branch1){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(_branch1){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Terminating in cascade, depth 2 (cause = OSMO_FSM_TERM_PARENT, caused by: test(_branch1))
DLGLOBAL DEBUG test(__twig1b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 1 (__twig1b.cleanup())
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1b){alive}: scene forgets __twig1b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(__twig1b){alive}: Deferring: will deallocate with test(_branch1)
DLGLOBAL DEBUG test(__twig1a){alive}: Terminating in cascade, depth 2 (cause = OSMO_FSM_TERM_PARENT, caused by: test(_branch1))
DLGLOBAL DEBUG test(__twig1a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1a){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 1 (__twig1a.cleanup())
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1a){alive}: scene forgets __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: removing reference __twig1a.other[0] -> root
DLGLOBAL DEBUG test(root){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 2 (__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 3 (__twig1a.cleanup(),root.alive(),root.other_gone())
DLGLOBAL DEBUG test(root){alive}: EV_OTHER_GONE: Dropped reference root.other[0] = __twig1a
DLGLOBAL DEBUG 2 (__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: Terminating in cascade, depth 3 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(_branch1))
DLGLOBAL DEBUG test(root){alive}: pre_term()
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: Ignoring trigger to terminate: already terminating
DLGLOBAL ERROR test(root){alive}: Internal error while terminating child FSMs: a child FSM is stuck
DLGLOBAL DEBUG 3 (__twig1a.cleanup(),root.alive(),root.cleanup())
DLGLOBAL DEBUG test(root){alive}: cleanup()
DLGLOBAL DEBUG test(root){alive}: scene forgets root
DLGLOBAL DEBUG test(root){alive}: cleanup() done
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 2 (__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: Deferring: will deallocate with test(_branch1)
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 1 (__twig1a.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(__twig1a){alive}: Deferring: will deallocate with test(_branch1)
DLGLOBAL DEBUG test(_branch1){alive}: Removing from parent test(root)
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 1 (_branch1.cleanup())
DLGLOBAL DEBUG test(_branch1){alive}: cleanup()
DLGLOBAL DEBUG test(_branch1){alive}: scene forgets _branch1
DLGLOBAL DEBUG test(_branch1){alive}: removing reference _branch1.other[0] -> other
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(other){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.other_gone())
DLGLOBAL DEBUG test(other){alive}: EV_OTHER_GONE: Dropped reference other.other[1] = _branch1
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Terminating in cascade, depth 2 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(_branch1))
DLGLOBAL DEBUG test(other){alive}: pre_term()
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup()
DLGLOBAL DEBUG test(other){alive}: scene forgets other
DLGLOBAL DEBUG test(other){alive}: removing reference other.other[0] -> _branch0
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.other_gone())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: EV_OTHER_GONE: Dropped reference _branch0.other[0] = other
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Terminating in cascade, depth 3 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(_branch1))
DLGLOBAL DEBUG test(_branch0){alive}: pre_term()
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: Terminating in cascade, depth 4 (cause = OSMO_FSM_TERM_PARENT, caused by: test(_branch1))
DLGLOBAL DEBUG test(__twig0b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0b.cleanup())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0b){alive}: scene forgets __twig0b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: Deferring: will deallocate with test(_branch1)
DLGLOBAL DEBUG test(__twig0a){alive}: Terminating in cascade, depth 4 (cause = OSMO_FSM_TERM_PARENT, caused by: test(_branch1))
DLGLOBAL DEBUG test(__twig0a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0a){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0a.cleanup())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0a){alive}: scene forgets __twig0a
DLGLOBAL DEBUG test(__twig0a){alive}: removing reference __twig0a.other[0] -> other
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(other){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: Deferring: will deallocate with test(_branch1)
DLGLOBAL DEBUG test(_branch0){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.cleanup())
DLGLOBAL DEBUG test(_branch0){alive}: cleanup()
DLGLOBAL DEBUG test(_branch0){alive}: scene forgets _branch0
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch0){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch0){alive}: Deferring: will deallocate with test(_branch1)
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup() done
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Deferring: will deallocate with test(_branch1)
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 1 (_branch1.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch1){alive}: cleanup() done
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 0 (-)
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch1){alive}: Deallocated, including all deferred deallocations
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG --- after term cascade:
DLGLOBAL DEBUG --- all deallocated.
DLGLOBAL DEBUG scene_alloc()
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(other){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: _branch0.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[0] = _branch0
DLGLOBAL DEBUG test(__twig0a){alive}: __twig0a.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = __twig0a
DLGLOBAL DEBUG test(_branch1){alive}: _branch1.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = _branch1
DLGLOBAL DEBUG test(__twig1a){alive}: __twig1a.other[0] = root
DLGLOBAL DEBUG test(root){alive}: root.other[0] = __twig1a
DLGLOBAL DEBUG ------ before destroy-event cascade, got:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG ---
DLGLOBAL DEBUG --- destroy-event at _branch1
DLGLOBAL DEBUG test(_branch1){alive}: Received Event EV_DESTROY
DLGLOBAL DEBUG 1 (_branch1.alive())
DLGLOBAL DEBUG test(_branch1){alive}: alive(EV_DESTROY)
DLGLOBAL DEBUG test(_branch1){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(_branch1){alive}: pre_term()
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: Terminating in cascade, depth 2 (cause = OSMO_FSM_TERM_PARENT, caused by: test(_branch1))
DLGLOBAL DEBUG test(__twig1b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 2 (_branch1.alive(),__twig1b.cleanup())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1b){alive}: scene forgets __twig1b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup() done
DLGLOBAL DEBUG 1 (_branch1.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: Deferring: will deallocate with test(_branch1)
DLGLOBAL DEBUG test(__twig1a){alive}: Terminating in cascade, depth 2 (cause = OSMO_FSM_TERM_PARENT, caused by: test(_branch1))
DLGLOBAL DEBUG test(__twig1a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1a){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 2 (_branch1.alive(),__twig1a.cleanup())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1a){alive}: scene forgets __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: removing reference __twig1a.other[0] -> root
DLGLOBAL DEBUG test(root){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 3 (_branch1.alive(),__twig1a.cleanup(),root.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(root){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 4 (_branch1.alive(),__twig1a.cleanup(),root.alive(),root.other_gone())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(root){alive}: EV_OTHER_GONE: Dropped reference root.other[0] = __twig1a
DLGLOBAL DEBUG 3 (_branch1.alive(),__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: Terminating in cascade, depth 3 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(_branch1))
DLGLOBAL DEBUG test(root){alive}: pre_term()
DLGLOBAL DEBUG test(_branch1){alive}: Ignoring trigger to terminate: already terminating
DLGLOBAL ERROR test(root){alive}: Internal error while terminating child FSMs: a child FSM is stuck
DLGLOBAL DEBUG 4 (_branch1.alive(),__twig1a.cleanup(),root.alive(),root.cleanup())
DLGLOBAL DEBUG test(root){alive}: cleanup()
DLGLOBAL DEBUG test(root){alive}: scene forgets root
DLGLOBAL DEBUG test(root){alive}: cleanup() done
DLGLOBAL DEBUG 3 (_branch1.alive(),__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: Deferring: will deallocate with test(_branch1)
DLGLOBAL DEBUG 2 (_branch1.alive(),__twig1a.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup() done
DLGLOBAL DEBUG 1 (_branch1.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: Deferring: will deallocate with test(_branch1)
DLGLOBAL DEBUG test(_branch1){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 2 (_branch1.alive(),_branch1.cleanup())
DLGLOBAL DEBUG test(_branch1){alive}: cleanup()
DLGLOBAL DEBUG test(_branch1){alive}: scene forgets _branch1
DLGLOBAL DEBUG test(_branch1){alive}: removing reference _branch1.other[0] -> other
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(other){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 3 (_branch1.alive(),_branch1.cleanup(),other.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(other){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 4 (_branch1.alive(),_branch1.cleanup(),other.alive(),other.other_gone())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(other){alive}: EV_OTHER_GONE: Dropped reference other.other[1] = _branch1
DLGLOBAL DEBUG 3 (_branch1.alive(),_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Terminating in cascade, depth 2 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(_branch1))
DLGLOBAL DEBUG test(other){alive}: pre_term()
DLGLOBAL DEBUG 4 (_branch1.alive(),_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup()
DLGLOBAL DEBUG test(other){alive}: scene forgets other
DLGLOBAL DEBUG test(other){alive}: removing reference other.other[0] -> _branch0
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 5 (_branch1.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 6 (_branch1.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.other_gone())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: EV_OTHER_GONE: Dropped reference _branch0.other[0] = other
DLGLOBAL DEBUG 5 (_branch1.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Terminating in cascade, depth 3 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(_branch1))
DLGLOBAL DEBUG test(_branch0){alive}: pre_term()
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: Terminating in cascade, depth 4 (cause = OSMO_FSM_TERM_PARENT, caused by: test(_branch1))
DLGLOBAL DEBUG test(__twig0b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 6 (_branch1.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0b.cleanup())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0b){alive}: scene forgets __twig0b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup() done
DLGLOBAL DEBUG 5 (_branch1.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: Deferring: will deallocate with test(_branch1)
DLGLOBAL DEBUG test(__twig0a){alive}: Terminating in cascade, depth 4 (cause = OSMO_FSM_TERM_PARENT, caused by: test(_branch1))
DLGLOBAL DEBUG test(__twig0a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0a){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 6 (_branch1.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0a.cleanup())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0a){alive}: scene forgets __twig0a
DLGLOBAL DEBUG test(__twig0a){alive}: removing reference __twig0a.other[0] -> other
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(other){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup() done
DLGLOBAL DEBUG 5 (_branch1.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: Deferring: will deallocate with test(_branch1)
DLGLOBAL DEBUG test(_branch0){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 6 (_branch1.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.cleanup())
DLGLOBAL DEBUG test(_branch0){alive}: cleanup()
DLGLOBAL DEBUG test(_branch0){alive}: scene forgets _branch0
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch0){alive}: cleanup() done
DLGLOBAL DEBUG 5 (_branch1.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch0){alive}: Deferring: will deallocate with test(_branch1)
DLGLOBAL DEBUG 4 (_branch1.alive(),_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup() done
DLGLOBAL DEBUG 3 (_branch1.alive(),_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Deferring: will deallocate with test(_branch1)
DLGLOBAL DEBUG 2 (_branch1.alive(),_branch1.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch1){alive}: cleanup() done
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 1 (_branch1.alive())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch1){alive}: Deallocated, including all deferred deallocations
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG --- after destroy-event cascade:
DLGLOBAL DEBUG --- all deallocated.
DLGLOBAL DEBUG scene_alloc()
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(other){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: _branch0.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[0] = _branch0
DLGLOBAL DEBUG test(__twig0a){alive}: __twig0a.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = __twig0a
DLGLOBAL DEBUG test(_branch1){alive}: _branch1.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = _branch1
DLGLOBAL DEBUG test(__twig1a){alive}: __twig1a.other[0] = root
DLGLOBAL DEBUG test(root){alive}: root.other[0] = __twig1a
DLGLOBAL DEBUG ------ before term cascade, got:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG ---
DLGLOBAL DEBUG --- term at __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(__twig1a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1a){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 1 (__twig1a.cleanup())
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1a){alive}: scene forgets __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: removing reference __twig1a.other[0] -> root
DLGLOBAL DEBUG test(root){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 2 (__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 3 (__twig1a.cleanup(),root.alive(),root.other_gone())
DLGLOBAL DEBUG test(root){alive}: EV_OTHER_GONE: Dropped reference root.other[0] = __twig1a
DLGLOBAL DEBUG 2 (__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: Terminating in cascade, depth 2 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(__twig1a))
DLGLOBAL DEBUG test(root){alive}: pre_term()
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: Terminating in cascade, depth 3 (cause = OSMO_FSM_TERM_PARENT, caused by: test(__twig1a))
DLGLOBAL DEBUG test(_branch1){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Terminating in cascade, depth 4 (cause = OSMO_FSM_TERM_PARENT, caused by: test(__twig1a))
DLGLOBAL DEBUG test(__twig1b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 3 (__twig1a.cleanup(),root.alive(),__twig1b.cleanup())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1b){alive}: scene forgets __twig1b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup() done
DLGLOBAL DEBUG 2 (__twig1a.cleanup(),root.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: Deferring: will deallocate with test(__twig1a)
DLGLOBAL DEBUG test(_branch1){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 3 (__twig1a.cleanup(),root.alive(),_branch1.cleanup())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: cleanup()
DLGLOBAL DEBUG test(_branch1){alive}: scene forgets _branch1
DLGLOBAL DEBUG test(_branch1){alive}: removing reference _branch1.other[0] -> other
DLGLOBAL DEBUG test(other){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 4 (__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(other){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 5 (__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.other_gone())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(other){alive}: EV_OTHER_GONE: Dropped reference other.other[1] = _branch1
DLGLOBAL DEBUG 4 (__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Terminating in cascade, depth 4 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(__twig1a))
DLGLOBAL DEBUG test(other){alive}: pre_term()
DLGLOBAL DEBUG 5 (__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup()
DLGLOBAL DEBUG test(other){alive}: scene forgets other
DLGLOBAL DEBUG test(other){alive}: removing reference other.other[0] -> _branch0
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 6 (__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 7 (__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.other_gone())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: EV_OTHER_GONE: Dropped reference _branch0.other[0] = other
DLGLOBAL DEBUG 6 (__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Terminating in cascade, depth 5 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(__twig1a))
DLGLOBAL DEBUG test(_branch0){alive}: pre_term()
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: Terminating in cascade, depth 6 (cause = OSMO_FSM_TERM_PARENT, caused by: test(__twig1a))
DLGLOBAL DEBUG test(__twig0b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 7 (__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0b.cleanup())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0b){alive}: scene forgets __twig0b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup() done
DLGLOBAL DEBUG 6 (__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: Deferring: will deallocate with test(__twig1a)
DLGLOBAL DEBUG test(__twig0a){alive}: Terminating in cascade, depth 6 (cause = OSMO_FSM_TERM_PARENT, caused by: test(__twig1a))
DLGLOBAL DEBUG test(__twig0a){alive}: pre_term()
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 7 (__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0a.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0a){alive}: scene forgets __twig0a
DLGLOBAL DEBUG test(__twig0a){alive}: removing reference __twig0a.other[0] -> other
DLGLOBAL DEBUG test(other){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup() done
DLGLOBAL DEBUG 6 (__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: Deferring: will deallocate with test(__twig1a)
DLGLOBAL DEBUG test(_branch0){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 7 (__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.cleanup())
DLGLOBAL DEBUG test(_branch0){alive}: cleanup()
DLGLOBAL DEBUG test(_branch0){alive}: scene forgets _branch0
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch0){alive}: cleanup() done
DLGLOBAL DEBUG 6 (__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch0){alive}: Deferring: will deallocate with test(__twig1a)
DLGLOBAL DEBUG 5 (__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup() done
DLGLOBAL DEBUG 4 (__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Deferring: will deallocate with test(__twig1a)
DLGLOBAL DEBUG 3 (__twig1a.cleanup(),root.alive(),_branch1.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: cleanup() done
DLGLOBAL DEBUG 2 (__twig1a.cleanup(),root.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: Deferring: will deallocate with test(__twig1a)
DLGLOBAL DEBUG 3 (__twig1a.cleanup(),root.alive(),root.cleanup())
DLGLOBAL DEBUG test(root){alive}: cleanup()
DLGLOBAL DEBUG test(root){alive}: scene forgets root
DLGLOBAL DEBUG test(root){alive}: cleanup() done
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 2 (__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: Deferring: will deallocate with test(__twig1a)
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 1 (__twig1a.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: Deallocated, including all deferred deallocations
DLGLOBAL DEBUG --- after term cascade:
DLGLOBAL DEBUG --- all deallocated.
DLGLOBAL DEBUG scene_alloc()
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(other){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: _branch0.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[0] = _branch0
DLGLOBAL DEBUG test(__twig0a){alive}: __twig0a.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = __twig0a
DLGLOBAL DEBUG test(_branch1){alive}: _branch1.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = _branch1
DLGLOBAL DEBUG test(__twig1a){alive}: __twig1a.other[0] = root
DLGLOBAL DEBUG test(root){alive}: root.other[0] = __twig1a
DLGLOBAL DEBUG ------ before destroy-event cascade, got:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG ---
DLGLOBAL DEBUG --- destroy-event at __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: Received Event EV_DESTROY
DLGLOBAL DEBUG 1 (__twig1a.alive())
DLGLOBAL DEBUG test(__twig1a){alive}: alive(EV_DESTROY)
DLGLOBAL DEBUG test(__twig1a){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(__twig1a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1a){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 2 (__twig1a.alive(),__twig1a.cleanup())
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1a){alive}: scene forgets __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: removing reference __twig1a.other[0] -> root
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(root){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 3 (__twig1a.alive(),__twig1a.cleanup(),root.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(root){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 4 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),root.other_gone())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(root){alive}: EV_OTHER_GONE: Dropped reference root.other[0] = __twig1a
DLGLOBAL DEBUG 3 (__twig1a.alive(),__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: Terminating in cascade, depth 2 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(__twig1a))
DLGLOBAL DEBUG test(root){alive}: pre_term()
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: Terminating in cascade, depth 3 (cause = OSMO_FSM_TERM_PARENT, caused by: test(__twig1a))
DLGLOBAL DEBUG test(_branch1){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Terminating in cascade, depth 4 (cause = OSMO_FSM_TERM_PARENT, caused by: test(__twig1a))
DLGLOBAL DEBUG test(__twig1b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 4 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),__twig1b.cleanup())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1b){alive}: scene forgets __twig1b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup() done
DLGLOBAL DEBUG 3 (__twig1a.alive(),__twig1a.cleanup(),root.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: Deferring: will deallocate with test(__twig1a)
DLGLOBAL DEBUG test(_branch1){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 4 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),_branch1.cleanup())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: cleanup()
DLGLOBAL DEBUG test(_branch1){alive}: scene forgets _branch1
DLGLOBAL DEBUG test(_branch1){alive}: removing reference _branch1.other[0] -> other
DLGLOBAL DEBUG test(other){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 5 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(other){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 6 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.other_gone())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(other){alive}: EV_OTHER_GONE: Dropped reference other.other[1] = _branch1
DLGLOBAL DEBUG 5 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Terminating in cascade, depth 4 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(__twig1a))
DLGLOBAL DEBUG test(other){alive}: pre_term()
DLGLOBAL DEBUG 6 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup()
DLGLOBAL DEBUG test(other){alive}: scene forgets other
DLGLOBAL DEBUG test(other){alive}: removing reference other.other[0] -> _branch0
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 7 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 8 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: EV_OTHER_GONE: Dropped reference _branch0.other[0] = other
DLGLOBAL DEBUG 7 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Terminating in cascade, depth 5 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(__twig1a))
DLGLOBAL DEBUG test(_branch0){alive}: pre_term()
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: Terminating in cascade, depth 6 (cause = OSMO_FSM_TERM_PARENT, caused by: test(__twig1a))
DLGLOBAL DEBUG test(__twig0b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 8 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0b.
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0b){alive}: scene forgets __twig0b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup() done
DLGLOBAL DEBUG 7 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: Deferring: will deallocate with test(__twig1a)
DLGLOBAL DEBUG test(__twig0a){alive}: Terminating in cascade, depth 6 (cause = OSMO_FSM_TERM_PARENT, caused by: test(__twig1a))
DLGLOBAL DEBUG test(__twig0a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0a){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 8 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0a.
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0a){alive}: scene forgets __twig0a
DLGLOBAL DEBUG test(__twig0a){alive}: removing reference __twig0a.other[0] -> other
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(other){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup() done
DLGLOBAL DEBUG 7 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: Deferring: will deallocate with test(__twig1a)
DLGLOBAL DEBUG test(_branch0){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 8 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.
DLGLOBAL DEBUG test(_branch0){alive}: cleanup()
DLGLOBAL DEBUG test(_branch0){alive}: scene forgets _branch0
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch0){alive}: cleanup() done
DLGLOBAL DEBUG 7 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch0){alive}: Deferring: will deallocate with test(__twig1a)
DLGLOBAL DEBUG 6 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup() done
DLGLOBAL DEBUG 5 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Deferring: will deallocate with test(__twig1a)
DLGLOBAL DEBUG 4 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),_branch1.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: cleanup() done
DLGLOBAL DEBUG 3 (__twig1a.alive(),__twig1a.cleanup(),root.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: Deferring: will deallocate with test(__twig1a)
DLGLOBAL DEBUG 4 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),root.cleanup())
DLGLOBAL DEBUG test(root){alive}: cleanup()
DLGLOBAL DEBUG test(root){alive}: scene forgets root
DLGLOBAL DEBUG test(root){alive}: cleanup() done
DLGLOBAL DEBUG 3 (__twig1a.alive(),__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: Deferring: will deallocate with test(__twig1a)
DLGLOBAL DEBUG 2 (__twig1a.alive(),__twig1a.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup() done
DLGLOBAL DEBUG 1 (__twig1a.alive())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(__twig1a){alive}: Deallocated, including all deferred deallocations
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG --- after destroy-event cascade:
DLGLOBAL DEBUG --- all deallocated.
DLGLOBAL DEBUG scene_alloc()
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(other){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: _branch0.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[0] = _branch0
DLGLOBAL DEBUG test(__twig0a){alive}: __twig0a.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = __twig0a
DLGLOBAL DEBUG test(_branch1){alive}: _branch1.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = _branch1
DLGLOBAL DEBUG test(__twig1a){alive}: __twig1a.other[0] = root
DLGLOBAL DEBUG test(root){alive}: root.other[0] = __twig1a
DLGLOBAL DEBUG ------ before term cascade, got:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG ---
DLGLOBAL DEBUG --- term at __twig1b
DLGLOBAL DEBUG test(__twig1b){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(__twig1b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 1 (__twig1b.cleanup())
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1b){alive}: scene forgets __twig1b
DLGLOBAL DEBUG test(_branch1){alive}: Received Event EV_CHILD_GONE
DLGLOBAL DEBUG 2 (__twig1b.cleanup(),_branch1.alive())
DLGLOBAL DEBUG test(_branch1){alive}: alive(EV_CHILD_GONE)
DLGLOBAL DEBUG 3 (__twig1b.cleanup(),_branch1.alive(),_branch1.child_gone())
DLGLOBAL DEBUG test(_branch1){alive}: EV_CHILD_GONE: Dropped reference _branch1.child[1] = __twig1b
DLGLOBAL DEBUG test(_branch1){alive}: still exists: child[0]
DLGLOBAL DEBUG 2 (__twig1b.cleanup(),_branch1.alive())
DLGLOBAL DEBUG 1 (__twig1b.cleanup())
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(_branch1){alive}: Received Event EV_CHILD_GONE
DLGLOBAL DEBUG 1 (_branch1.alive())
DLGLOBAL DEBUG test(_branch1){alive}: alive(EV_CHILD_GONE)
DLGLOBAL DEBUG test(_branch1){alive}: EV_CHILD_GONE with NULL data, must be a parent_term event. Ignore.
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(__twig1b){alive}: Deallocated
DLGLOBAL DEBUG --- after term cascade:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG other
DLGLOBAL DEBUG --- 7 objects remain. cleaning up
DLGLOBAL DEBUG test(root){alive}: Terminating (cause = OSMO_FSM_TERM_ERROR)
DLGLOBAL DEBUG test(root){alive}: pre_term()
DLGLOBAL DEBUG test(_branch1){alive}: Terminating in cascade, depth 2 (cause = OSMO_FSM_TERM_PARENT, caused by: test(root))
DLGLOBAL DEBUG test(_branch1){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1a){alive}: Terminating in cascade, depth 3 (cause = OSMO_FSM_TERM_PARENT, caused by: test(root))
DLGLOBAL DEBUG test(__twig1a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1a){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 1 (__twig1a.cleanup())
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1a){alive}: scene forgets __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: removing reference __twig1a.other[0] -> root
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(__twig1a){alive}: Deferring: will deallocate with test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Removing from parent test(root)
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 1 (_branch1.cleanup())
DLGLOBAL DEBUG test(_branch1){alive}: cleanup()
DLGLOBAL DEBUG test(_branch1){alive}: scene forgets _branch1
DLGLOBAL DEBUG test(_branch1){alive}: removing reference _branch1.other[0] -> other
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(other){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.other_gone())
DLGLOBAL DEBUG test(other){alive}: EV_OTHER_GONE: Dropped reference other.other[1] = _branch1
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Terminating in cascade, depth 3 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(root))
DLGLOBAL DEBUG test(other){alive}: pre_term()
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup()
DLGLOBAL DEBUG test(other){alive}: scene forgets other
DLGLOBAL DEBUG test(other){alive}: removing reference other.other[0] -> _branch0
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.other_gone())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: EV_OTHER_GONE: Dropped reference _branch0.other[0] = other
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Terminating in cascade, depth 4 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(root))
DLGLOBAL DEBUG test(_branch0){alive}: pre_term()
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: Terminating in cascade, depth 5 (cause = OSMO_FSM_TERM_PARENT, caused by: test(root))
DLGLOBAL DEBUG test(__twig0b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0b.cleanup())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0b){alive}: scene forgets __twig0b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: Deferring: will deallocate with test(root)
DLGLOBAL DEBUG test(__twig0a){alive}: Terminating in cascade, depth 5 (cause = OSMO_FSM_TERM_PARENT, caused by: test(root))
DLGLOBAL DEBUG test(__twig0a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0a){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0a.cleanup())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0a){alive}: scene forgets __twig0a
DLGLOBAL DEBUG test(__twig0a){alive}: removing reference __twig0a.other[0] -> other
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(other){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: Deferring: will deallocate with test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.cleanup())
DLGLOBAL DEBUG test(_branch0){alive}: cleanup()
DLGLOBAL DEBUG test(_branch0){alive}: scene forgets _branch0
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch0){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch0){alive}: Deferring: will deallocate with test(root)
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup() done
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Deferring: will deallocate with test(root)
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 1 (_branch1.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch1){alive}: cleanup() done
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(_branch1){alive}: Deferring: will deallocate with test(root)
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 1 (root.cleanup())
DLGLOBAL DEBUG test(root){alive}: cleanup()
DLGLOBAL DEBUG test(root){alive}: scene forgets root
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: removing reference root.other[0] -> __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(root){alive}: cleanup() done
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(root){alive}: Deallocated, including all deferred deallocations
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG scene_alloc()
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(other){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: _branch0.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[0] = _branch0
DLGLOBAL DEBUG test(__twig0a){alive}: __twig0a.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = __twig0a
DLGLOBAL DEBUG test(_branch1){alive}: _branch1.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = _branch1
DLGLOBAL DEBUG test(__twig1a){alive}: __twig1a.other[0] = root
DLGLOBAL DEBUG test(root){alive}: root.other[0] = __twig1a
DLGLOBAL DEBUG ------ before destroy-event cascade, got:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG ---
DLGLOBAL DEBUG --- destroy-event at __twig1b
DLGLOBAL DEBUG test(__twig1b){alive}: Received Event EV_DESTROY
DLGLOBAL DEBUG 1 (__twig1b.alive())
DLGLOBAL DEBUG test(__twig1b){alive}: alive(EV_DESTROY)
DLGLOBAL DEBUG test(__twig1b){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(__twig1b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 2 (__twig1b.alive(),__twig1b.cleanup())
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1b){alive}: scene forgets __twig1b
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: Received Event EV_CHILD_GONE
DLGLOBAL DEBUG 3 (__twig1b.alive(),__twig1b.cleanup(),_branch1.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: alive(EV_CHILD_GONE)
DLGLOBAL DEBUG 4 (__twig1b.alive(),__twig1b.cleanup(),_branch1.alive(),_branch1.child_gone())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: EV_CHILD_GONE: Dropped reference _branch1.child[1] = __twig1b
DLGLOBAL DEBUG test(_branch1){alive}: still exists: child[0]
DLGLOBAL DEBUG 3 (__twig1b.alive(),__twig1b.cleanup(),_branch1.alive())
DLGLOBAL DEBUG 2 (__twig1b.alive(),__twig1b.cleanup())
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup() done
DLGLOBAL DEBUG 1 (__twig1b.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: Received Event EV_CHILD_GONE
DLGLOBAL DEBUG 2 (__twig1b.alive(),_branch1.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: alive(EV_CHILD_GONE)
DLGLOBAL DEBUG test(_branch1){alive}: EV_CHILD_GONE with NULL data, must be a parent_term event. Ignore.
DLGLOBAL DEBUG 1 (__twig1b.alive())
DLGLOBAL DEBUG test(__twig1b){alive}: Deallocated
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG --- after destroy-event cascade:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG other
DLGLOBAL DEBUG --- 7 objects remain. cleaning up
DLGLOBAL DEBUG test(root){alive}: Terminating (cause = OSMO_FSM_TERM_ERROR)
DLGLOBAL DEBUG test(root){alive}: pre_term()
DLGLOBAL DEBUG test(_branch1){alive}: Terminating in cascade, depth 2 (cause = OSMO_FSM_TERM_PARENT, caused by: test(root))
DLGLOBAL DEBUG test(_branch1){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1a){alive}: Terminating in cascade, depth 3 (cause = OSMO_FSM_TERM_PARENT, caused by: test(root))
DLGLOBAL DEBUG test(__twig1a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1a){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 1 (__twig1a.cleanup())
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1a){alive}: scene forgets __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: removing reference __twig1a.other[0] -> root
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(__twig1a){alive}: Deferring: will deallocate with test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Removing from parent test(root)
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 1 (_branch1.cleanup())
DLGLOBAL DEBUG test(_branch1){alive}: cleanup()
DLGLOBAL DEBUG test(_branch1){alive}: scene forgets _branch1
DLGLOBAL DEBUG test(_branch1){alive}: removing reference _branch1.other[0] -> other
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(other){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.other_gone())
DLGLOBAL DEBUG test(other){alive}: EV_OTHER_GONE: Dropped reference other.other[1] = _branch1
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Terminating in cascade, depth 3 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(root))
DLGLOBAL DEBUG test(other){alive}: pre_term()
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup()
DLGLOBAL DEBUG test(other){alive}: scene forgets other
DLGLOBAL DEBUG test(other){alive}: removing reference other.other[0] -> _branch0
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.other_gone())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: EV_OTHER_GONE: Dropped reference _branch0.other[0] = other
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Terminating in cascade, depth 4 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(root))
DLGLOBAL DEBUG test(_branch0){alive}: pre_term()
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: Terminating in cascade, depth 5 (cause = OSMO_FSM_TERM_PARENT, caused by: test(root))
DLGLOBAL DEBUG test(__twig0b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0b.cleanup())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0b){alive}: scene forgets __twig0b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: Deferring: will deallocate with test(root)
DLGLOBAL DEBUG test(__twig0a){alive}: Terminating in cascade, depth 5 (cause = OSMO_FSM_TERM_PARENT, caused by: test(root))
DLGLOBAL DEBUG test(__twig0a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0a){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0a.cleanup())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0a){alive}: scene forgets __twig0a
DLGLOBAL DEBUG test(__twig0a){alive}: removing reference __twig0a.other[0] -> other
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(other){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: Deferring: will deallocate with test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.cleanup())
DLGLOBAL DEBUG test(_branch0){alive}: cleanup()
DLGLOBAL DEBUG test(_branch0){alive}: scene forgets _branch0
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch0){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch0){alive}: Deferring: will deallocate with test(root)
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup() done
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Deferring: will deallocate with test(root)
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 1 (_branch1.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch1){alive}: cleanup() done
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(_branch1){alive}: Deferring: will deallocate with test(root)
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 1 (root.cleanup())
DLGLOBAL DEBUG test(root){alive}: cleanup()
DLGLOBAL DEBUG test(root){alive}: scene forgets root
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: removing reference root.other[0] -> __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(root){alive}: cleanup() done
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(root){alive}: Deallocated, including all deferred deallocations
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG scene_alloc()
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(other){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: _branch0.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[0] = _branch0
DLGLOBAL DEBUG test(__twig0a){alive}: __twig0a.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = __twig0a
DLGLOBAL DEBUG test(_branch1){alive}: _branch1.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = _branch1
DLGLOBAL DEBUG test(__twig1a){alive}: __twig1a.other[0] = root
DLGLOBAL DEBUG test(root){alive}: root.other[0] = __twig1a
DLGLOBAL DEBUG ------ before term cascade, got:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG ---
DLGLOBAL DEBUG --- term at other
DLGLOBAL DEBUG test(other){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(other){alive}: pre_term()
DLGLOBAL DEBUG 1 (other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup()
DLGLOBAL DEBUG test(other){alive}: scene forgets other
DLGLOBAL DEBUG test(other){alive}: removing reference other.other[0] -> _branch0
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 2 (other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 3 (other.cleanup(),_branch0.alive(),_branch0.other_gone())
DLGLOBAL DEBUG test(_branch0){alive}: EV_OTHER_GONE: Dropped reference _branch0.other[0] = other
DLGLOBAL DEBUG 2 (other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Terminating in cascade, depth 2 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(other))
DLGLOBAL DEBUG test(_branch0){alive}: pre_term()
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: Terminating in cascade, depth 3 (cause = OSMO_FSM_TERM_PARENT, caused by: test(other))
DLGLOBAL DEBUG test(__twig0b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 3 (other.cleanup(),_branch0.alive(),__twig0b.cleanup())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0b){alive}: scene forgets __twig0b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup() done
DLGLOBAL DEBUG 2 (other.cleanup(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: Deferring: will deallocate with test(other)
DLGLOBAL DEBUG test(__twig0a){alive}: Terminating in cascade, depth 3 (cause = OSMO_FSM_TERM_PARENT, caused by: test(other))
DLGLOBAL DEBUG test(__twig0a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0a){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 3 (other.cleanup(),_branch0.alive(),__twig0a.cleanup())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0a){alive}: scene forgets __twig0a
DLGLOBAL DEBUG test(__twig0a){alive}: removing reference __twig0a.other[0] -> other
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(other){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup() done
DLGLOBAL DEBUG 2 (other.cleanup(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: Deferring: will deallocate with test(other)
DLGLOBAL DEBUG test(_branch0){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 3 (other.cleanup(),_branch0.alive(),_branch0.cleanup())
DLGLOBAL DEBUG test(_branch0){alive}: cleanup()
DLGLOBAL DEBUG test(_branch0){alive}: scene forgets _branch0
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(root){alive}: Received Event EV_CHILD_GONE
DLGLOBAL DEBUG 4 (other.cleanup(),_branch0.alive(),_branch0.cleanup(),root.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(root){alive}: alive(EV_CHILD_GONE)
DLGLOBAL DEBUG 5 (other.cleanup(),_branch0.alive(),_branch0.cleanup(),root.alive(),root.child_gone())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(root){alive}: EV_CHILD_GONE: Dropped reference root.child[0] = _branch0
DLGLOBAL DEBUG test(root){alive}: still exists: child[1]
DLGLOBAL DEBUG 4 (other.cleanup(),_branch0.alive(),_branch0.cleanup(),root.alive())
DLGLOBAL DEBUG 3 (other.cleanup(),_branch0.alive(),_branch0.cleanup())
DLGLOBAL DEBUG test(_branch0){alive}: cleanup() done
DLGLOBAL DEBUG 2 (other.cleanup(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(root){alive}: Received Event EV_CHILD_GONE
DLGLOBAL DEBUG 3 (other.cleanup(),_branch0.alive(),root.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(root){alive}: alive(EV_CHILD_GONE)
DLGLOBAL DEBUG test(root){alive}: EV_CHILD_GONE with NULL data, must be a parent_term event. Ignore.
DLGLOBAL DEBUG 2 (other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Deferring: will deallocate with test(other)
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 1 (other.cleanup())
DLGLOBAL DEBUG test(other){alive}: removing reference other.other[1] -> _branch1
DLGLOBAL DEBUG test(_branch1){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 2 (other.cleanup(),_branch1.alive())
DLGLOBAL DEBUG test(_branch1){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 3 (other.cleanup(),_branch1.alive(),_branch1.other_gone())
DLGLOBAL DEBUG test(_branch1){alive}: EV_OTHER_GONE: Dropped reference _branch1.other[0] = other
DLGLOBAL DEBUG 2 (other.cleanup(),_branch1.alive())
DLGLOBAL DEBUG test(_branch1){alive}: Terminating in cascade, depth 2 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(other))
DLGLOBAL DEBUG test(_branch1){alive}: pre_term()
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: Terminating in cascade, depth 3 (cause = OSMO_FSM_TERM_PARENT, caused by: test(other))
DLGLOBAL DEBUG test(__twig1b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 3 (other.cleanup(),_branch1.alive(),__twig1b.cleanup())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1b){alive}: scene forgets __twig1b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup() done
DLGLOBAL DEBUG 2 (other.cleanup(),_branch1.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: Deferring: will deallocate with test(other)
DLGLOBAL DEBUG test(__twig1a){alive}: Terminating in cascade, depth 3 (cause = OSMO_FSM_TERM_PARENT, caused by: test(other))
DLGLOBAL DEBUG test(__twig1a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1a){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 3 (other.cleanup(),_branch1.alive(),__twig1a.cleanup())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1a){alive}: scene forgets __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: removing reference __twig1a.other[0] -> root
DLGLOBAL DEBUG test(root){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 4 (other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(root){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 5 (other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive(),root.other_gone())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(root){alive}: EV_OTHER_GONE: Dropped reference root.other[0] = __twig1a
DLGLOBAL DEBUG 4 (other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: Terminating in cascade, depth 4 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(other))
DLGLOBAL DEBUG test(root){alive}: pre_term()
DLGLOBAL DEBUG test(_branch1){alive}: Ignoring trigger to terminate: already terminating
DLGLOBAL ERROR test(root){alive}: Internal error while terminating child FSMs: a child FSM is stuck
DLGLOBAL DEBUG 5 (other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive(),root.cleanup())
DLGLOBAL DEBUG test(root){alive}: cleanup()
DLGLOBAL DEBUG test(root){alive}: scene forgets root
DLGLOBAL DEBUG test(root){alive}: cleanup() done
DLGLOBAL DEBUG 4 (other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: Deferring: will deallocate with test(other)
DLGLOBAL DEBUG 3 (other.cleanup(),_branch1.alive(),__twig1a.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup() done
DLGLOBAL DEBUG 2 (other.cleanup(),_branch1.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: Deferring: will deallocate with test(other)
DLGLOBAL DEBUG test(_branch1){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 3 (other.cleanup(),_branch1.alive(),_branch1.cleanup())
DLGLOBAL DEBUG test(_branch1){alive}: cleanup()
DLGLOBAL DEBUG test(_branch1){alive}: scene forgets _branch1
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch1){alive}: cleanup() done
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 2 (other.cleanup(),_branch1.alive())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch1){alive}: Deferring: will deallocate with test(other)
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 1 (other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(other){alive}: Deallocated, including all deferred deallocations
DLGLOBAL DEBUG --- after term cascade:
DLGLOBAL DEBUG --- all deallocated.
DLGLOBAL DEBUG scene_alloc()
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(other){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: _branch0.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[0] = _branch0
DLGLOBAL DEBUG test(__twig0a){alive}: __twig0a.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = __twig0a
DLGLOBAL DEBUG test(_branch1){alive}: _branch1.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = _branch1
DLGLOBAL DEBUG test(__twig1a){alive}: __twig1a.other[0] = root
DLGLOBAL DEBUG test(root){alive}: root.other[0] = __twig1a
DLGLOBAL DEBUG ------ before destroy-event cascade, got:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG ---
DLGLOBAL DEBUG --- destroy-event at other
DLGLOBAL DEBUG test(other){alive}: Received Event EV_DESTROY
DLGLOBAL DEBUG 1 (other.alive())
DLGLOBAL DEBUG test(other){alive}: alive(EV_DESTROY)
DLGLOBAL DEBUG test(other){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(other){alive}: pre_term()
DLGLOBAL DEBUG 2 (other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup()
DLGLOBAL DEBUG test(other){alive}: scene forgets other
DLGLOBAL DEBUG test(other){alive}: removing reference other.other[0] -> _branch0
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 3 (other.alive(),other.cleanup(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 4 (other.alive(),other.cleanup(),_branch0.alive(),_branch0.other_gone())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: EV_OTHER_GONE: Dropped reference _branch0.other[0] = other
DLGLOBAL DEBUG 3 (other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Terminating in cascade, depth 2 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(other))
DLGLOBAL DEBUG test(_branch0){alive}: pre_term()
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: Terminating in cascade, depth 3 (cause = OSMO_FSM_TERM_PARENT, caused by: test(other))
DLGLOBAL DEBUG test(__twig0b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 4 (other.alive(),other.cleanup(),_branch0.alive(),__twig0b.cleanup())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0b){alive}: scene forgets __twig0b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup() done
DLGLOBAL DEBUG 3 (other.alive(),other.cleanup(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: Deferring: will deallocate with test(other)
DLGLOBAL DEBUG test(__twig0a){alive}: Terminating in cascade, depth 3 (cause = OSMO_FSM_TERM_PARENT, caused by: test(other))
DLGLOBAL DEBUG test(__twig0a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0a){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 4 (other.alive(),other.cleanup(),_branch0.alive(),__twig0a.cleanup())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0a){alive}: scene forgets __twig0a
DLGLOBAL DEBUG test(__twig0a){alive}: removing reference __twig0a.other[0] -> other
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(other){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup() done
DLGLOBAL DEBUG 3 (other.alive(),other.cleanup(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: Deferring: will deallocate with test(other)
DLGLOBAL DEBUG test(_branch0){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 4 (other.alive(),other.cleanup(),_branch0.alive(),_branch0.cleanup())
DLGLOBAL DEBUG test(_branch0){alive}: cleanup()
DLGLOBAL DEBUG test(_branch0){alive}: scene forgets _branch0
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(root){alive}: Received Event EV_CHILD_GONE
DLGLOBAL DEBUG 5 (other.alive(),other.cleanup(),_branch0.alive(),_branch0.cleanup(),root.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(root){alive}: alive(EV_CHILD_GONE)
DLGLOBAL DEBUG 6 (other.alive(),other.cleanup(),_branch0.alive(),_branch0.cleanup(),root.alive(),root.child_gone())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(root){alive}: EV_CHILD_GONE: Dropped reference root.child[0] = _branch0
DLGLOBAL DEBUG test(root){alive}: still exists: child[1]
DLGLOBAL DEBUG 5 (other.alive(),other.cleanup(),_branch0.alive(),_branch0.cleanup(),root.alive())
DLGLOBAL DEBUG 4 (other.alive(),other.cleanup(),_branch0.alive(),_branch0.cleanup())
DLGLOBAL DEBUG test(_branch0){alive}: cleanup() done
DLGLOBAL DEBUG 3 (other.alive(),other.cleanup(),_branch0.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(root){alive}: Received Event EV_CHILD_GONE
DLGLOBAL DEBUG 4 (other.alive(),other.cleanup(),_branch0.alive(),root.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(root){alive}: alive(EV_CHILD_GONE)
DLGLOBAL DEBUG test(root){alive}: EV_CHILD_GONE with NULL data, must be a parent_term event. Ignore.
DLGLOBAL DEBUG 3 (other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Deferring: will deallocate with test(other)
DLGLOBAL DEBUG 2 (other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: removing reference other.other[1] -> _branch1
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 3 (other.alive(),other.cleanup(),_branch1.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 4 (other.alive(),other.cleanup(),_branch1.alive(),_branch1.other_gone())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: EV_OTHER_GONE: Dropped reference _branch1.other[0] = other
DLGLOBAL DEBUG 3 (other.alive(),other.cleanup(),_branch1.alive())
DLGLOBAL DEBUG test(_branch1){alive}: Terminating in cascade, depth 2 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(other))
DLGLOBAL DEBUG test(_branch1){alive}: pre_term()
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: Terminating in cascade, depth 3 (cause = OSMO_FSM_TERM_PARENT, caused by: test(other))
DLGLOBAL DEBUG test(__twig1b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 4 (other.alive(),other.cleanup(),_branch1.alive(),__twig1b.cleanup())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1b){alive}: scene forgets __twig1b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup() done
DLGLOBAL DEBUG 3 (other.alive(),other.cleanup(),_branch1.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: Deferring: will deallocate with test(other)
DLGLOBAL DEBUG test(__twig1a){alive}: Terminating in cascade, depth 3 (cause = OSMO_FSM_TERM_PARENT, caused by: test(other))
DLGLOBAL DEBUG test(__twig1a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1a){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 4 (other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1a){alive}: scene forgets __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: removing reference __twig1a.other[0] -> root
DLGLOBAL DEBUG test(root){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 5 (other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(root){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 6 (other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive(),root.other_gone())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(root){alive}: EV_OTHER_GONE: Dropped reference root.other[0] = __twig1a
DLGLOBAL DEBUG 5 (other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: Terminating in cascade, depth 4 (cause = OSMO_FSM_TERM_REGULAR, caused by: test(other))
DLGLOBAL DEBUG test(root){alive}: pre_term()
DLGLOBAL DEBUG test(_branch1){alive}: Ignoring trigger to terminate: already terminating
DLGLOBAL ERROR test(root){alive}: Internal error while terminating child FSMs: a child FSM is stuck
DLGLOBAL DEBUG 6 (other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive(),root.cleanup())
DLGLOBAL DEBUG test(root){alive}: cleanup()
DLGLOBAL DEBUG test(root){alive}: scene forgets root
DLGLOBAL DEBUG test(root){alive}: cleanup() done
DLGLOBAL DEBUG 5 (other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: Deferring: will deallocate with test(other)
DLGLOBAL DEBUG 4 (other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup() done
DLGLOBAL DEBUG 3 (other.alive(),other.cleanup(),_branch1.alive())
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: Deferring: will deallocate with test(other)
DLGLOBAL DEBUG test(_branch1){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 4 (other.alive(),other.cleanup(),_branch1.alive(),_branch1.cleanup())
DLGLOBAL DEBUG test(_branch1){alive}: cleanup()
DLGLOBAL DEBUG test(_branch1){alive}: scene forgets _branch1
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch1){alive}: cleanup() done
DLGLOBAL DEBUG 3 (other.alive(),other.cleanup(),_branch1.alive())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
DLGLOBAL DEBUG test(_branch1){alive}: Deferring: will deallocate with test(other)
DLGLOBAL DEBUG 2 (other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup() done
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 1 (other.alive())
DLGLOBAL DEBUG test(other){alive}: Deallocated, including all deferred deallocations
fsm: support graceful osmo_fsm_inst_term() cascades Add global flag osmo_fsm_term_safely() -- if set to true, enable the following behavior: Detect osmo_fsm_inst_term() occuring within osmo_fsm_inst_term(): - collect deallocations until the outermost osmo_fsm_inst_term() is done. - call osmo_fsm_inst_free() *after* dispatching the parent event. If a struct osmo_fsm_inst enters osmo_fsm_inst_term() while another is already within osmo_fsm_inst_term(), do not directly deallocate it, but talloc-reparent it to a separate talloc context, to be deallocated with the outermost FSM inst. The effect is that all osmo_fsm_inst freed within an osmo_fsm_inst_term() cascade will stay allocated until all osmo_fsm_inst_term() are complete and all of them will be deallocated at the same time. Mark the deferred deallocation state as __thread in an attempt to make cascaded deallocation handling threadsafe. Keep the enable/disable flag separate, so that it is global and not per-thread. The feature is showcased by fsm_dealloc_test.c: with this feature, all of those wild deallocation scenarios succeed. Make fsm_dealloc_test a normal regression test in testsuite.at. Rationale: It is difficult to gracefully handle deallocations of groups of FSM instances that reference each other. As soon as one child dispatching a cleanup event causes its parent to deallocate before fsm.c was ready for it, deallocation will hit a use-after-free. Before this patch, by using parent_term events and distinct "terminating" FSM states, parent/child FSMs can be taught to wait for all children to deallocate before deallocating the parent. But as soon as a non-child / non-parent FSM instance is involved, or actually any other cleanup() action that triggers parent FSMs or parent talloc contexts to become unused, it is near impossible to think of all possible deallocation events ricocheting, and to avoid running into freeing FSM instances that were still in the middle of osmo_fsm_inst_term(), or FSM instances to enter osmo_fsm_inst_term() more than once. This patch makes deallocation of "all possible" setups of complex cross referencing FSM instances easy to handle correctly, without running into use-after-free or double free situations, and, notably, without changing calling code. Change-Id: I8eda67540a1cd444491beb7856b9fcd0a3143b18
2019-03-24 04:56:21 +00:00
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG --- after destroy-event cascade:
DLGLOBAL DEBUG --- all deallocated.
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
test_osmo_fsm_term_safely() done
test_osmo_fsm_set_dealloc_ctx()
DLGLOBAL DEBUG scene_alloc()
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(other){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: _branch0.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[0] = _branch0
DLGLOBAL DEBUG test(__twig0a){alive}: __twig0a.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = __twig0a
DLGLOBAL DEBUG test(_branch1){alive}: _branch1.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = _branch1
DLGLOBAL DEBUG test(__twig1a){alive}: __twig1a.other[0] = root
DLGLOBAL DEBUG test(root){alive}: root.other[0] = __twig1a
DLGLOBAL DEBUG ------ before term cascade, got:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG ---
DLGLOBAL DEBUG --- term at root
DLGLOBAL DEBUG test(root){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(root){alive}: pre_term()
DLGLOBAL DEBUG test(_branch1){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(_branch1){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig1b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 1 (__twig1b.cleanup())
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1b){alive}: scene forgets __twig1b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(__twig1b){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig1b){alive}: Deallocated
DLGLOBAL DEBUG test(__twig1a){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig1a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1a){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 1 (__twig1a.cleanup())
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1a){alive}: scene forgets __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: removing reference __twig1a.other[0] -> root
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(__twig1a){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig1a){alive}: Deallocated
DLGLOBAL DEBUG test(_branch1){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 1 (_branch1.cleanup())
DLGLOBAL DEBUG test(_branch1){alive}: cleanup()
DLGLOBAL DEBUG test(_branch1){alive}: scene forgets _branch1
DLGLOBAL DEBUG test(_branch1){alive}: removing reference _branch1.other[0] -> other
DLGLOBAL DEBUG test(other){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.other_gone())
DLGLOBAL DEBUG test(other){alive}: EV_OTHER_GONE: Dropped reference other.other[1] = _branch1
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(other){alive}: pre_term()
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup()
DLGLOBAL DEBUG test(other){alive}: scene forgets other
DLGLOBAL DEBUG test(other){alive}: removing reference other.other[0] -> _branch0
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.other_gone())
DLGLOBAL DEBUG test(_branch0){alive}: EV_OTHER_GONE: Dropped reference _branch0.other[0] = other
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(_branch0){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig0b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0b.cleanup())
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0b){alive}: scene forgets __twig0b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(__twig0b){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig0b){alive}: Deallocated
DLGLOBAL DEBUG test(__twig0a){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig0a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0a){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0a.cleanup())
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0a){alive}: scene forgets __twig0a
DLGLOBAL DEBUG test(__twig0a){alive}: removing reference __twig0a.other[0] -> other
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(other){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(__twig0a){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig0a){alive}: Deallocated
DLGLOBAL DEBUG test(_branch0){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.cleanup())
DLGLOBAL DEBUG test(_branch0){alive}: cleanup()
DLGLOBAL DEBUG test(_branch0){alive}: scene forgets _branch0
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Freeing instance
DLGLOBAL DEBUG test(_branch0){alive}: Deallocated
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup() done
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Freeing instance
DLGLOBAL DEBUG test(other){alive}: Deallocated
DLGLOBAL DEBUG 1 (_branch1.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(_branch1){alive}: Freeing instance
DLGLOBAL DEBUG test(_branch1){alive}: Deallocated
DLGLOBAL DEBUG 1 (root.cleanup())
DLGLOBAL DEBUG test(root){alive}: cleanup()
DLGLOBAL DEBUG test(root){alive}: scene forgets root
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: removing reference root.other[0] -> __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(root){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(root){alive}: Freeing instance
DLGLOBAL DEBUG test(root){alive}: Deallocated
DLGLOBAL DEBUG --- after term cascade:
DLGLOBAL DEBUG --- all deallocated.
*** loop_ctx contains 33 blocks, deallocating.
DLGLOBAL DEBUG scene_alloc()
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(other){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: _branch0.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[0] = _branch0
DLGLOBAL DEBUG test(__twig0a){alive}: __twig0a.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = __twig0a
DLGLOBAL DEBUG test(_branch1){alive}: _branch1.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = _branch1
DLGLOBAL DEBUG test(__twig1a){alive}: __twig1a.other[0] = root
DLGLOBAL DEBUG test(root){alive}: root.other[0] = __twig1a
DLGLOBAL DEBUG ------ before destroy-event cascade, got:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG ---
DLGLOBAL DEBUG --- destroy-event at root
DLGLOBAL DEBUG test(root){alive}: Received Event EV_DESTROY
DLGLOBAL DEBUG 1 (root.alive())
DLGLOBAL DEBUG test(root){alive}: alive(EV_DESTROY)
DLGLOBAL DEBUG test(root){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(root){alive}: pre_term()
DLGLOBAL DEBUG test(_branch1){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(_branch1){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig1b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 2 (root.alive(),__twig1b.cleanup())
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1b){alive}: scene forgets __twig1b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup() done
DLGLOBAL DEBUG 1 (root.alive())
DLGLOBAL DEBUG test(__twig1b){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig1b){alive}: Deallocated
DLGLOBAL DEBUG test(__twig1a){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig1a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1a){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 2 (root.alive(),__twig1a.cleanup())
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1a){alive}: scene forgets __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: removing reference __twig1a.other[0] -> root
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup() done
DLGLOBAL DEBUG 1 (root.alive())
DLGLOBAL DEBUG test(__twig1a){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig1a){alive}: Deallocated
DLGLOBAL DEBUG test(_branch1){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 2 (root.alive(),_branch1.cleanup())
DLGLOBAL DEBUG test(_branch1){alive}: cleanup()
DLGLOBAL DEBUG test(_branch1){alive}: scene forgets _branch1
DLGLOBAL DEBUG test(_branch1){alive}: removing reference _branch1.other[0] -> other
DLGLOBAL DEBUG test(other){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 3 (root.alive(),_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 4 (root.alive(),_branch1.cleanup(),other.alive(),other.other_gone())
DLGLOBAL DEBUG test(other){alive}: EV_OTHER_GONE: Dropped reference other.other[1] = _branch1
DLGLOBAL DEBUG 3 (root.alive(),_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(other){alive}: pre_term()
DLGLOBAL DEBUG 4 (root.alive(),_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup()
DLGLOBAL DEBUG test(other){alive}: scene forgets other
DLGLOBAL DEBUG test(other){alive}: removing reference other.other[0] -> _branch0
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 5 (root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 6 (root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.other_gone())
DLGLOBAL DEBUG test(_branch0){alive}: EV_OTHER_GONE: Dropped reference _branch0.other[0] = other
DLGLOBAL DEBUG 5 (root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(_branch0){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig0b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 6 (root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0b.cleanup())
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0b){alive}: scene forgets __twig0b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup() done
DLGLOBAL DEBUG 5 (root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(__twig0b){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig0b){alive}: Deallocated
DLGLOBAL DEBUG test(__twig0a){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig0a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0a){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 6 (root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0a.cleanup())
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0a){alive}: scene forgets __twig0a
DLGLOBAL DEBUG test(__twig0a){alive}: removing reference __twig0a.other[0] -> other
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(other){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup() done
DLGLOBAL DEBUG 5 (root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(__twig0a){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig0a){alive}: Deallocated
DLGLOBAL DEBUG test(_branch0){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 6 (root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.cleanup())
DLGLOBAL DEBUG test(_branch0){alive}: cleanup()
DLGLOBAL DEBUG test(_branch0){alive}: scene forgets _branch0
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: cleanup() done
DLGLOBAL DEBUG 5 (root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Freeing instance
DLGLOBAL DEBUG test(_branch0){alive}: Deallocated
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG 4 (root.alive(),_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup() done
DLGLOBAL DEBUG 3 (root.alive(),_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Freeing instance
DLGLOBAL DEBUG test(other){alive}: Deallocated
DLGLOBAL DEBUG 2 (root.alive(),_branch1.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: cleanup() done
DLGLOBAL DEBUG 1 (root.alive())
DLGLOBAL DEBUG test(_branch1){alive}: Freeing instance
DLGLOBAL DEBUG test(_branch1){alive}: Deallocated
DLGLOBAL DEBUG 2 (root.alive(),root.cleanup())
DLGLOBAL DEBUG test(root){alive}: cleanup()
DLGLOBAL DEBUG test(root){alive}: scene forgets root
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: removing reference root.other[0] -> __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(root){alive}: cleanup() done
DLGLOBAL DEBUG 1 (root.alive())
DLGLOBAL DEBUG test(root){alive}: Freeing instance
DLGLOBAL DEBUG test(root){alive}: Deallocated
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG --- after destroy-event cascade:
DLGLOBAL DEBUG --- all deallocated.
*** loop_ctx contains 33 blocks, deallocating.
DLGLOBAL DEBUG scene_alloc()
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(other){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: _branch0.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[0] = _branch0
DLGLOBAL DEBUG test(__twig0a){alive}: __twig0a.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = __twig0a
DLGLOBAL DEBUG test(_branch1){alive}: _branch1.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = _branch1
DLGLOBAL DEBUG test(__twig1a){alive}: __twig1a.other[0] = root
DLGLOBAL DEBUG test(root){alive}: root.other[0] = __twig1a
DLGLOBAL DEBUG ------ before term cascade, got:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG ---
DLGLOBAL DEBUG --- term at _branch0
DLGLOBAL DEBUG test(_branch0){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(_branch0){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig0b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 1 (__twig0b.cleanup())
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0b){alive}: scene forgets __twig0b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(__twig0b){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig0b){alive}: Deallocated
DLGLOBAL DEBUG test(__twig0a){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig0a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0a){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 1 (__twig0a.cleanup())
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0a){alive}: scene forgets __twig0a
DLGLOBAL DEBUG test(__twig0a){alive}: removing reference __twig0a.other[0] -> other
DLGLOBAL DEBUG test(other){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 2 (__twig0a.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 3 (__twig0a.cleanup(),other.alive(),other.other_gone())
DLGLOBAL DEBUG 2 (__twig0a.cleanup(),other.alive())
DLGLOBAL DEBUG 1 (__twig0a.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(__twig0a){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig0a){alive}: Deallocated
DLGLOBAL DEBUG test(_branch0){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 1 (_branch0.cleanup())
DLGLOBAL DEBUG test(_branch0){alive}: cleanup()
DLGLOBAL DEBUG test(_branch0){alive}: scene forgets _branch0
DLGLOBAL DEBUG test(_branch0){alive}: removing reference _branch0.other[0] -> other
DLGLOBAL DEBUG test(other){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 2 (_branch0.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 3 (_branch0.cleanup(),other.alive(),other.other_gone())
DLGLOBAL DEBUG test(other){alive}: EV_OTHER_GONE: Dropped reference other.other[0] = _branch0
DLGLOBAL DEBUG 2 (_branch0.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(other){alive}: pre_term()
DLGLOBAL DEBUG 3 (_branch0.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup()
DLGLOBAL DEBUG test(other){alive}: scene forgets other
DLGLOBAL DEBUG test(other){alive}: removing reference other.other[1] -> _branch1
DLGLOBAL DEBUG test(_branch1){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 4 (_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive())
DLGLOBAL DEBUG test(_branch1){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 5 (_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),_branch1.other_gone())
DLGLOBAL DEBUG test(_branch1){alive}: EV_OTHER_GONE: Dropped reference _branch1.other[0] = other
DLGLOBAL DEBUG 4 (_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive())
DLGLOBAL DEBUG test(_branch1){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(_branch1){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig1b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 5 (_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),__twig1b.cleanup())
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1b){alive}: scene forgets __twig1b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive())
DLGLOBAL DEBUG test(__twig1b){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig1b){alive}: Deallocated
DLGLOBAL DEBUG test(__twig1a){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig1a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1a){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 5 (_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup())
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1a){alive}: scene forgets __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: removing reference __twig1a.other[0] -> root
DLGLOBAL DEBUG test(root){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 6 (_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 7 (_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive(),root.other_gone())
DLGLOBAL DEBUG test(root){alive}: EV_OTHER_GONE: Dropped reference root.other[0] = __twig1a
DLGLOBAL DEBUG 6 (_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(root){alive}: pre_term()
DLGLOBAL DEBUG test(_branch1){alive}: Ignoring trigger to terminate: already terminating
DLGLOBAL ERROR test(root){alive}: Internal error while terminating child FSMs: a child FSM is stuck
DLGLOBAL DEBUG 7 (_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive(),root.cleanup())
DLGLOBAL DEBUG test(root){alive}: cleanup()
DLGLOBAL DEBUG test(root){alive}: scene forgets root
DLGLOBAL DEBUG test(root){alive}: cleanup() done
DLGLOBAL DEBUG 6 (_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: Freeing instance
DLGLOBAL DEBUG test(root){alive}: Deallocated
DLGLOBAL DEBUG 5 (_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive())
DLGLOBAL DEBUG test(__twig1a){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig1a){alive}: Deallocated
DLGLOBAL DEBUG test(_branch1){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 5 (_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),_branch1.cleanup())
DLGLOBAL DEBUG test(_branch1){alive}: cleanup()
DLGLOBAL DEBUG test(_branch1){alive}: scene forgets _branch1
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive())
DLGLOBAL DEBUG test(_branch1){alive}: Freeing instance
DLGLOBAL DEBUG test(_branch1){alive}: Deallocated
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG 3 (_branch0.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup() done
DLGLOBAL DEBUG 2 (_branch0.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Freeing instance
DLGLOBAL DEBUG test(other){alive}: Deallocated
DLGLOBAL DEBUG 1 (_branch0.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(_branch0){alive}: Freeing instance
DLGLOBAL DEBUG test(_branch0){alive}: Deallocated
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG --- after term cascade:
DLGLOBAL DEBUG --- all deallocated.
*** loop_ctx contains 33 blocks, deallocating.
DLGLOBAL DEBUG scene_alloc()
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(other){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: _branch0.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[0] = _branch0
DLGLOBAL DEBUG test(__twig0a){alive}: __twig0a.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = __twig0a
DLGLOBAL DEBUG test(_branch1){alive}: _branch1.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = _branch1
DLGLOBAL DEBUG test(__twig1a){alive}: __twig1a.other[0] = root
DLGLOBAL DEBUG test(root){alive}: root.other[0] = __twig1a
DLGLOBAL DEBUG ------ before destroy-event cascade, got:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG ---
DLGLOBAL DEBUG --- destroy-event at _branch0
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_DESTROY
DLGLOBAL DEBUG 1 (_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_DESTROY)
DLGLOBAL DEBUG test(_branch0){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(_branch0){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig0b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 2 (_branch0.alive(),__twig0b.cleanup())
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0b){alive}: scene forgets __twig0b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup() done
DLGLOBAL DEBUG 1 (_branch0.alive())
DLGLOBAL DEBUG test(__twig0b){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig0b){alive}: Deallocated
DLGLOBAL DEBUG test(__twig0a){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig0a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0a){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 2 (_branch0.alive(),__twig0a.cleanup())
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0a){alive}: scene forgets __twig0a
DLGLOBAL DEBUG test(__twig0a){alive}: removing reference __twig0a.other[0] -> other
DLGLOBAL DEBUG test(other){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 3 (_branch0.alive(),__twig0a.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 4 (_branch0.alive(),__twig0a.cleanup(),other.alive(),other.other_gone())
DLGLOBAL DEBUG 3 (_branch0.alive(),__twig0a.cleanup(),other.alive())
DLGLOBAL DEBUG 2 (_branch0.alive(),__twig0a.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup() done
DLGLOBAL DEBUG 1 (_branch0.alive())
DLGLOBAL DEBUG test(__twig0a){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig0a){alive}: Deallocated
DLGLOBAL DEBUG test(_branch0){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 2 (_branch0.alive(),_branch0.cleanup())
DLGLOBAL DEBUG test(_branch0){alive}: cleanup()
DLGLOBAL DEBUG test(_branch0){alive}: scene forgets _branch0
DLGLOBAL DEBUG test(_branch0){alive}: removing reference _branch0.other[0] -> other
DLGLOBAL DEBUG test(other){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 3 (_branch0.alive(),_branch0.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 4 (_branch0.alive(),_branch0.cleanup(),other.alive(),other.other_gone())
DLGLOBAL DEBUG test(other){alive}: EV_OTHER_GONE: Dropped reference other.other[0] = _branch0
DLGLOBAL DEBUG 3 (_branch0.alive(),_branch0.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(other){alive}: pre_term()
DLGLOBAL DEBUG 4 (_branch0.alive(),_branch0.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup()
DLGLOBAL DEBUG test(other){alive}: scene forgets other
DLGLOBAL DEBUG test(other){alive}: removing reference other.other[1] -> _branch1
DLGLOBAL DEBUG test(_branch1){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 5 (_branch0.alive(),_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive())
DLGLOBAL DEBUG test(_branch1){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 6 (_branch0.alive(),_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),_branch1.other_gone())
DLGLOBAL DEBUG test(_branch1){alive}: EV_OTHER_GONE: Dropped reference _branch1.other[0] = other
DLGLOBAL DEBUG 5 (_branch0.alive(),_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive())
DLGLOBAL DEBUG test(_branch1){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(_branch1){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig1b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 6 (_branch0.alive(),_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),__twig1b.cleanup())
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1b){alive}: scene forgets __twig1b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup() done
DLGLOBAL DEBUG 5 (_branch0.alive(),_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive())
DLGLOBAL DEBUG test(__twig1b){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig1b){alive}: Deallocated
DLGLOBAL DEBUG test(__twig1a){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig1a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1a){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 6 (_branch0.alive(),_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup())
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1a){alive}: scene forgets __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: removing reference __twig1a.other[0] -> root
DLGLOBAL DEBUG test(root){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 7 (_branch0.alive(),_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 8 (_branch0.alive(),_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive(),root.othe
DLGLOBAL DEBUG test(root){alive}: EV_OTHER_GONE: Dropped reference root.other[0] = __twig1a
DLGLOBAL DEBUG 7 (_branch0.alive(),_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(root){alive}: pre_term()
DLGLOBAL DEBUG test(_branch1){alive}: Ignoring trigger to terminate: already terminating
DLGLOBAL ERROR test(root){alive}: Internal error while terminating child FSMs: a child FSM is stuck
DLGLOBAL DEBUG 8 (_branch0.alive(),_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive(),root.clea
DLGLOBAL DEBUG test(root){alive}: cleanup()
DLGLOBAL DEBUG test(root){alive}: scene forgets root
DLGLOBAL DEBUG test(root){alive}: cleanup() done
DLGLOBAL DEBUG 7 (_branch0.alive(),_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: Freeing instance
DLGLOBAL DEBUG test(root){alive}: Deallocated
DLGLOBAL DEBUG 6 (_branch0.alive(),_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup() done
DLGLOBAL DEBUG 5 (_branch0.alive(),_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive())
DLGLOBAL DEBUG test(__twig1a){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig1a){alive}: Deallocated
DLGLOBAL DEBUG test(_branch1){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 6 (_branch0.alive(),_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive(),_branch1.cleanup())
DLGLOBAL DEBUG test(_branch1){alive}: cleanup()
DLGLOBAL DEBUG test(_branch1){alive}: scene forgets _branch1
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: cleanup() done
DLGLOBAL DEBUG 5 (_branch0.alive(),_branch0.cleanup(),other.alive(),other.cleanup(),_branch1.alive())
DLGLOBAL DEBUG test(_branch1){alive}: Freeing instance
DLGLOBAL DEBUG test(_branch1){alive}: Deallocated
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG 4 (_branch0.alive(),_branch0.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup() done
DLGLOBAL DEBUG 3 (_branch0.alive(),_branch0.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Freeing instance
DLGLOBAL DEBUG test(other){alive}: Deallocated
DLGLOBAL DEBUG 2 (_branch0.alive(),_branch0.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: cleanup() done
DLGLOBAL DEBUG 1 (_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Freeing instance
DLGLOBAL DEBUG test(_branch0){alive}: Deallocated
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG --- after destroy-event cascade:
DLGLOBAL DEBUG --- all deallocated.
*** loop_ctx contains 33 blocks, deallocating.
DLGLOBAL DEBUG scene_alloc()
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(other){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: _branch0.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[0] = _branch0
DLGLOBAL DEBUG test(__twig0a){alive}: __twig0a.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = __twig0a
DLGLOBAL DEBUG test(_branch1){alive}: _branch1.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = _branch1
DLGLOBAL DEBUG test(__twig1a){alive}: __twig1a.other[0] = root
DLGLOBAL DEBUG test(root){alive}: root.other[0] = __twig1a
DLGLOBAL DEBUG ------ before term cascade, got:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG ---
DLGLOBAL DEBUG --- term at __twig0a
DLGLOBAL DEBUG test(__twig0a){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(__twig0a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0a){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 1 (__twig0a.cleanup())
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0a){alive}: scene forgets __twig0a
DLGLOBAL DEBUG test(__twig0a){alive}: removing reference __twig0a.other[0] -> other
DLGLOBAL DEBUG test(other){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 2 (__twig0a.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 3 (__twig0a.cleanup(),other.alive(),other.other_gone())
DLGLOBAL DEBUG 2 (__twig0a.cleanup(),other.alive())
DLGLOBAL DEBUG 1 (__twig0a.cleanup())
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_CHILD_GONE
DLGLOBAL DEBUG 2 (__twig0a.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_CHILD_GONE)
DLGLOBAL DEBUG 3 (__twig0a.cleanup(),_branch0.alive(),_branch0.child_gone())
DLGLOBAL DEBUG test(_branch0){alive}: EV_CHILD_GONE: Dropped reference _branch0.child[0] = __twig0a
DLGLOBAL DEBUG test(_branch0){alive}: still exists: child[1]
DLGLOBAL DEBUG 2 (__twig0a.cleanup(),_branch0.alive())
DLGLOBAL DEBUG 1 (__twig0a.cleanup())
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(__twig0a){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig0a){alive}: Deallocated
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_CHILD_GONE
DLGLOBAL DEBUG 1 (_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_CHILD_GONE)
DLGLOBAL DEBUG test(_branch0){alive}: EV_CHILD_GONE with NULL data, must be a parent_term event. Ignore.
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG --- after term cascade:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG --- 7 objects remain. cleaning up
*** loop_ctx contains 5 blocks, deallocating.
DLGLOBAL DEBUG test(root){alive}: Terminating (cause = OSMO_FSM_TERM_ERROR)
DLGLOBAL DEBUG test(root){alive}: pre_term()
DLGLOBAL DEBUG test(_branch1){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(_branch1){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig1b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 1 (__twig1b.cleanup())
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1b){alive}: scene forgets __twig1b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(__twig1b){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig1b){alive}: Deallocated
DLGLOBAL DEBUG test(__twig1a){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig1a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1a){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 1 (__twig1a.cleanup())
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1a){alive}: scene forgets __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: removing reference __twig1a.other[0] -> root
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(__twig1a){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig1a){alive}: Deallocated
DLGLOBAL DEBUG test(_branch1){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 1 (_branch1.cleanup())
DLGLOBAL DEBUG test(_branch1){alive}: cleanup()
DLGLOBAL DEBUG test(_branch1){alive}: scene forgets _branch1
DLGLOBAL DEBUG test(_branch1){alive}: removing reference _branch1.other[0] -> other
DLGLOBAL DEBUG test(other){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.other_gone())
DLGLOBAL DEBUG test(other){alive}: EV_OTHER_GONE: Dropped reference other.other[1] = _branch1
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(other){alive}: pre_term()
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup()
DLGLOBAL DEBUG test(other){alive}: scene forgets other
DLGLOBAL DEBUG test(other){alive}: removing reference other.other[0] -> _branch0
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.other_gone())
DLGLOBAL DEBUG test(_branch0){alive}: EV_OTHER_GONE: Dropped reference _branch0.other[0] = other
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(_branch0){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig0b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0b.cleanup())
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0b){alive}: scene forgets __twig0b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(__twig0b){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig0b){alive}: Deallocated
DLGLOBAL DEBUG test(_branch0){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.cleanup())
DLGLOBAL DEBUG test(_branch0){alive}: cleanup()
DLGLOBAL DEBUG test(_branch0){alive}: scene forgets _branch0
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Freeing instance
DLGLOBAL DEBUG test(_branch0){alive}: Deallocated
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup() done
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Freeing instance
DLGLOBAL DEBUG test(other){alive}: Deallocated
DLGLOBAL DEBUG 1 (_branch1.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(_branch1){alive}: Freeing instance
DLGLOBAL DEBUG test(_branch1){alive}: Deallocated
DLGLOBAL DEBUG 1 (root.cleanup())
DLGLOBAL DEBUG test(root){alive}: cleanup()
DLGLOBAL DEBUG test(root){alive}: scene forgets root
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: removing reference root.other[0] -> __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(root){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(root){alive}: Freeing instance
DLGLOBAL DEBUG test(root){alive}: Deallocated
DLGLOBAL DEBUG scene_alloc()
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(other){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: _branch0.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[0] = _branch0
DLGLOBAL DEBUG test(__twig0a){alive}: __twig0a.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = __twig0a
DLGLOBAL DEBUG test(_branch1){alive}: _branch1.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = _branch1
DLGLOBAL DEBUG test(__twig1a){alive}: __twig1a.other[0] = root
DLGLOBAL DEBUG test(root){alive}: root.other[0] = __twig1a
DLGLOBAL DEBUG ------ before destroy-event cascade, got:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG ---
DLGLOBAL DEBUG --- destroy-event at __twig0a
DLGLOBAL DEBUG test(__twig0a){alive}: Received Event EV_DESTROY
DLGLOBAL DEBUG 1 (__twig0a.alive())
DLGLOBAL DEBUG test(__twig0a){alive}: alive(EV_DESTROY)
DLGLOBAL DEBUG test(__twig0a){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(__twig0a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0a){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 2 (__twig0a.alive(),__twig0a.cleanup())
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0a){alive}: scene forgets __twig0a
DLGLOBAL DEBUG test(__twig0a){alive}: removing reference __twig0a.other[0] -> other
DLGLOBAL DEBUG test(other){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 3 (__twig0a.alive(),__twig0a.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 4 (__twig0a.alive(),__twig0a.cleanup(),other.alive(),other.other_gone())
DLGLOBAL DEBUG 3 (__twig0a.alive(),__twig0a.cleanup(),other.alive())
DLGLOBAL DEBUG 2 (__twig0a.alive(),__twig0a.cleanup())
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_CHILD_GONE
DLGLOBAL DEBUG 3 (__twig0a.alive(),__twig0a.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_CHILD_GONE)
DLGLOBAL DEBUG 4 (__twig0a.alive(),__twig0a.cleanup(),_branch0.alive(),_branch0.child_gone())
DLGLOBAL DEBUG test(_branch0){alive}: EV_CHILD_GONE: Dropped reference _branch0.child[0] = __twig0a
DLGLOBAL DEBUG test(_branch0){alive}: still exists: child[1]
DLGLOBAL DEBUG 3 (__twig0a.alive(),__twig0a.cleanup(),_branch0.alive())
DLGLOBAL DEBUG 2 (__twig0a.alive(),__twig0a.cleanup())
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup() done
DLGLOBAL DEBUG 1 (__twig0a.alive())
DLGLOBAL DEBUG test(__twig0a){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig0a){alive}: Deallocated
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_CHILD_GONE
DLGLOBAL DEBUG 2 (__twig0a.alive(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_CHILD_GONE)
DLGLOBAL DEBUG test(_branch0){alive}: EV_CHILD_GONE with NULL data, must be a parent_term event. Ignore.
DLGLOBAL DEBUG 1 (__twig0a.alive())
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG --- after destroy-event cascade:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG --- 7 objects remain. cleaning up
*** loop_ctx contains 5 blocks, deallocating.
DLGLOBAL DEBUG test(root){alive}: Terminating (cause = OSMO_FSM_TERM_ERROR)
DLGLOBAL DEBUG test(root){alive}: pre_term()
DLGLOBAL DEBUG test(_branch1){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(_branch1){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig1b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 1 (__twig1b.cleanup())
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1b){alive}: scene forgets __twig1b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(__twig1b){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig1b){alive}: Deallocated
DLGLOBAL DEBUG test(__twig1a){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig1a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1a){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 1 (__twig1a.cleanup())
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1a){alive}: scene forgets __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: removing reference __twig1a.other[0] -> root
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(__twig1a){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig1a){alive}: Deallocated
DLGLOBAL DEBUG test(_branch1){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 1 (_branch1.cleanup())
DLGLOBAL DEBUG test(_branch1){alive}: cleanup()
DLGLOBAL DEBUG test(_branch1){alive}: scene forgets _branch1
DLGLOBAL DEBUG test(_branch1){alive}: removing reference _branch1.other[0] -> other
DLGLOBAL DEBUG test(other){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.other_gone())
DLGLOBAL DEBUG test(other){alive}: EV_OTHER_GONE: Dropped reference other.other[1] = _branch1
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(other){alive}: pre_term()
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup()
DLGLOBAL DEBUG test(other){alive}: scene forgets other
DLGLOBAL DEBUG test(other){alive}: removing reference other.other[0] -> _branch0
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.other_gone())
DLGLOBAL DEBUG test(_branch0){alive}: EV_OTHER_GONE: Dropped reference _branch0.other[0] = other
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(_branch0){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig0b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0b.cleanup())
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0b){alive}: scene forgets __twig0b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(__twig0b){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig0b){alive}: Deallocated
DLGLOBAL DEBUG test(_branch0){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.cleanup())
DLGLOBAL DEBUG test(_branch0){alive}: cleanup()
DLGLOBAL DEBUG test(_branch0){alive}: scene forgets _branch0
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Freeing instance
DLGLOBAL DEBUG test(_branch0){alive}: Deallocated
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup() done
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Freeing instance
DLGLOBAL DEBUG test(other){alive}: Deallocated
DLGLOBAL DEBUG 1 (_branch1.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(_branch1){alive}: Freeing instance
DLGLOBAL DEBUG test(_branch1){alive}: Deallocated
DLGLOBAL DEBUG 1 (root.cleanup())
DLGLOBAL DEBUG test(root){alive}: cleanup()
DLGLOBAL DEBUG test(root){alive}: scene forgets root
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: removing reference root.other[0] -> __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(root){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(root){alive}: Freeing instance
DLGLOBAL DEBUG test(root){alive}: Deallocated
DLGLOBAL DEBUG scene_alloc()
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(other){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: _branch0.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[0] = _branch0
DLGLOBAL DEBUG test(__twig0a){alive}: __twig0a.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = __twig0a
DLGLOBAL DEBUG test(_branch1){alive}: _branch1.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = _branch1
DLGLOBAL DEBUG test(__twig1a){alive}: __twig1a.other[0] = root
DLGLOBAL DEBUG test(root){alive}: root.other[0] = __twig1a
DLGLOBAL DEBUG ------ before term cascade, got:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG ---
DLGLOBAL DEBUG --- term at __twig0b
DLGLOBAL DEBUG test(__twig0b){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(__twig0b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 1 (__twig0b.cleanup())
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0b){alive}: scene forgets __twig0b
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_CHILD_GONE
DLGLOBAL DEBUG 2 (__twig0b.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_CHILD_GONE)
DLGLOBAL DEBUG 3 (__twig0b.cleanup(),_branch0.alive(),_branch0.child_gone())
DLGLOBAL DEBUG test(_branch0){alive}: EV_CHILD_GONE: Dropped reference _branch0.child[1] = __twig0b
DLGLOBAL DEBUG test(_branch0){alive}: still exists: child[0]
DLGLOBAL DEBUG 2 (__twig0b.cleanup(),_branch0.alive())
DLGLOBAL DEBUG 1 (__twig0b.cleanup())
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(__twig0b){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig0b){alive}: Deallocated
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_CHILD_GONE
DLGLOBAL DEBUG 1 (_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_CHILD_GONE)
DLGLOBAL DEBUG test(_branch0){alive}: EV_CHILD_GONE with NULL data, must be a parent_term event. Ignore.
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG --- after term cascade:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG --- 7 objects remain. cleaning up
*** loop_ctx contains 5 blocks, deallocating.
DLGLOBAL DEBUG test(root){alive}: Terminating (cause = OSMO_FSM_TERM_ERROR)
DLGLOBAL DEBUG test(root){alive}: pre_term()
DLGLOBAL DEBUG test(_branch1){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(_branch1){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig1b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 1 (__twig1b.cleanup())
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1b){alive}: scene forgets __twig1b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(__twig1b){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig1b){alive}: Deallocated
DLGLOBAL DEBUG test(__twig1a){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig1a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1a){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 1 (__twig1a.cleanup())
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1a){alive}: scene forgets __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: removing reference __twig1a.other[0] -> root
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(__twig1a){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig1a){alive}: Deallocated
DLGLOBAL DEBUG test(_branch1){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 1 (_branch1.cleanup())
DLGLOBAL DEBUG test(_branch1){alive}: cleanup()
DLGLOBAL DEBUG test(_branch1){alive}: scene forgets _branch1
DLGLOBAL DEBUG test(_branch1){alive}: removing reference _branch1.other[0] -> other
DLGLOBAL DEBUG test(other){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.other_gone())
DLGLOBAL DEBUG test(other){alive}: EV_OTHER_GONE: Dropped reference other.other[1] = _branch1
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(other){alive}: pre_term()
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup()
DLGLOBAL DEBUG test(other){alive}: scene forgets other
DLGLOBAL DEBUG test(other){alive}: removing reference other.other[0] -> _branch0
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.other_gone())
DLGLOBAL DEBUG test(_branch0){alive}: EV_OTHER_GONE: Dropped reference _branch0.other[0] = other
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(_branch0){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0a){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig0a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0a){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0a.cleanup())
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0a){alive}: scene forgets __twig0a
DLGLOBAL DEBUG test(__twig0a){alive}: removing reference __twig0a.other[0] -> other
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(other){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(__twig0a){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig0a){alive}: Deallocated
DLGLOBAL DEBUG test(_branch0){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.cleanup())
DLGLOBAL DEBUG test(_branch0){alive}: cleanup()
DLGLOBAL DEBUG test(_branch0){alive}: scene forgets _branch0
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Freeing instance
DLGLOBAL DEBUG test(_branch0){alive}: Deallocated
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup() done
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Freeing instance
DLGLOBAL DEBUG test(other){alive}: Deallocated
DLGLOBAL DEBUG 1 (_branch1.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(_branch1){alive}: Freeing instance
DLGLOBAL DEBUG test(_branch1){alive}: Deallocated
DLGLOBAL DEBUG 1 (root.cleanup())
DLGLOBAL DEBUG test(root){alive}: cleanup()
DLGLOBAL DEBUG test(root){alive}: scene forgets root
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: removing reference root.other[0] -> __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(root){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(root){alive}: Freeing instance
DLGLOBAL DEBUG test(root){alive}: Deallocated
DLGLOBAL DEBUG scene_alloc()
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(other){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: _branch0.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[0] = _branch0
DLGLOBAL DEBUG test(__twig0a){alive}: __twig0a.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = __twig0a
DLGLOBAL DEBUG test(_branch1){alive}: _branch1.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = _branch1
DLGLOBAL DEBUG test(__twig1a){alive}: __twig1a.other[0] = root
DLGLOBAL DEBUG test(root){alive}: root.other[0] = __twig1a
DLGLOBAL DEBUG ------ before destroy-event cascade, got:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG ---
DLGLOBAL DEBUG --- destroy-event at __twig0b
DLGLOBAL DEBUG test(__twig0b){alive}: Received Event EV_DESTROY
DLGLOBAL DEBUG 1 (__twig0b.alive())
DLGLOBAL DEBUG test(__twig0b){alive}: alive(EV_DESTROY)
DLGLOBAL DEBUG test(__twig0b){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(__twig0b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 2 (__twig0b.alive(),__twig0b.cleanup())
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0b){alive}: scene forgets __twig0b
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_CHILD_GONE
DLGLOBAL DEBUG 3 (__twig0b.alive(),__twig0b.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_CHILD_GONE)
DLGLOBAL DEBUG 4 (__twig0b.alive(),__twig0b.cleanup(),_branch0.alive(),_branch0.child_gone())
DLGLOBAL DEBUG test(_branch0){alive}: EV_CHILD_GONE: Dropped reference _branch0.child[1] = __twig0b
DLGLOBAL DEBUG test(_branch0){alive}: still exists: child[0]
DLGLOBAL DEBUG 3 (__twig0b.alive(),__twig0b.cleanup(),_branch0.alive())
DLGLOBAL DEBUG 2 (__twig0b.alive(),__twig0b.cleanup())
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup() done
DLGLOBAL DEBUG 1 (__twig0b.alive())
DLGLOBAL DEBUG test(__twig0b){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig0b){alive}: Deallocated
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_CHILD_GONE
DLGLOBAL DEBUG 2 (__twig0b.alive(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_CHILD_GONE)
DLGLOBAL DEBUG test(_branch0){alive}: EV_CHILD_GONE with NULL data, must be a parent_term event. Ignore.
DLGLOBAL DEBUG 1 (__twig0b.alive())
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG --- after destroy-event cascade:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG --- 7 objects remain. cleaning up
*** loop_ctx contains 5 blocks, deallocating.
DLGLOBAL DEBUG test(root){alive}: Terminating (cause = OSMO_FSM_TERM_ERROR)
DLGLOBAL DEBUG test(root){alive}: pre_term()
DLGLOBAL DEBUG test(_branch1){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(_branch1){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig1b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 1 (__twig1b.cleanup())
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1b){alive}: scene forgets __twig1b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(__twig1b){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig1b){alive}: Deallocated
DLGLOBAL DEBUG test(__twig1a){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig1a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1a){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 1 (__twig1a.cleanup())
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1a){alive}: scene forgets __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: removing reference __twig1a.other[0] -> root
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(__twig1a){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig1a){alive}: Deallocated
DLGLOBAL DEBUG test(_branch1){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 1 (_branch1.cleanup())
DLGLOBAL DEBUG test(_branch1){alive}: cleanup()
DLGLOBAL DEBUG test(_branch1){alive}: scene forgets _branch1
DLGLOBAL DEBUG test(_branch1){alive}: removing reference _branch1.other[0] -> other
DLGLOBAL DEBUG test(other){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.other_gone())
DLGLOBAL DEBUG test(other){alive}: EV_OTHER_GONE: Dropped reference other.other[1] = _branch1
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(other){alive}: pre_term()
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup()
DLGLOBAL DEBUG test(other){alive}: scene forgets other
DLGLOBAL DEBUG test(other){alive}: removing reference other.other[0] -> _branch0
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.other_gone())
DLGLOBAL DEBUG test(_branch0){alive}: EV_OTHER_GONE: Dropped reference _branch0.other[0] = other
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(_branch0){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0a){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig0a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0a){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0a.cleanup())
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0a){alive}: scene forgets __twig0a
DLGLOBAL DEBUG test(__twig0a){alive}: removing reference __twig0a.other[0] -> other
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(other){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(__twig0a){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig0a){alive}: Deallocated
DLGLOBAL DEBUG test(_branch0){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.cleanup())
DLGLOBAL DEBUG test(_branch0){alive}: cleanup()
DLGLOBAL DEBUG test(_branch0){alive}: scene forgets _branch0
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Freeing instance
DLGLOBAL DEBUG test(_branch0){alive}: Deallocated
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup() done
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Freeing instance
DLGLOBAL DEBUG test(other){alive}: Deallocated
DLGLOBAL DEBUG 1 (_branch1.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(_branch1){alive}: Freeing instance
DLGLOBAL DEBUG test(_branch1){alive}: Deallocated
DLGLOBAL DEBUG 1 (root.cleanup())
DLGLOBAL DEBUG test(root){alive}: cleanup()
DLGLOBAL DEBUG test(root){alive}: scene forgets root
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: removing reference root.other[0] -> __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(root){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(root){alive}: Freeing instance
DLGLOBAL DEBUG test(root){alive}: Deallocated
DLGLOBAL DEBUG scene_alloc()
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(other){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: _branch0.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[0] = _branch0
DLGLOBAL DEBUG test(__twig0a){alive}: __twig0a.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = __twig0a
DLGLOBAL DEBUG test(_branch1){alive}: _branch1.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = _branch1
DLGLOBAL DEBUG test(__twig1a){alive}: __twig1a.other[0] = root
DLGLOBAL DEBUG test(root){alive}: root.other[0] = __twig1a
DLGLOBAL DEBUG ------ before term cascade, got:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG ---
DLGLOBAL DEBUG --- term at _branch1
DLGLOBAL DEBUG test(_branch1){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(_branch1){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig1b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 1 (__twig1b.cleanup())
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1b){alive}: scene forgets __twig1b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(__twig1b){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig1b){alive}: Deallocated
DLGLOBAL DEBUG test(__twig1a){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig1a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1a){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 1 (__twig1a.cleanup())
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1a){alive}: scene forgets __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: removing reference __twig1a.other[0] -> root
DLGLOBAL DEBUG test(root){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 2 (__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 3 (__twig1a.cleanup(),root.alive(),root.other_gone())
DLGLOBAL DEBUG test(root){alive}: EV_OTHER_GONE: Dropped reference root.other[0] = __twig1a
DLGLOBAL DEBUG 2 (__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(root){alive}: pre_term()
DLGLOBAL DEBUG test(_branch1){alive}: Ignoring trigger to terminate: already terminating
DLGLOBAL ERROR test(root){alive}: Internal error while terminating child FSMs: a child FSM is stuck
DLGLOBAL DEBUG 3 (__twig1a.cleanup(),root.alive(),root.cleanup())
DLGLOBAL DEBUG test(root){alive}: cleanup()
DLGLOBAL DEBUG test(root){alive}: scene forgets root
DLGLOBAL DEBUG test(root){alive}: cleanup() done
DLGLOBAL DEBUG 2 (__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: Freeing instance
DLGLOBAL DEBUG test(root){alive}: Deallocated
DLGLOBAL DEBUG 1 (__twig1a.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(__twig1a){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig1a){alive}: Deallocated
DLGLOBAL DEBUG test(_branch1){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 1 (_branch1.cleanup())
DLGLOBAL DEBUG test(_branch1){alive}: cleanup()
DLGLOBAL DEBUG test(_branch1){alive}: scene forgets _branch1
DLGLOBAL DEBUG test(_branch1){alive}: removing reference _branch1.other[0] -> other
DLGLOBAL DEBUG test(other){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.other_gone())
DLGLOBAL DEBUG test(other){alive}: EV_OTHER_GONE: Dropped reference other.other[1] = _branch1
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(other){alive}: pre_term()
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup()
DLGLOBAL DEBUG test(other){alive}: scene forgets other
DLGLOBAL DEBUG test(other){alive}: removing reference other.other[0] -> _branch0
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.other_gone())
DLGLOBAL DEBUG test(_branch0){alive}: EV_OTHER_GONE: Dropped reference _branch0.other[0] = other
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(_branch0){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig0b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0b.cleanup())
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0b){alive}: scene forgets __twig0b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(__twig0b){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig0b){alive}: Deallocated
DLGLOBAL DEBUG test(__twig0a){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig0a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0a){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0a.cleanup())
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0a){alive}: scene forgets __twig0a
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: removing reference __twig0a.other[0] -> other
DLGLOBAL DEBUG test(other){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(__twig0a){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig0a){alive}: Deallocated
DLGLOBAL DEBUG test(_branch0){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.cleanup())
DLGLOBAL DEBUG test(_branch0){alive}: cleanup()
DLGLOBAL DEBUG test(_branch0){alive}: scene forgets _branch0
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Freeing instance
DLGLOBAL DEBUG test(_branch0){alive}: Deallocated
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup() done
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Freeing instance
DLGLOBAL DEBUG test(other){alive}: Deallocated
DLGLOBAL DEBUG 1 (_branch1.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(_branch1){alive}: Freeing instance
DLGLOBAL DEBUG test(_branch1){alive}: Deallocated
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG --- after term cascade:
DLGLOBAL DEBUG --- all deallocated.
*** loop_ctx contains 33 blocks, deallocating.
DLGLOBAL DEBUG scene_alloc()
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(other){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: _branch0.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[0] = _branch0
DLGLOBAL DEBUG test(__twig0a){alive}: __twig0a.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = __twig0a
DLGLOBAL DEBUG test(_branch1){alive}: _branch1.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = _branch1
DLGLOBAL DEBUG test(__twig1a){alive}: __twig1a.other[0] = root
DLGLOBAL DEBUG test(root){alive}: root.other[0] = __twig1a
DLGLOBAL DEBUG ------ before destroy-event cascade, got:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG ---
DLGLOBAL DEBUG --- destroy-event at _branch1
DLGLOBAL DEBUG test(_branch1){alive}: Received Event EV_DESTROY
DLGLOBAL DEBUG 1 (_branch1.alive())
DLGLOBAL DEBUG test(_branch1){alive}: alive(EV_DESTROY)
DLGLOBAL DEBUG test(_branch1){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(_branch1){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig1b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 2 (_branch1.alive(),__twig1b.cleanup())
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1b){alive}: scene forgets __twig1b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup() done
DLGLOBAL DEBUG 1 (_branch1.alive())
DLGLOBAL DEBUG test(__twig1b){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig1b){alive}: Deallocated
DLGLOBAL DEBUG test(__twig1a){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig1a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1a){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 2 (_branch1.alive(),__twig1a.cleanup())
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1a){alive}: scene forgets __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: removing reference __twig1a.other[0] -> root
DLGLOBAL DEBUG test(root){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 3 (_branch1.alive(),__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 4 (_branch1.alive(),__twig1a.cleanup(),root.alive(),root.other_gone())
DLGLOBAL DEBUG test(root){alive}: EV_OTHER_GONE: Dropped reference root.other[0] = __twig1a
DLGLOBAL DEBUG 3 (_branch1.alive(),__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(root){alive}: pre_term()
DLGLOBAL DEBUG test(_branch1){alive}: Ignoring trigger to terminate: already terminating
DLGLOBAL ERROR test(root){alive}: Internal error while terminating child FSMs: a child FSM is stuck
DLGLOBAL DEBUG 4 (_branch1.alive(),__twig1a.cleanup(),root.alive(),root.cleanup())
DLGLOBAL DEBUG test(root){alive}: cleanup()
DLGLOBAL DEBUG test(root){alive}: scene forgets root
DLGLOBAL DEBUG test(root){alive}: cleanup() done
DLGLOBAL DEBUG 3 (_branch1.alive(),__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: Freeing instance
DLGLOBAL DEBUG test(root){alive}: Deallocated
DLGLOBAL DEBUG 2 (_branch1.alive(),__twig1a.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup() done
DLGLOBAL DEBUG 1 (_branch1.alive())
DLGLOBAL DEBUG test(__twig1a){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig1a){alive}: Deallocated
DLGLOBAL DEBUG test(_branch1){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 2 (_branch1.alive(),_branch1.cleanup())
DLGLOBAL DEBUG test(_branch1){alive}: cleanup()
DLGLOBAL DEBUG test(_branch1){alive}: scene forgets _branch1
DLGLOBAL DEBUG test(_branch1){alive}: removing reference _branch1.other[0] -> other
DLGLOBAL DEBUG test(other){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 3 (_branch1.alive(),_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 4 (_branch1.alive(),_branch1.cleanup(),other.alive(),other.other_gone())
DLGLOBAL DEBUG test(other){alive}: EV_OTHER_GONE: Dropped reference other.other[1] = _branch1
DLGLOBAL DEBUG 3 (_branch1.alive(),_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(other){alive}: pre_term()
DLGLOBAL DEBUG 4 (_branch1.alive(),_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup()
DLGLOBAL DEBUG test(other){alive}: scene forgets other
DLGLOBAL DEBUG test(other){alive}: removing reference other.other[0] -> _branch0
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 5 (_branch1.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 6 (_branch1.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.other_gone())
DLGLOBAL DEBUG test(_branch0){alive}: EV_OTHER_GONE: Dropped reference _branch0.other[0] = other
DLGLOBAL DEBUG 5 (_branch1.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(_branch0){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig0b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 6 (_branch1.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0b.cleanup())
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0b){alive}: scene forgets __twig0b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup() done
DLGLOBAL DEBUG 5 (_branch1.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(__twig0b){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig0b){alive}: Deallocated
DLGLOBAL DEBUG test(__twig0a){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig0a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0a){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 6 (_branch1.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0a.cleanup())
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0a){alive}: scene forgets __twig0a
DLGLOBAL DEBUG test(__twig0a){alive}: removing reference __twig0a.other[0] -> other
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(other){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup() done
DLGLOBAL DEBUG 5 (_branch1.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(__twig0a){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig0a){alive}: Deallocated
DLGLOBAL DEBUG test(_branch0){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 6 (_branch1.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.cleanup())
DLGLOBAL DEBUG test(_branch0){alive}: cleanup()
DLGLOBAL DEBUG test(_branch0){alive}: scene forgets _branch0
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: cleanup() done
DLGLOBAL DEBUG 5 (_branch1.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Freeing instance
DLGLOBAL DEBUG test(_branch0){alive}: Deallocated
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG 4 (_branch1.alive(),_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup() done
DLGLOBAL DEBUG 3 (_branch1.alive(),_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Freeing instance
DLGLOBAL DEBUG test(other){alive}: Deallocated
DLGLOBAL DEBUG 2 (_branch1.alive(),_branch1.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: cleanup() done
DLGLOBAL DEBUG 1 (_branch1.alive())
DLGLOBAL DEBUG test(_branch1){alive}: Freeing instance
DLGLOBAL DEBUG test(_branch1){alive}: Deallocated
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG --- after destroy-event cascade:
DLGLOBAL DEBUG --- all deallocated.
*** loop_ctx contains 33 blocks, deallocating.
DLGLOBAL DEBUG scene_alloc()
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(other){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: _branch0.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[0] = _branch0
DLGLOBAL DEBUG test(__twig0a){alive}: __twig0a.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = __twig0a
DLGLOBAL DEBUG test(_branch1){alive}: _branch1.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = _branch1
DLGLOBAL DEBUG test(__twig1a){alive}: __twig1a.other[0] = root
DLGLOBAL DEBUG test(root){alive}: root.other[0] = __twig1a
DLGLOBAL DEBUG ------ before term cascade, got:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG ---
DLGLOBAL DEBUG --- term at __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(__twig1a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1a){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 1 (__twig1a.cleanup())
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1a){alive}: scene forgets __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: removing reference __twig1a.other[0] -> root
DLGLOBAL DEBUG test(root){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 2 (__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 3 (__twig1a.cleanup(),root.alive(),root.other_gone())
DLGLOBAL DEBUG test(root){alive}: EV_OTHER_GONE: Dropped reference root.other[0] = __twig1a
DLGLOBAL DEBUG 2 (__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(root){alive}: pre_term()
DLGLOBAL DEBUG test(_branch1){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(_branch1){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig1b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 3 (__twig1a.cleanup(),root.alive(),__twig1b.cleanup())
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1b){alive}: scene forgets __twig1b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup() done
DLGLOBAL DEBUG 2 (__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(__twig1b){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig1b){alive}: Deallocated
DLGLOBAL DEBUG test(_branch1){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 3 (__twig1a.cleanup(),root.alive(),_branch1.cleanup())
DLGLOBAL DEBUG test(_branch1){alive}: cleanup()
DLGLOBAL DEBUG test(_branch1){alive}: scene forgets _branch1
DLGLOBAL DEBUG test(_branch1){alive}: removing reference _branch1.other[0] -> other
DLGLOBAL DEBUG test(other){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 4 (__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 5 (__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.other_gone())
DLGLOBAL DEBUG test(other){alive}: EV_OTHER_GONE: Dropped reference other.other[1] = _branch1
DLGLOBAL DEBUG 4 (__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(other){alive}: pre_term()
DLGLOBAL DEBUG 5 (__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup()
DLGLOBAL DEBUG test(other){alive}: scene forgets other
DLGLOBAL DEBUG test(other){alive}: removing reference other.other[0] -> _branch0
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 6 (__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 7 (__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.other_gone())
DLGLOBAL DEBUG test(_branch0){alive}: EV_OTHER_GONE: Dropped reference _branch0.other[0] = other
DLGLOBAL DEBUG 6 (__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(_branch0){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig0b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 7 (__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0b.cleanup())
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0b){alive}: scene forgets __twig0b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup() done
DLGLOBAL DEBUG 6 (__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(__twig0b){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig0b){alive}: Deallocated
DLGLOBAL DEBUG test(__twig0a){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig0a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0a){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 7 (__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0a.cleanup())
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0a){alive}: scene forgets __twig0a
DLGLOBAL DEBUG test(__twig0a){alive}: removing reference __twig0a.other[0] -> other
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(other){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup() done
DLGLOBAL DEBUG 6 (__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(__twig0a){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig0a){alive}: Deallocated
DLGLOBAL DEBUG test(_branch0){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 7 (__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.cleanup())
DLGLOBAL DEBUG test(_branch0){alive}: cleanup()
DLGLOBAL DEBUG test(_branch0){alive}: scene forgets _branch0
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: cleanup() done
DLGLOBAL DEBUG 6 (__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Freeing instance
DLGLOBAL DEBUG test(_branch0){alive}: Deallocated
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG 5 (__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup() done
DLGLOBAL DEBUG 4 (__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Freeing instance
DLGLOBAL DEBUG test(other){alive}: Deallocated
DLGLOBAL DEBUG 3 (__twig1a.cleanup(),root.alive(),_branch1.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: cleanup() done
DLGLOBAL DEBUG 2 (__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(_branch1){alive}: Freeing instance
DLGLOBAL DEBUG test(_branch1){alive}: Deallocated
DLGLOBAL DEBUG 3 (__twig1a.cleanup(),root.alive(),root.cleanup())
DLGLOBAL DEBUG test(root){alive}: cleanup()
DLGLOBAL DEBUG test(root){alive}: scene forgets root
DLGLOBAL DEBUG test(root){alive}: cleanup() done
DLGLOBAL DEBUG 2 (__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: Freeing instance
DLGLOBAL DEBUG test(root){alive}: Deallocated
DLGLOBAL DEBUG 1 (__twig1a.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(__twig1a){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig1a){alive}: Deallocated
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG --- after term cascade:
DLGLOBAL DEBUG --- all deallocated.
*** loop_ctx contains 33 blocks, deallocating.
DLGLOBAL DEBUG scene_alloc()
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(other){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: _branch0.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[0] = _branch0
DLGLOBAL DEBUG test(__twig0a){alive}: __twig0a.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = __twig0a
DLGLOBAL DEBUG test(_branch1){alive}: _branch1.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = _branch1
DLGLOBAL DEBUG test(__twig1a){alive}: __twig1a.other[0] = root
DLGLOBAL DEBUG test(root){alive}: root.other[0] = __twig1a
DLGLOBAL DEBUG ------ before destroy-event cascade, got:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG ---
DLGLOBAL DEBUG --- destroy-event at __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: Received Event EV_DESTROY
DLGLOBAL DEBUG 1 (__twig1a.alive())
DLGLOBAL DEBUG test(__twig1a){alive}: alive(EV_DESTROY)
DLGLOBAL DEBUG test(__twig1a){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(__twig1a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1a){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 2 (__twig1a.alive(),__twig1a.cleanup())
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1a){alive}: scene forgets __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: removing reference __twig1a.other[0] -> root
DLGLOBAL DEBUG test(root){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 3 (__twig1a.alive(),__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 4 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),root.other_gone())
DLGLOBAL DEBUG test(root){alive}: EV_OTHER_GONE: Dropped reference root.other[0] = __twig1a
DLGLOBAL DEBUG 3 (__twig1a.alive(),__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(root){alive}: pre_term()
DLGLOBAL DEBUG test(_branch1){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(_branch1){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig1b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 4 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),__twig1b.cleanup())
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1b){alive}: scene forgets __twig1b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup() done
DLGLOBAL DEBUG 3 (__twig1a.alive(),__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(__twig1b){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig1b){alive}: Deallocated
DLGLOBAL DEBUG test(_branch1){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 4 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),_branch1.cleanup())
DLGLOBAL DEBUG test(_branch1){alive}: cleanup()
DLGLOBAL DEBUG test(_branch1){alive}: scene forgets _branch1
DLGLOBAL DEBUG test(_branch1){alive}: removing reference _branch1.other[0] -> other
DLGLOBAL DEBUG test(other){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 5 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 6 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.other_gone())
DLGLOBAL DEBUG test(other){alive}: EV_OTHER_GONE: Dropped reference other.other[1] = _branch1
DLGLOBAL DEBUG 5 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(other){alive}: pre_term()
DLGLOBAL DEBUG 6 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup()
DLGLOBAL DEBUG test(other){alive}: scene forgets other
DLGLOBAL DEBUG test(other){alive}: removing reference other.other[0] -> _branch0
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 7 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 8 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.
DLGLOBAL DEBUG test(_branch0){alive}: EV_OTHER_GONE: Dropped reference _branch0.other[0] = other
DLGLOBAL DEBUG 7 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(_branch0){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig0b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 8 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0b.
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0b){alive}: scene forgets __twig0b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup() done
DLGLOBAL DEBUG 7 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(__twig0b){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig0b){alive}: Deallocated
DLGLOBAL DEBUG test(__twig0a){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig0a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0a){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 8 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0a.
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0a){alive}: scene forgets __twig0a
DLGLOBAL DEBUG test(__twig0a){alive}: removing reference __twig0a.other[0] -> other
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(other){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup() done
DLGLOBAL DEBUG 7 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(__twig0a){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig0a){alive}: Deallocated
DLGLOBAL DEBUG test(_branch0){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 8 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.
DLGLOBAL DEBUG test(_branch0){alive}: cleanup()
DLGLOBAL DEBUG test(_branch0){alive}: scene forgets _branch0
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: cleanup() done
DLGLOBAL DEBUG 7 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Freeing instance
DLGLOBAL DEBUG test(_branch0){alive}: Deallocated
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG 6 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup() done
DLGLOBAL DEBUG 5 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Freeing instance
DLGLOBAL DEBUG test(other){alive}: Deallocated
DLGLOBAL DEBUG 4 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),_branch1.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: cleanup() done
DLGLOBAL DEBUG 3 (__twig1a.alive(),__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(_branch1){alive}: Freeing instance
DLGLOBAL DEBUG test(_branch1){alive}: Deallocated
DLGLOBAL DEBUG 4 (__twig1a.alive(),__twig1a.cleanup(),root.alive(),root.cleanup())
DLGLOBAL DEBUG test(root){alive}: cleanup()
DLGLOBAL DEBUG test(root){alive}: scene forgets root
DLGLOBAL DEBUG test(root){alive}: cleanup() done
DLGLOBAL DEBUG 3 (__twig1a.alive(),__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: Freeing instance
DLGLOBAL DEBUG test(root){alive}: Deallocated
DLGLOBAL DEBUG 2 (__twig1a.alive(),__twig1a.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup() done
DLGLOBAL DEBUG 1 (__twig1a.alive())
DLGLOBAL DEBUG test(__twig1a){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig1a){alive}: Deallocated
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG --- after destroy-event cascade:
DLGLOBAL DEBUG --- all deallocated.
*** loop_ctx contains 33 blocks, deallocating.
DLGLOBAL DEBUG scene_alloc()
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(other){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: _branch0.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[0] = _branch0
DLGLOBAL DEBUG test(__twig0a){alive}: __twig0a.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = __twig0a
DLGLOBAL DEBUG test(_branch1){alive}: _branch1.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = _branch1
DLGLOBAL DEBUG test(__twig1a){alive}: __twig1a.other[0] = root
DLGLOBAL DEBUG test(root){alive}: root.other[0] = __twig1a
DLGLOBAL DEBUG ------ before term cascade, got:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG ---
DLGLOBAL DEBUG --- term at __twig1b
DLGLOBAL DEBUG test(__twig1b){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(__twig1b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 1 (__twig1b.cleanup())
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1b){alive}: scene forgets __twig1b
DLGLOBAL DEBUG test(_branch1){alive}: Received Event EV_CHILD_GONE
DLGLOBAL DEBUG 2 (__twig1b.cleanup(),_branch1.alive())
DLGLOBAL DEBUG test(_branch1){alive}: alive(EV_CHILD_GONE)
DLGLOBAL DEBUG 3 (__twig1b.cleanup(),_branch1.alive(),_branch1.child_gone())
DLGLOBAL DEBUG test(_branch1){alive}: EV_CHILD_GONE: Dropped reference _branch1.child[1] = __twig1b
DLGLOBAL DEBUG test(_branch1){alive}: still exists: child[0]
DLGLOBAL DEBUG 2 (__twig1b.cleanup(),_branch1.alive())
DLGLOBAL DEBUG 1 (__twig1b.cleanup())
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(__twig1b){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig1b){alive}: Deallocated
DLGLOBAL DEBUG test(_branch1){alive}: Received Event EV_CHILD_GONE
DLGLOBAL DEBUG 1 (_branch1.alive())
DLGLOBAL DEBUG test(_branch1){alive}: alive(EV_CHILD_GONE)
DLGLOBAL DEBUG test(_branch1){alive}: EV_CHILD_GONE with NULL data, must be a parent_term event. Ignore.
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG --- after term cascade:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG other
DLGLOBAL DEBUG --- 7 objects remain. cleaning up
*** loop_ctx contains 5 blocks, deallocating.
DLGLOBAL DEBUG test(root){alive}: Terminating (cause = OSMO_FSM_TERM_ERROR)
DLGLOBAL DEBUG test(root){alive}: pre_term()
DLGLOBAL DEBUG test(_branch1){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(_branch1){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1a){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig1a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1a){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 1 (__twig1a.cleanup())
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1a){alive}: scene forgets __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: removing reference __twig1a.other[0] -> root
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(__twig1a){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig1a){alive}: Deallocated
DLGLOBAL DEBUG test(_branch1){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 1 (_branch1.cleanup())
DLGLOBAL DEBUG test(_branch1){alive}: cleanup()
DLGLOBAL DEBUG test(_branch1){alive}: scene forgets _branch1
DLGLOBAL DEBUG test(_branch1){alive}: removing reference _branch1.other[0] -> other
DLGLOBAL DEBUG test(other){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.other_gone())
DLGLOBAL DEBUG test(other){alive}: EV_OTHER_GONE: Dropped reference other.other[1] = _branch1
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(other){alive}: pre_term()
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup()
DLGLOBAL DEBUG test(other){alive}: scene forgets other
DLGLOBAL DEBUG test(other){alive}: removing reference other.other[0] -> _branch0
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.other_gone())
DLGLOBAL DEBUG test(_branch0){alive}: EV_OTHER_GONE: Dropped reference _branch0.other[0] = other
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(_branch0){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig0b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0b.cleanup())
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0b){alive}: scene forgets __twig0b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(__twig0b){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig0b){alive}: Deallocated
DLGLOBAL DEBUG test(__twig0a){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig0a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0a){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0a.cleanup())
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0a){alive}: scene forgets __twig0a
DLGLOBAL DEBUG test(__twig0a){alive}: removing reference __twig0a.other[0] -> other
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(other){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(__twig0a){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig0a){alive}: Deallocated
DLGLOBAL DEBUG test(_branch0){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.cleanup())
DLGLOBAL DEBUG test(_branch0){alive}: cleanup()
DLGLOBAL DEBUG test(_branch0){alive}: scene forgets _branch0
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Freeing instance
DLGLOBAL DEBUG test(_branch0){alive}: Deallocated
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup() done
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Freeing instance
DLGLOBAL DEBUG test(other){alive}: Deallocated
DLGLOBAL DEBUG 1 (_branch1.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(_branch1){alive}: Freeing instance
DLGLOBAL DEBUG test(_branch1){alive}: Deallocated
DLGLOBAL DEBUG 1 (root.cleanup())
DLGLOBAL DEBUG test(root){alive}: cleanup()
DLGLOBAL DEBUG test(root){alive}: scene forgets root
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: removing reference root.other[0] -> __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(root){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(root){alive}: Freeing instance
DLGLOBAL DEBUG test(root){alive}: Deallocated
DLGLOBAL DEBUG scene_alloc()
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(other){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: _branch0.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[0] = _branch0
DLGLOBAL DEBUG test(__twig0a){alive}: __twig0a.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = __twig0a
DLGLOBAL DEBUG test(_branch1){alive}: _branch1.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = _branch1
DLGLOBAL DEBUG test(__twig1a){alive}: __twig1a.other[0] = root
DLGLOBAL DEBUG test(root){alive}: root.other[0] = __twig1a
DLGLOBAL DEBUG ------ before destroy-event cascade, got:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG ---
DLGLOBAL DEBUG --- destroy-event at __twig1b
DLGLOBAL DEBUG test(__twig1b){alive}: Received Event EV_DESTROY
DLGLOBAL DEBUG 1 (__twig1b.alive())
DLGLOBAL DEBUG test(__twig1b){alive}: alive(EV_DESTROY)
DLGLOBAL DEBUG test(__twig1b){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(__twig1b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 2 (__twig1b.alive(),__twig1b.cleanup())
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1b){alive}: scene forgets __twig1b
DLGLOBAL DEBUG test(_branch1){alive}: Received Event EV_CHILD_GONE
DLGLOBAL DEBUG 3 (__twig1b.alive(),__twig1b.cleanup(),_branch1.alive())
DLGLOBAL DEBUG test(_branch1){alive}: alive(EV_CHILD_GONE)
DLGLOBAL DEBUG 4 (__twig1b.alive(),__twig1b.cleanup(),_branch1.alive(),_branch1.child_gone())
DLGLOBAL DEBUG test(_branch1){alive}: EV_CHILD_GONE: Dropped reference _branch1.child[1] = __twig1b
DLGLOBAL DEBUG test(_branch1){alive}: still exists: child[0]
DLGLOBAL DEBUG 3 (__twig1b.alive(),__twig1b.cleanup(),_branch1.alive())
DLGLOBAL DEBUG 2 (__twig1b.alive(),__twig1b.cleanup())
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup() done
DLGLOBAL DEBUG 1 (__twig1b.alive())
DLGLOBAL DEBUG test(__twig1b){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig1b){alive}: Deallocated
DLGLOBAL DEBUG test(_branch1){alive}: Received Event EV_CHILD_GONE
DLGLOBAL DEBUG 2 (__twig1b.alive(),_branch1.alive())
DLGLOBAL DEBUG test(_branch1){alive}: alive(EV_CHILD_GONE)
DLGLOBAL DEBUG test(_branch1){alive}: EV_CHILD_GONE with NULL data, must be a parent_term event. Ignore.
DLGLOBAL DEBUG 1 (__twig1b.alive())
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG --- after destroy-event cascade:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG other
DLGLOBAL DEBUG --- 7 objects remain. cleaning up
*** loop_ctx contains 5 blocks, deallocating.
DLGLOBAL DEBUG test(root){alive}: Terminating (cause = OSMO_FSM_TERM_ERROR)
DLGLOBAL DEBUG test(root){alive}: pre_term()
DLGLOBAL DEBUG test(_branch1){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(_branch1){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1a){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig1a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1a){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 1 (__twig1a.cleanup())
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1a){alive}: scene forgets __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: removing reference __twig1a.other[0] -> root
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(__twig1a){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig1a){alive}: Deallocated
DLGLOBAL DEBUG test(_branch1){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 1 (_branch1.cleanup())
DLGLOBAL DEBUG test(_branch1){alive}: cleanup()
DLGLOBAL DEBUG test(_branch1){alive}: scene forgets _branch1
DLGLOBAL DEBUG test(_branch1){alive}: removing reference _branch1.other[0] -> other
DLGLOBAL DEBUG test(other){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.other_gone())
DLGLOBAL DEBUG test(other){alive}: EV_OTHER_GONE: Dropped reference other.other[1] = _branch1
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(other){alive}: pre_term()
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup()
DLGLOBAL DEBUG test(other){alive}: scene forgets other
DLGLOBAL DEBUG test(other){alive}: removing reference other.other[0] -> _branch0
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.other_gone())
DLGLOBAL DEBUG test(_branch0){alive}: EV_OTHER_GONE: Dropped reference _branch0.other[0] = other
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(_branch0){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig0b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0b.cleanup())
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0b){alive}: scene forgets __twig0b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(__twig0b){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig0b){alive}: Deallocated
DLGLOBAL DEBUG test(__twig0a){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig0a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0a){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),__twig0a.cleanup())
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0a){alive}: scene forgets __twig0a
DLGLOBAL DEBUG test(__twig0a){alive}: removing reference __twig0a.other[0] -> other
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(other){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(__twig0a){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig0a){alive}: Deallocated
DLGLOBAL DEBUG test(_branch0){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 5 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive(),_branch0.cleanup())
DLGLOBAL DEBUG test(_branch0){alive}: cleanup()
DLGLOBAL DEBUG test(_branch0){alive}: scene forgets _branch0
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: cleanup() done
DLGLOBAL DEBUG 4 (_branch1.cleanup(),other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Freeing instance
DLGLOBAL DEBUG test(_branch0){alive}: Deallocated
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG 3 (_branch1.cleanup(),other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup() done
DLGLOBAL DEBUG 2 (_branch1.cleanup(),other.alive())
DLGLOBAL DEBUG test(other){alive}: Freeing instance
DLGLOBAL DEBUG test(other){alive}: Deallocated
DLGLOBAL DEBUG 1 (_branch1.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(_branch1){alive}: Freeing instance
DLGLOBAL DEBUG test(_branch1){alive}: Deallocated
DLGLOBAL DEBUG 1 (root.cleanup())
DLGLOBAL DEBUG test(root){alive}: cleanup()
DLGLOBAL DEBUG test(root){alive}: scene forgets root
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: removing reference root.other[0] -> __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(root){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(root){alive}: Freeing instance
DLGLOBAL DEBUG test(root){alive}: Deallocated
DLGLOBAL DEBUG scene_alloc()
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(other){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: _branch0.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[0] = _branch0
DLGLOBAL DEBUG test(__twig0a){alive}: __twig0a.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = __twig0a
DLGLOBAL DEBUG test(_branch1){alive}: _branch1.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = _branch1
DLGLOBAL DEBUG test(__twig1a){alive}: __twig1a.other[0] = root
DLGLOBAL DEBUG test(root){alive}: root.other[0] = __twig1a
DLGLOBAL DEBUG ------ before term cascade, got:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG ---
DLGLOBAL DEBUG --- term at other
DLGLOBAL DEBUG test(other){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(other){alive}: pre_term()
DLGLOBAL DEBUG 1 (other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup()
DLGLOBAL DEBUG test(other){alive}: scene forgets other
DLGLOBAL DEBUG test(other){alive}: removing reference other.other[0] -> _branch0
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 2 (other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 3 (other.cleanup(),_branch0.alive(),_branch0.other_gone())
DLGLOBAL DEBUG test(_branch0){alive}: EV_OTHER_GONE: Dropped reference _branch0.other[0] = other
DLGLOBAL DEBUG 2 (other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(_branch0){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig0b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 3 (other.cleanup(),_branch0.alive(),__twig0b.cleanup())
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0b){alive}: scene forgets __twig0b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup() done
DLGLOBAL DEBUG 2 (other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(__twig0b){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig0b){alive}: Deallocated
DLGLOBAL DEBUG test(__twig0a){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig0a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0a){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 3 (other.cleanup(),_branch0.alive(),__twig0a.cleanup())
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0a){alive}: scene forgets __twig0a
DLGLOBAL DEBUG test(__twig0a){alive}: removing reference __twig0a.other[0] -> other
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(other){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup() done
DLGLOBAL DEBUG 2 (other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(__twig0a){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig0a){alive}: Deallocated
DLGLOBAL DEBUG test(_branch0){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 3 (other.cleanup(),_branch0.alive(),_branch0.cleanup())
DLGLOBAL DEBUG test(_branch0){alive}: cleanup()
DLGLOBAL DEBUG test(_branch0){alive}: scene forgets _branch0
DLGLOBAL DEBUG test(root){alive}: Received Event EV_CHILD_GONE
DLGLOBAL DEBUG 4 (other.cleanup(),_branch0.alive(),_branch0.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: alive(EV_CHILD_GONE)
DLGLOBAL DEBUG 5 (other.cleanup(),_branch0.alive(),_branch0.cleanup(),root.alive(),root.child_gone())
DLGLOBAL DEBUG test(root){alive}: EV_CHILD_GONE: Dropped reference root.child[0] = _branch0
DLGLOBAL DEBUG test(root){alive}: still exists: child[1]
DLGLOBAL DEBUG 4 (other.cleanup(),_branch0.alive(),_branch0.cleanup(),root.alive())
DLGLOBAL DEBUG 3 (other.cleanup(),_branch0.alive(),_branch0.cleanup())
DLGLOBAL DEBUG test(_branch0){alive}: cleanup() done
DLGLOBAL DEBUG 2 (other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Freeing instance
DLGLOBAL DEBUG test(_branch0){alive}: Deallocated
DLGLOBAL DEBUG test(root){alive}: Received Event EV_CHILD_GONE
DLGLOBAL DEBUG 3 (other.cleanup(),_branch0.alive(),root.alive())
DLGLOBAL DEBUG test(root){alive}: alive(EV_CHILD_GONE)
DLGLOBAL DEBUG test(root){alive}: EV_CHILD_GONE with NULL data, must be a parent_term event. Ignore.
DLGLOBAL DEBUG 2 (other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG 1 (other.cleanup())
DLGLOBAL DEBUG test(other){alive}: removing reference other.other[1] -> _branch1
DLGLOBAL DEBUG test(_branch1){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 2 (other.cleanup(),_branch1.alive())
DLGLOBAL DEBUG test(_branch1){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 3 (other.cleanup(),_branch1.alive(),_branch1.other_gone())
DLGLOBAL DEBUG test(_branch1){alive}: EV_OTHER_GONE: Dropped reference _branch1.other[0] = other
DLGLOBAL DEBUG 2 (other.cleanup(),_branch1.alive())
DLGLOBAL DEBUG test(_branch1){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(_branch1){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig1b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 3 (other.cleanup(),_branch1.alive(),__twig1b.cleanup())
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1b){alive}: scene forgets __twig1b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup() done
DLGLOBAL DEBUG 2 (other.cleanup(),_branch1.alive())
DLGLOBAL DEBUG test(__twig1b){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig1b){alive}: Deallocated
DLGLOBAL DEBUG test(__twig1a){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig1a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1a){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 3 (other.cleanup(),_branch1.alive(),__twig1a.cleanup())
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1a){alive}: scene forgets __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: removing reference __twig1a.other[0] -> root
DLGLOBAL DEBUG test(root){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 4 (other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 5 (other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive(),root.other_gone())
DLGLOBAL DEBUG test(root){alive}: EV_OTHER_GONE: Dropped reference root.other[0] = __twig1a
DLGLOBAL DEBUG 4 (other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(root){alive}: pre_term()
DLGLOBAL DEBUG test(_branch1){alive}: Ignoring trigger to terminate: already terminating
DLGLOBAL ERROR test(root){alive}: Internal error while terminating child FSMs: a child FSM is stuck
DLGLOBAL DEBUG 5 (other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive(),root.cleanup())
DLGLOBAL DEBUG test(root){alive}: cleanup()
DLGLOBAL DEBUG test(root){alive}: scene forgets root
DLGLOBAL DEBUG test(root){alive}: cleanup() done
DLGLOBAL DEBUG 4 (other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: Freeing instance
DLGLOBAL DEBUG test(root){alive}: Deallocated
DLGLOBAL DEBUG 3 (other.cleanup(),_branch1.alive(),__twig1a.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup() done
DLGLOBAL DEBUG 2 (other.cleanup(),_branch1.alive())
DLGLOBAL DEBUG test(__twig1a){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig1a){alive}: Deallocated
DLGLOBAL DEBUG test(_branch1){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 3 (other.cleanup(),_branch1.alive(),_branch1.cleanup())
DLGLOBAL DEBUG test(_branch1){alive}: cleanup()
DLGLOBAL DEBUG test(_branch1){alive}: scene forgets _branch1
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: cleanup() done
DLGLOBAL DEBUG 2 (other.cleanup(),_branch1.alive())
DLGLOBAL DEBUG test(_branch1){alive}: Freeing instance
DLGLOBAL DEBUG test(_branch1){alive}: Deallocated
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG 1 (other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup() done
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG test(other){alive}: Freeing instance
DLGLOBAL DEBUG test(other){alive}: Deallocated
DLGLOBAL DEBUG --- after term cascade:
DLGLOBAL DEBUG --- all deallocated.
*** loop_ctx contains 33 blocks, deallocating.
DLGLOBAL DEBUG scene_alloc()
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(_branch0){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: is child of test(_branch0)
DLGLOBAL DEBUG test(root){alive}: Allocated
DLGLOBAL DEBUG test(root){alive}: is child of test(root)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(_branch1){alive}: Allocated
DLGLOBAL DEBUG test(_branch1){alive}: is child of test(_branch1)
DLGLOBAL DEBUG test(other){alive}: Allocated
DLGLOBAL DEBUG test(_branch0){alive}: _branch0.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[0] = _branch0
DLGLOBAL DEBUG test(__twig0a){alive}: __twig0a.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = __twig0a
DLGLOBAL DEBUG test(_branch1){alive}: _branch1.other[0] = other
DLGLOBAL DEBUG test(other){alive}: other.other[1] = _branch1
DLGLOBAL DEBUG test(__twig1a){alive}: __twig1a.other[0] = root
DLGLOBAL DEBUG test(root){alive}: root.other[0] = __twig1a
DLGLOBAL DEBUG ------ before destroy-event cascade, got:
DLGLOBAL DEBUG root
DLGLOBAL DEBUG _branch0
DLGLOBAL DEBUG __twig0a
DLGLOBAL DEBUG __twig0b
DLGLOBAL DEBUG _branch1
DLGLOBAL DEBUG __twig1a
DLGLOBAL DEBUG __twig1b
DLGLOBAL DEBUG other
DLGLOBAL DEBUG ---
DLGLOBAL DEBUG --- destroy-event at other
DLGLOBAL DEBUG test(other){alive}: Received Event EV_DESTROY
DLGLOBAL DEBUG 1 (other.alive())
DLGLOBAL DEBUG test(other){alive}: alive(EV_DESTROY)
DLGLOBAL DEBUG test(other){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(other){alive}: pre_term()
DLGLOBAL DEBUG 2 (other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup()
DLGLOBAL DEBUG test(other){alive}: scene forgets other
DLGLOBAL DEBUG test(other){alive}: removing reference other.other[0] -> _branch0
DLGLOBAL DEBUG test(_branch0){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 3 (other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 4 (other.alive(),other.cleanup(),_branch0.alive(),_branch0.other_gone())
DLGLOBAL DEBUG test(_branch0){alive}: EV_OTHER_GONE: Dropped reference _branch0.other[0] = other
DLGLOBAL DEBUG 3 (other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(_branch0){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig0b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0b){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 4 (other.alive(),other.cleanup(),_branch0.alive(),__twig0b.cleanup())
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0b){alive}: scene forgets __twig0b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig0b){alive}: cleanup() done
DLGLOBAL DEBUG 3 (other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(__twig0b){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig0b){alive}: Deallocated
DLGLOBAL DEBUG test(__twig0a){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig0a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig0a){alive}: Removing from parent test(_branch0)
DLGLOBAL DEBUG 4 (other.alive(),other.cleanup(),_branch0.alive(),__twig0a.cleanup())
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig0a){alive}: scene forgets __twig0a
DLGLOBAL DEBUG test(__twig0a){alive}: removing reference __twig0a.other[0] -> other
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(other){alive}: FSM instance already terminating, not dispatching event EV_OTHER_GONE
DLGLOBAL DEBUG test(_branch0){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig0a){alive}: cleanup() done
DLGLOBAL DEBUG 3 (other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(__twig0a){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig0a){alive}: Deallocated
DLGLOBAL DEBUG test(_branch0){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 4 (other.alive(),other.cleanup(),_branch0.alive(),_branch0.cleanup())
DLGLOBAL DEBUG test(_branch0){alive}: cleanup()
DLGLOBAL DEBUG test(_branch0){alive}: scene forgets _branch0
DLGLOBAL DEBUG test(root){alive}: Received Event EV_CHILD_GONE
DLGLOBAL DEBUG 5 (other.alive(),other.cleanup(),_branch0.alive(),_branch0.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: alive(EV_CHILD_GONE)
DLGLOBAL DEBUG 6 (other.alive(),other.cleanup(),_branch0.alive(),_branch0.cleanup(),root.alive(),root.child_gone())
DLGLOBAL DEBUG test(root){alive}: EV_CHILD_GONE: Dropped reference root.child[0] = _branch0
DLGLOBAL DEBUG test(root){alive}: still exists: child[1]
DLGLOBAL DEBUG 5 (other.alive(),other.cleanup(),_branch0.alive(),_branch0.cleanup(),root.alive())
DLGLOBAL DEBUG 4 (other.alive(),other.cleanup(),_branch0.alive(),_branch0.cleanup())
DLGLOBAL DEBUG test(_branch0){alive}: cleanup() done
DLGLOBAL DEBUG 3 (other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG test(_branch0){alive}: Freeing instance
DLGLOBAL DEBUG test(_branch0){alive}: Deallocated
DLGLOBAL DEBUG test(root){alive}: Received Event EV_CHILD_GONE
DLGLOBAL DEBUG 4 (other.alive(),other.cleanup(),_branch0.alive(),root.alive())
DLGLOBAL DEBUG test(root){alive}: alive(EV_CHILD_GONE)
DLGLOBAL DEBUG test(root){alive}: EV_CHILD_GONE with NULL data, must be a parent_term event. Ignore.
DLGLOBAL DEBUG 3 (other.alive(),other.cleanup(),_branch0.alive())
DLGLOBAL DEBUG 2 (other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: removing reference other.other[1] -> _branch1
DLGLOBAL DEBUG test(_branch1){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 3 (other.alive(),other.cleanup(),_branch1.alive())
DLGLOBAL DEBUG test(_branch1){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 4 (other.alive(),other.cleanup(),_branch1.alive(),_branch1.other_gone())
DLGLOBAL DEBUG test(_branch1){alive}: EV_OTHER_GONE: Dropped reference _branch1.other[0] = other
DLGLOBAL DEBUG 3 (other.alive(),other.cleanup(),_branch1.alive())
DLGLOBAL DEBUG test(_branch1){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(_branch1){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig1b){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1b){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 4 (other.alive(),other.cleanup(),_branch1.alive(),__twig1b.cleanup())
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1b){alive}: scene forgets __twig1b
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig1b){alive}: cleanup() done
DLGLOBAL DEBUG 3 (other.alive(),other.cleanup(),_branch1.alive())
DLGLOBAL DEBUG test(__twig1b){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig1b){alive}: Deallocated
DLGLOBAL DEBUG test(__twig1a){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT)
DLGLOBAL DEBUG test(__twig1a){alive}: pre_term()
DLGLOBAL DEBUG test(__twig1a){alive}: Removing from parent test(_branch1)
DLGLOBAL DEBUG 4 (other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup())
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup()
DLGLOBAL DEBUG test(__twig1a){alive}: scene forgets __twig1a
DLGLOBAL DEBUG test(__twig1a){alive}: removing reference __twig1a.other[0] -> root
DLGLOBAL DEBUG test(root){alive}: Received Event EV_OTHER_GONE
DLGLOBAL DEBUG 5 (other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: alive(EV_OTHER_GONE)
DLGLOBAL DEBUG 6 (other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive(),root.other_gone())
DLGLOBAL DEBUG test(root){alive}: EV_OTHER_GONE: Dropped reference root.other[0] = __twig1a
DLGLOBAL DEBUG 5 (other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR)
DLGLOBAL DEBUG test(root){alive}: pre_term()
DLGLOBAL DEBUG test(_branch1){alive}: Ignoring trigger to terminate: already terminating
DLGLOBAL ERROR test(root){alive}: Internal error while terminating child FSMs: a child FSM is stuck
DLGLOBAL DEBUG 6 (other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive(),root.cleanup())
DLGLOBAL DEBUG test(root){alive}: cleanup()
DLGLOBAL DEBUG test(root){alive}: scene forgets root
DLGLOBAL DEBUG test(root){alive}: cleanup() done
DLGLOBAL DEBUG 5 (other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup(),root.alive())
DLGLOBAL DEBUG test(root){alive}: Freeing instance
DLGLOBAL DEBUG test(root){alive}: Deallocated
DLGLOBAL DEBUG 4 (other.alive(),other.cleanup(),_branch1.alive(),__twig1a.cleanup())
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(__twig1a){alive}: cleanup() done
DLGLOBAL DEBUG 3 (other.alive(),other.cleanup(),_branch1.alive())
DLGLOBAL DEBUG test(__twig1a){alive}: Freeing instance
DLGLOBAL DEBUG test(__twig1a){alive}: Deallocated
DLGLOBAL DEBUG test(_branch1){alive}: Removing from parent test(root)
DLGLOBAL DEBUG 4 (other.alive(),other.cleanup(),_branch1.alive(),_branch1.cleanup())
DLGLOBAL DEBUG test(_branch1){alive}: cleanup()
DLGLOBAL DEBUG test(_branch1){alive}: scene forgets _branch1
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG test(_branch1){alive}: cleanup() done
DLGLOBAL DEBUG 3 (other.alive(),other.cleanup(),_branch1.alive())
DLGLOBAL DEBUG test(_branch1){alive}: Freeing instance
DLGLOBAL DEBUG test(_branch1){alive}: Deallocated
fsm: refuse state chg and events after term Refuse state changes and event dispatch for FSM instances that are already terminating. It is assumed that refusing state changes and events after FSM termination is seen as the sane expected behavior, hence this change in behavior is merged without being configurable. There is no fallout in current Osmocom code trees. fsm_dealloc_test needs a changed expected output, since it is explicitly creating complex FSM structures that terminate. Currently no other C test in Osmocom code needs adjusting. Rationale: Where multiple FSM instances are collaborating (like in osmo-bsc or osmo-msc), a terminating FSM instance often causes events to be dispatched back to itself, or causes state changes in FSM instances that are already terminating. That is hard to avoid, since each FSM instance could be a cause of failure, and wants to notify all the others of that, which in turn often choose to terminate. Another use case: any function that dispatches events or state changes to more than one FSM instance must be sure that after the first event dispatch, the second FSM instance is in fact still allocated. Furthermore, if the second FSM instance *has* terminated from the first dispatch, this often means that no more actions should be taken. That could be done by an explicit check for fsm->proc.terminating, but a more general solution is to do this check internally in fsm.c. In practice, I need this to avoid a crash in libosmo-mgcp-client, when an on_success() event dispatch causes the MGCP endpoint FSM to deallocate. The earlier dealloc-in-main-loop patch fixed part of it, but not all. Change-Id: Ia81a0892f710db86bd977462730b69f0dcc78f8c
2019-10-05 03:13:23 +00:00
DLGLOBAL DEBUG test(root){alive}: FSM instance already terminating, not dispatching event EV_CHILD_GONE
add osmo_fsm_set_dealloc_ctx(), to help with use-after-free This is a simpler and more general solution to the problem so far solved by osmo_fsm_term_safely(true). This extends use-after-free fixes to arbitrary functions, not only FSM instances during termination. The aim is to defer talloc_free() until back in the main loop. Rationale: I discovered an osmo-msc use-after-free crash from an invalid message, caused by this pattern: void event_action() { osmo_fsm_inst_dispatch(foo, FOO_EVENT, NULL); osmo_fsm_inst_dispatch(bar, BAR_EVENT, NULL); } Usually, FOO_EVENT takes successful action, and afterwards we also notify bar. However, in this particular case, FOO_EVENT caused failure, and the immediate error handling directly terminated and deallocated bar. In such a case, dispatching BAR_EVENT causes a use-after-free; this constituted a DoS vector just from sending messages that cause *any* failure during the first event dispatch. Instead, when this is enabled, we do not deallocate 'foo' until event_action() has returned back to the main loop. Test: duplicate fsm_dealloc_test.c using this, and print the number of items deallocated in each test loop, to ensure the feature works. We also verify that the deallocation safety works simply by fsm_dealloc_test.c not crashing. We should probably follow up by refusing event dispatch and state transitions for FSM instances that are terminating or already terminated: see I0adc13a1a998e953b6c850efa2761350dd07e03a. Change-Id: Ief4dba9ea587c9b4aea69993e965fbb20fb80e78
2019-10-04 18:37:17 +00:00
DLGLOBAL DEBUG 2 (other.alive(),other.cleanup())
DLGLOBAL DEBUG test(other){alive}: cleanup() done
DLGLOBAL DEBUG 1 (other.alive())
DLGLOBAL DEBUG test(other){alive}: Freeing instance
DLGLOBAL DEBUG test(other){alive}: Deallocated
DLGLOBAL DEBUG 0 (-)
DLGLOBAL DEBUG --- after destroy-event cascade:
DLGLOBAL DEBUG --- all deallocated.
*** loop_ctx contains 33 blocks, deallocating.
test_osmo_fsm_set_dealloc_ctx() done