Commit Graph

22 Commits

Author SHA1 Message Date
Philipp Maier c6cef24546 mgw_fsm: refactor helper function handle_rab_release()
The function handle_rab_release() is difficult to read, and it is not
immediately clear what happens and why. Let's split it up and add some
comments to improve this.

Related: OS#5916
Change-Id: I3595502b98ea5febbde7f2fab3999e2533766b48
2023-03-21 09:46:33 +01:00
Philipp Maier 065a719294 mgw_fsm: use __func__ to mention function name in log line
Change-Id: I23050fc5f644340dfbd0323eef6309cff6fc4515
2023-03-21 09:46:33 +01:00
Philipp Maier cc7d2b1313 mgw_fsm: fix log line
The function name mentioned in the log line does not match the actual
function name.

Change-Id: Iefc005f10e3c8f165c5686781747460a10ada1e0
2023-03-21 09:43:36 +01:00
Philipp Maier 1c4b05d026 mgw_fsm: fix typo
Change-Id: I16ee37bbfda01b541ad7a6f5269689c4b9e92c8c
2023-03-20 16:43:42 +01:00
Neels Hofmeyr ed424d5be4 context map: introduce RUA and SCCP FSMs to fix leaks
Refactor the entire RUA <-> SCCP connection-oriented message forwarding:
- conquer confusion about hnbgw_context_map release behavior, and
- eradicate SCCP connection leaks.

Finer points:

== Context map state ==
So far, we had a single context map state and some flags to keep track
of both the RUA and the SCCP connections. It was easy to miss connection
cleanup steps, especially on the SCCP side.
Instead, the two FSMs clearly define the RUA and SCCP conn states
separately, and each side takes care of its own release needs for all
possible scenarios.
- When both RUA and SCCP are released, the context map is discarded.
- A context map can stay around to wait for proper SCCP release, even if
  the RUA side has lost the HNB connection.
- Completely drop the async "context mapper garbage collection", because
  the FSMs clarify the release and free steps, synchronously.
- We still keep a (simplified) enum for global context map state, but
  this is only used so that VTY reporting remains mostly unchanged.
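
The release rule described above can be sketched in plain C. This is a
minimal illustration only: the type and function names are hypothetical,
the real RUA and SCCP FSMs have more states, and 'discarded' merely stands
in for actually freeing the context map.

```c
#include <stdbool.h>

/* Hypothetical, simplified per-side states (the real FSMs have more). */
enum side_state { SIDE_CONNECTED, SIDE_RELEASED };

struct ctx_map {
	enum side_state rua;   /* HNB-facing side */
	enum side_state sccp;  /* CN-facing side */
	bool discarded;        /* stands in for freeing the map */
};

/* Discard the map only once both sides have finished their own release. */
static void map_check_discard(struct ctx_map *map)
{
	if (map->rua == SIDE_RELEASED && map->sccp == SIDE_RELEASED)
		map->discarded = true;
}

/* The RUA side is done, e.g. because the HNB connection was lost. */
static void map_rua_released(struct ctx_map *map)
{
	map->rua = SIDE_RELEASED;
	map_check_discard(map);
}

/* The SCCP side is done, i.e. the SCCP connection was properly released. */
static void map_sccp_released(struct ctx_map *map)
{
	map->sccp = SIDE_RELEASED;
	map_check_discard(map);
}
```

With this shape, a map whose RUA side is gone simply stays around until
the SCCP side reports its own release.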

== Context map cleanup confusion ==
The function context_map_hnb_released() was the general cleanup function
for a context map. Instead, add separate context_map_free().

== Free context maps separately from HNB ==
When a HNB releases, talloc_steal() the context maps out of the HNB
specific hnb_ctx, so that they are not freed along with the HNB state,
possibly leaving SCCP connections afloat.
(It is still nice to normally keep context maps as talloc children of
their respective hnb_ctx, so talloc reports show which belongs to
which.)

So far, context map handling found the global hnb_gw pointer via
map->hnb_ctx->gw. But in fact, a HNB may disappear at any point in time.
Instead, use a separate hnb_gw pointer in map->gw.

== RUA procedure codes vs. SCCP prims ==
So far, the RUA rx side composed SCCP prims to pass on:

 RUA rx ---SCCP-prim--> RANAP handling ---SCCP-prim--> SCCP tx

That is a source of confusion: a RUA procedure code should not translate
1:1 to SCCP prims, especially for RUA id-Disconnect (see release charts
below).
Instead, move SCCP prim composition over to the SCCP side, using FSM
events to forward:

 RUA rx --event--> RUA FSM --event--> SCCP FSM --SCCP-prim--> SCCP tx
         +RANAP             +RANAP              +RANAP

 RUA tx <--RUA---- RUA FSM <--event-- SCCP FSM <--event-- SCCP rx
          +RANAP             +RANAP              +RANAP

Hence choose the correct prim according to the SCCP FSM state.
- in hnbgw_rua.c, use RUA procedure codes, not prim types.
- via the new FSM events' data args, pass msgb containing RANAP PDUs.

== Fix SCCP Release behavior ==
So far, the normal conn release behavior was

 HNB                 HNBGW                   CN
  | --id-Disconnect--> | ---SCCP-Released--> |  Iu-ReleaseComplete
  |                    | <--SCCP-RLC-------- |  (no data)

Instead, the SCCP release is now in accordance with 3GPP TS 48.006 9.2
'Connection release':

 The MSC sends a SCCP released message. This message shall not contain
 any user data field.

i.e.:

 HNB                 HNBGW                    CN
  | --id-Disconnect--> | ---Data-Form-1(!)--> |  Iu-ReleaseComplete
  |                    | <--SCCP-Released---- |  (no data)
  |                    | ---SCCP-RLC--------> |  (no data)

(Side note, the final SCCP Release Confirm step is taken care of
implicitly by libosmo-sigtran's sccp_scoc.c FSM.)

If the CN fails to respond with SCCP-Released, on new X31 timeout,
osmo-hnbgw will send an SCCP Released to the CN as fallback.

== Memory model for message dispatch ==
So far, an osmo_scu_prim aka "oph" was passed between RUA and SCCP
handling code, and the final dispatch freed it. Every error path had to
take care not to leak any oph.
Instead, use a much simpler and more leak-proof memory model,
inspired by fixeria:
- on rx, dispatch RANAP msgb that live in OTC_SELECT.
- no code path needs to msgb_free() -- the msgb is discarded via
  OTC_SELECT when handling is done, error or no error.
- any code path may also choose to store the msgb for async dispatch,
  using talloc_steal(). The user plane mapping via MGW and UPF does that.
- if any code path does msgb_free(), that would be no problem either
  (but none do so now, for simplicity).

== Layer separation ==
Dispatch *all* connection-oriented RUA tx via the RUA FSM and SCCP tx
via the SCCP FSM, do not call rua_tx_dt() or osmo_sccp_user_sap_down()
directly.

== Memory model for decoded ranap_message IEs ==
Use a talloc destructor to make sure that the ranap_message IEs are
always implicitly freed upon talloc_free(), so that no code path can
possibly forget to do so.

== Implicit cleanup by talloc ==
Use talloc scoping to remove a bunch of explicit cleanup code. For
example, make a cached message a talloc child of its handler:
  talloc_steal(mgw_fsm_priv, message);
  mgw_fsm_priv->ranap_rab_ass_req_message = message;
and later implicitly free 'message' by only freeing the handler:
  talloc_free(mgw_fsm_priv)

Related: SYS#6297
Change-Id: I6ff7e36532ff57c6f2d3e7e419dd22ef27dafd19
2023-02-24 15:19:24 +01:00
Neels Hofmeyr 43cc12bac3 various comment tweaks
Change-Id: Ie40aa672948062282c566c90300f6e96963e05ec
2023-02-21 00:57:03 +01:00
Neels Hofmeyr a08b8a595a fix msgb leak for RANAP RAB Ass. Req.
Fix leaked msgb introduced by the MGW support recently added, and from
there copied to the UPF support added after that.

Fixes leaked "RANAP Tx" msgb, one per RAB Assignment that involves an
MGW or UPF proxying of user data.

Related: SYS#6297
Change-Id: Ie30e880301346ffca72f98f8c467e56d622fb03f
2023-01-17 23:40:00 +01:00
Neels Hofmeyr 28619961a9 fix segfault on MGCP timeout
bisect shows that the segfault was introduced by using the MGCP client
pool:

 e62af4d46a is the first bad commit
 Author: Pau Espin Pedrol <pespin@sysmocom.de>
    Introduce support for libosmo-mgcp-client MGW pooling
    Change-Id I371dc773b58788ee21037dc25d77f556c89c6b61

The segfault:

 20230117224550365 DLMGCP DEBUG MGCP_CONN(to-HNB)[0x612000003ca0]{ST_CRCX_RESP}: Timeout of T1 (fsm.c:317)
 [...]
 20230117224550366 DLMGCP DEBUG mgw-endp(mgw-fsm-14429752-0)[0x612000003b20]{WAIT_MGW_RESPONSE}: Deallocated (fsm.c:568)
 20230117224550366 DMGW DEBUG mgw(mgw-fsm-14429752-0)[0x612000003820]{MGW_ST_CRCX_HNB}: Received Event MGW_EV_MGCP_TERM (mgcp_client_endpoint_fsm.c:869)
 =================================================================
 ==255699==ERROR: AddressSanitizer: heap-use-after-free on address 0x62b000000260 at pc 0x7f282a6ee143 bp 0x7fff0d9bcae0 sp 0x7fff0d9bcad8
 READ of size 8 at 0x62b000000260 thread T0
     #0 0x7f282a6ee142 in osmo_mgcpc_ep_client ../../../../src/osmo-mgw/src/libosmo-mgcp-client/mgcp_client_endpoint_fsm.c:223
     #1 0x55e2a84f1889 in mgw_fsm_allstate_action ../../../../src/osmo-hnbgw/src/osmo-hnbgw/mgw_fsm.c:504
     #2 0x7f2829d50c56 in _osmo_fsm_inst_dispatch ../../../src/libosmocore/src/fsm.c:863
     #3 0x7f2829d55a08 in _osmo_fsm_inst_term ../../../src/libosmocore/src/fsm.c:962
     #4 0x7f282a72679a in osmo_mgcpc_ep_fsm_check_state_chg_after_response ../../../../src/osmo-mgw/src/libosmo-mgcp-client/mgcp_client_endpoint_fsm.c:869
     #5 0x7f282a6f1869 in on_failure ../../../../src/osmo-mgw/src/libosmo-mgcp-client/mgcp_client_endpoint_fsm.c:414
     #6 0x7f282a727ac6 in osmo_mgcpc_ep_fsm_handle_ci_events ../../../../src/osmo-mgw/src/libosmo-mgcp-client/mgcp_client_endpoint_fsm.c:935
 [...]

When a CRCX times out, MGCP_CONN fsm terminates (libosmo-mgcp-client).
In turn the parent mgw-endp fsm terminates (libosmo-mgcp-client).
This dispatches an MGW_EV_MGCP_TERM event to the mgw_fsm (osmo-hnbgw),
which attempts to retrieve a pointer from mgw_fsm state:
mgw_fsm_priv->mgcpc_ep->mgcp_client
where the middle one, mgcpc_ep, is the 'mgw-endp' that was already
deallocated above.

To fix, add to /osmo-hnbgw/mgw_fsm.c a separate pointer to the
mgcp_client, to call mgcp_client_pool_put() on it. Do not use mgcpc_ep
to get the mgcp_client, because mgcpc_ep deallocates independently.
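
The idea of the fix can be sketched without the real libraries. All
struct and function names below are hypothetical stand-ins, and the
'in_use' counter only imitates taking a client from a pool and putting
it back; this is not the libosmo-mgcp-client API.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the MGCP client and endpoint objects. */
struct client { int in_use; };
struct endpoint { struct client *client; };

struct fsm_priv {
	struct endpoint *ep;    /* may be deallocated independently */
	struct client *client;  /* own pointer, valid regardless of ep */
};

static struct fsm_priv *priv_alloc(struct client *client)
{
	struct fsm_priv *priv = calloc(1, sizeof(*priv));
	if (!priv)
		return NULL;
	priv->ep = calloc(1, sizeof(*priv->ep));
	if (!priv->ep) {
		free(priv);
		return NULL;
	}
	priv->ep->client = client;
	priv->client = client;  /* remember the client separately */
	client->in_use++;
	return priv;
}

/* The endpoint terminates on its own, e.g. after a timeout. */
static void endpoint_terminated(struct fsm_priv *priv)
{
	free(priv->ep);
	priv->ep = NULL;
}

/* Give the client back without ever dereferencing priv->ep. */
static void priv_release(struct fsm_priv *priv)
{
	priv->client->in_use--;
	free(priv->ep);  /* free(NULL) is a no-op */
	free(priv);
}
```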

Related: OS#5862
Change-Id: I460d7249f4fc7edcfd94f6084fc8f933b491520c
2023-01-17 23:39:46 +01:00
Neels Hofmeyr 9bc7649b95 drop bogus error log 'no MGW fsm'
Looking at a customer's log, these error logs got my attention. There
seems to be no point in logging this at all.

Change-Id: I89dd4fb6913bfb84b6667b159e09968734e2102a
2023-01-03 00:29:58 +01:00
Pau Espin e62af4d46a Introduce support for libosmo-mgcp-client MGW pooling
Large RAN installations may benefit from distributing the RTP voice
stream load over multiple media gateways.

libosmo-mgcp-client has supported MGW pooling since version 1.8.0 (more
than one year ago). OsmoBSC has been making use of it since then (see
osmo-bsc.git 8d22e6870637ed6d392a8a77aeaebc51b23a8a50); let's use this
feature in osmo-hnbgw too.

This commit is also part of a series of patches cleaning up
libosmo-mgcp-client and slowly getting rid of the old non-mgw-pooled VTY
configuration, in order to keep only 1 way to configure
libosmo-mgcp-client through VTY.

Related: SYS#5091
Related: SYS#5987
Change-Id: I371dc773b58788ee21037dc25d77f556c89c6b61
2022-10-20 17:03:06 +02:00
Neels Hofmeyr 223aeda282 ranap_rab_ass_req_encode(): return msgb
ranap_rab_ass_req_encode() forms a msgb, then copies the data to a
buffer provided by the caller. Instead, just return the msgb. This
removes one unnecessary memcpy() and simplifies some code.

In ranap_rab_ass_test.c, actually ensure the correct size of the
returned data. See also the fix of expected test data in patch
Ifb98a52e56db1227a834c0d7b7a260314d9f547e

Related: SYS#5895
Change-Id: I85e715326e1d8f4f301f82f78da109f1a7a92f30
2022-07-27 15:45:18 +02:00
Neels Hofmeyr 05aaccc42d mgw_fsm: move MGCP timeout to mgw_fsm_T_defs
For the tdefs used by libosmo-mgcp-client, passed via
osmo_mgcpc_ep_alloc(), do not use the separate mgw_tdefs. Instead, move
X2427 to mgw_fsm_T_defs. This makes X2427 VTY configurable.

Related: SYS#5895
Change-Id: I2aa67121c20dc3da5fd937a02b6747468622f317
2022-07-27 15:44:51 +02:00
Pau Espin 62fb1dea61 mgw_fsm: Simplify cleanup paths
Let's have a unified way of freeing the FSM instance once it was
allocated, otherwise it's far more difficult to understand and maintain.

Change-Id: I8883e737fa112cff57834abae7ef272388a54edb
2022-06-15 11:55:21 +02:00
Pau Espin 304f7646c9 mgw_fsm: Fix error path accessing uninitialized fsm ptr
The error handling of the error path was wrong. Let's remove the "fi"
variable to avoid more of such errors. Furthermore, add an assert to
clarify for the reader that the map->mgw_fi will be freed before
allocating a new FSM instance below.

Change-Id: I9d3bca552bfa77f5e18f75bedad8d422f74df1f8
2022-06-14 18:41:48 +02:00
Pau Espin 87e03208af mgw_fsm: Change macro to not use local variables implicitly
This is misleading for readers since it may access variables which may
be uninitialized or in a wrong state. Furthermore, we want to pass some
other variable name in a follow up patch.

This effectively allows the compiler to warn about the uninitialized
use of the fi variable in line 661.
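
A rough sketch of the difference, using a made-up formatting macro
rather than the actual logging macro from mgw_fsm.c: once the instance
is an explicit macro argument, every call site names the variable it
uses, so the use is visible to both the reader and the compiler.

```c
#include <stdio.h>

/* Hypothetical, minimal FSM instance. */
struct fsm_inst { const char *name; };

/* Problematic variant, shown only as a comment: it silently picks up
 * whatever local variable named 'fi' exists at the call site.
 *   #define FMT_FI(buf, msg) snprintf(..., fi->name, msg)
 */

/* Explicit variant: the instance is passed in by the caller.
 * 'buf' must be a char array, because sizeof(buf) is used for the size. */
#define FMT_FI(buf, fi, msg) \
	snprintf((buf), sizeof(buf), "%s: %s", (fi)->name, (msg))
```

A call then reads FMT_FI(buf, map_fi, "allocated"), where map_fi is
whichever instance pointer the caller actually means.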

Change-Id: Id694f51bb2918fd27da87b3f4a905727cd7f5de6
2022-06-14 18:40:11 +02:00
Pau Espin de8b170d1a cosmetic: mgw_fsm: Fix typo in log
Change-Id: I80aa61a288ab37c51510af67c784498f5949fc50
2022-06-14 18:07:52 +02:00
Pau Espin 1d1839a34b mgw_fsm: Improve logging
Change-Id: I14785b6bc798c3bae8c552bccb55ca4fa9f2f416
2022-06-14 18:07:32 +02:00
Pau Espin 8c7aae87b0 mgw_fsm: Mark structs as static const
Change-Id: Ie62f28587c08296429c0dabda7b6add67ffa010c
2022-06-14 17:46:56 +02:00
Neels Hofmeyr ff2fbdf998 fix segfault in error handling for mgw_fi == NULL
In mgw_fsm_handle_rab_ass_resp(), a NULL mgw_fi is handled as error,
but the error handling fails to return. The function continues to
dereference mgw_fi. Add missing return.
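
The bug pattern, reduced to a sketch with a hypothetical function and
struct (not the actual mgw_fsm_handle_rab_ass_resp() code):

```c
#include <stddef.h>

/* Hypothetical, minimal FSM instance. */
struct fsm_inst { int events_handled; };

/* Returns 0 on success, -1 if there is no FSM instance. */
static int handle_rab_ass_resp(struct fsm_inst *mgw_fi)
{
	if (!mgw_fi) {
		/* Error handling (logging, cleanup) would go here. */
		return -1;  /* without this return, the code below
			     * would dereference a NULL pointer */
	}
	mgw_fi->events_handled++;
	return 0;
}
```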

Related: SYS#5995
Change-Id: I3e98dc3a00145ec1f71c678bbf45debfd4276237
2022-06-10 11:40:33 +02:00
Philipp Maier be9ed71631 ranap_rab_ass: check for more than one RAB assignment req
The spec permits RAB Assignment Requests with multiple RABs at a time,
even though a voice call is assigned only one RAB in practice. Since
the current FSM implementation only supports a 1:1 scenario, let's check
that the MSC really assigns only one RAB and block RAB Assignments that
do not fit this scheme.
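
Such a check could look roughly like this. The struct is a hypothetical
stand-in for the decoded request and only carries the number of RAB
items; the real code inspects the decoded RANAP IEs.

```c
#include <stddef.h>

/* Hypothetical decoded request: only the RAB item count matters here. */
struct rab_ass_req { size_t n_rabs; };

/* Returns 0 if the request fits the 1:1 scheme, -1 otherwise. */
static int check_single_rab(const struct rab_ass_req *req)
{
	return req->n_rabs == 1 ? 0 : -1;
}
```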

Change-Id: I0f1d868fd0b4dc413533d6fcc5482862825181be
Related: OS#5152
2022-02-28 10:22:16 +01:00
Philipp Maier d1f4b9b9a1 mgw_fsm: release call when FSM is not created
While the FSM is created, the RAB Assignment Request is checked and
parsed. In case of failure the context is freed, but the CN is not
informed about the problem. The RAB Assignment Request will then most
likely time out. However, let's make sure the call is released by
requesting an Iu Release.

Change-Id: I1904f7e95d86bbcecee14f8721bd4075d0e33ab4
Related: OS#5152
2022-02-25 15:12:18 +01:00
Philipp Maier 81f1751896 mgw_fsm: add MGW support to osmo-hnbgw
osmo-hnbgw lacks support for a co-located media gateway. This makes it
virtually impossible to isolate the HNB from the core network properly.

Let's add MGCP support to osmo-hnbgw so that it can control a co-located
media gateway to relay the RTP streams between HNB and core network.

Change-Id: Ib9b62e0145184b91c56ce5d8870760bfa49cc5a4
Related: OS#5152
2022-02-24 10:51:30 +01:00