Commit Graph

30 Commits

Author SHA1 Message Date
Neels Hofmeyr 311bfb5983 log osmo_fsm timeouts
set osmo_fsm_log_timeouts(true);

Change-Id: Ic1ca03f06fbdef5a3fbe503e4414a780eb3e0fcc
2023-02-24 15:19:24 +01:00
Neels Hofmeyr ed424d5be4 context map: introduce RUA and SCCP FSMs to fix leaks
Refactor the entire RUA <-> SCCP connection-oriented message forwarding:
- conquer confusion about hnbgw_context_map release behavior, and
- eradicate SCCP connection leaks.

Finer points:

== Context map state ==
So far, we had a single context map state and some flags to keep track
of both the RUA and the SCCP connections. It was easy to miss connection
cleanup steps, especially on the SCCP side.
Instead, the two FSMs clearly define the RUA and SCCP conn states
separately, and each side takes care of its own release needs for all
possible scenarios.
- When both RUA and SCCP are released, the context map is discarded.
- A context map can stay around to wait for proper SCCP release, even if
  the RUA side has lost the HNB connection.
- Completely drop the async "context mapper garbage collection", because
  the FSMs clarify the release and free steps, synchronously.
- We still keep a (simplified) enum for global context map state, but
  this is only used so that VTY reporting remains mostly unchanged.

== Context map cleanup confusion ==
The function context_map_hnb_released() was the general cleanup function
for a context map. Instead, add separate context_map_free().

== Free context maps separately from HNB ==
When a HNB releases, talloc_steal() the context maps out of the HNB
specific hnb_ctx, so that they are not freed along with the HNB state,
possibly leaving SCCP connections afloat.
(It is still nice to normally keep context maps as talloc children of
their respective hnb_ctx, so talloc reports show which belongs to
which.)

So far, context map handling found the global hnb_gw pointer via
map->hnb_ctx->gw. But in fact, a HNB may disappear at any point in time.
Instead, use a separate hnb_gw pointer in map->gw.

== RUA procedure codes vs. SCCP prims ==
So far, the RUA rx side composed SCCP prims to pass on:

 RUA rx ---SCCP-prim--> RANAP handling ---SCCP-prim--> SCCP tx

That is a source of confusion: a RUA procedure code should not translate
1:1 to SCCP prims, especially for RUA id-Disconnect (see release charts
below).
Instead, move SCCP prim composition over to the SCCP side, using FSM
events to forward:

 RUA rx --event--> RUA FSM --event--> SCCP FSM --SCCP-prim--> SCCP tx
         +RANAP             +RANAP              +RANAP

 RUA tx <--RUA---- RUA FSM <--event-- SCCP FSM <--event-- SCCP rx
          +RANAP             +RANAP              +RANAP

Hence choose the correct prim according to the SCCP FSM state.
- in hnbgw_rua.c, use RUA procedure codes, not prim types.
- via the new FSM events' data args, pass msgb containing RANAP PDUs.

== Fix SCCP Release behavior ==
So far, the normal conn release behavior was

 HNB                 HNBGW                   CN
  | --id-Disconnect--> | ---SCCP-Released--> |  Iu-ReleaseComplete
  |                    | <--SCCP-RLC-------- |  (no data)

Instead, the SCCP release is now in accordance with 3GPP TS 48.006 9.2
'Connection release':

 The MSC sends a SCCP released message. This message shall not contain
 any user data field.

i.e.:

 HNB                 HNBGW                    CN
  | --id-Disconnect--> | ---Data-Form-1(!)--> |  Iu-ReleaseComplete
  |                    | <--SCCP-Released---- |  (no data)
  |                    | ---SCCP-RLC--------> |  (no data)

(Side note, the final SCCP Release Confirm step is taken care of
implicitly by libosmo-sigtran's sccp_scoc.c FSM.)

If the CN fails to respond with SCCP-Released, on new X31 timeout,
osmo-hnbgw will send an SCCP Released to the CN as fallback.

== Memory model for message dispatch ==
So far, an osmo_scu_prim aka "oph" was passed between RUA and SCCP
handling code, and the final dispatch freed it. Every error path had to
take care not to leak any oph.
Instead, use a much easier and much more leakage proof memory model,
inspired by fixeria:
- on rx, dispatch RANAP msgb that live in OTC_SELECT.
- no code path needs to msgb_free() -- the msgb is discarded via
  OTC_SELECT when handling is done, error or no error.
- any code path may also choose to store the msgb for async dispatch,
  using talloc_steal(). The user plane mapping via MGW and UPF do that.
- if any code path does msgb_free(), that would be no problem either
  (but none do so now, for simplicity).

== Layer separation ==
Dispatch *all* connection-oriented RUA tx via the RUA FSM and SCCP tx
via the SCCP FSM, do not call rua_tx_dt() or osmo_sccp_user_sap_down()
directly.

== Memory model for decoded ranap_message IEs ==
Use a talloc destructor to make sure that the ranap_message IEs are
always implicitly freed upon talloc_free(), so that no code path can
possibly forget to do so.

== Implicit cleanup by talloc ==
Use talloc scoping to remove a bunch of explicit cleanup code. For
example, make a chached message a talloc child of its handler:
  talloc_steal(mgw_fsm_priv, message);
  mgw_fsm_priv->ranap_rab_ass_req_message = message;
and later implicitly free 'message' by only freeing the handler:
  talloc_free(mgw_fsm_priv)

Related: SYS#6297
Change-Id: I6ff7e36532ff57c6f2d3e7e419dd22ef27dafd19
2023-02-24 15:19:24 +01:00
Neels Hofmeyr 60a52e4121 cosmetic: rename context_map_deactivate
Rename context_map_deactivate() to context_map_hnb_released().

This function is called only when the HNB is released / lost.

Change-Id: I6dcb26c94558fff28faf8f823e490967a9baf2ec
2023-02-23 01:15:54 +01:00
Neels Hofmeyr 3f4d645890 Deprecate 'sccp cr max-payload-len', remove SCCP CR limit code
The SCCP CR payload length limit is now implemented in libosmo-sigtran
v1.7.0. The limit no longer needs to be enforced in osmo-hnbgw.

This reverts commit 2c91bd66a1,
except for keeping the cfg option, marked deprecated, and not doing
anything.

Fixes: OS#5906
Related: SYS#5968
Related: OS#5579
Depends: I174b2ce06a31daa5a129c8a39099fe8962092df8 (osmo-ttcn3-hacks)
Change-Id: I18dece84b33bbefce8617fbb0b2d79a7e5adb263
2023-02-15 03:34:43 +01:00
Neels Hofmeyr 8eefcbee92 UE state leak: when HNB re-registers, discard previous UE state
User reports show leaked UE contexts over time, in a scenario where HNB
regularly disconnect and reconnect.

So far, when we receive a HNB REGISTER REQ, we log as "duplicated" and
continue to use the hnb_context unchanged. But it seems obvious that a
HNB that registers does not expect any UE state to remain. It will
obviously not tear down UE contexts (HNBAP or RUA) that have been
established before the HNB REGISTER REQUEST. These hence remain in
osmo-hnbgw indefinitely.

When receiving a HNB REGISTER REQUEST, release all its previous UE
state. Allow the HNB to register with a clean slate.

The aim is to alleviate the observed build-up of apparently orphaned UE
contexts, in order to avoid gradual memory exhaustion.

Related: SYS#6297
Change-Id: I7fa8a04cc3b2dfba263bda5b410961c77fbed332
2023-02-11 03:32:34 +01:00
arehbein 76c4203552 osmo-hnbgw: Transition to use of 'telnet_init_default'
Related: OS#5809
Change-Id: Id3256d09f62e802cc62fa9ba8aaafd403ccbb53e
2022-12-23 11:13:46 +00:00
Max a7fcbe100c ctrl: take both address and port from vty config
Change-Id: If5b80364c28fb1ca2bc00f4ece851de64c8ce6b1
2022-12-17 21:24:13 +03:00
Pau Espin e62af4d46a Introduce support for libosmo-mgcp-client MGW pooling
Large RAN installations may benefit from distributing the RTP voice
stream load over multiple media gateways.

libosmo-mgcp-client supports MGW pooling since version 1.8.0 (more than
one year ago). OsmoBSC has already been making use of it since then (see
osmo-bsc.git 8d22e6870637ed6d392a8a77aeaebc51b23a8a50); lets use this
feature in osmo-hngw too.

This commit is also part of a series of patches cleaning up
libosmo-mgcp-client and slowly getting rid of the old non-mgw-pooled VTY
configuration, in order to keep only 1 way to configure
libosmo-mgcp-client through VTY.

Related: SYS#5091
Related: SYS#5987
Change-Id: I371dc773b58788ee21037dc25d77f556c89c6b61
2022-10-20 17:03:06 +02:00
Pau Espin b9be0ea93e Clear SCTP tx queue upon SCTP RESTART notification
Depends: libosmo-netif.git Change-Id Iecb0a4bc281647673d2930d1f1586a2df231af52
Related: SYS#6113
Change-Id: I60adf35e5b5713d38c4584615e059875dcb74bd7
2022-10-17 13:57:17 +02:00
Pau Espin bbad8dec36 hnb_read_cb(): -EBADF must be returned if conn is freed to avoid use-after-free
Otherwise the libosmo-netif stream API may continue accessing the conn
after returning if the socket has the WRITE flag active in the same main
loop iteration.

Change-Id: I628c59a88d94d299f432f405b37fbe602381d47e
2022-10-01 21:21:24 +02:00
Pau Espin c923d19b7b hnb_read_cb: use local var to reduce get_ofd() calls
Change-Id: Ic7058b5a05b0d34b80617006d4e929a523212221
2022-10-01 21:21:24 +02:00
Pau Espin 5f19597b02 Close conn when receiving SCTP_ASSOC_CHANGE notification
It was seen on a real pcap trace (sctp & gsmtap_log) that the kernel
stack may decide to kill the connection (sending an ABORT) if it fails
to transmit some data after a while:
ABORT Cause code: "Protocol violation (0x000d)",
      Cause Information: "Association exceeded its max_retrans count".
When this occurs, the kernel sends the
MSG_NOTIFICATION,SCTP_ASSOC_CHANGE,SCTP_COMM_LOST notification when
reading from the socket with sctp_recvmsg(). This basically signals that
the socket conn is dead, and subsequent writes to it will result in
send() failures (and receive SCTP_SEND_FAILED notification upon follow
up reads).
It's important to notice that after those events, there's no other sort
of different event like SHUTDOWN coming in, so that's the time at which
we must tell the user to close the socket.
Hence, let's signal the caller that the socket is dead by returning 0,
to comply with usual recv() API.

Related: SYS#6113
Change-Id: If35efd404405f926a4a6cc45862eeadd1b04e08c
2022-10-01 21:21:06 +02:00
Pau Espin 1906a30ca9 Fix handling of sctp SCTP_SHUTDOWN_EVENT notification
SCTP_SHUTDOWN_EVENT is a first class event, and not a subtype of
SCTP_ASSOC_CHANGE (such as SCTP_SHUTDOWN_COMP).

Related: SYS#6113
Change-Id: I7fa648142c07f63c55091d2a15b9d7312bcd4cec
2022-09-30 14:43:06 +02:00
Pau Espin 3bf5395102 hnbgw: Fix recent regression not closing conn upon rx of SCTP_SHUTDOWN_EVENT
Before handling of OSMO_STREAM_SCTP_MSG_FLAGS_NOTIFICATION in recent
commit (see Fixes: below), osmo_stream_srv_recv() and
internal _sctp_recvmsg_wrapper() in libosmo-netif would return either
-EAGAIN or 0 when an sctp notification was received from the kernel.

After adding handling of OSMO_STREAM_SCTP_MSG_FLAGS_NOTIFICATION, the
code paths for "rc == -EAGAIN" and "rc == 0" would not be executed
anymore since the first branch takes preference in the if-else
tree. For "rc == -EAGAIN" it's fine because the new branch superseeds
what's done on the "rc == -EAGAIN" branch. However, for the "rc == 0",
we forgot to actually destroy the connection. The "rc == 0" branch was
basically reached when SCTP_SHUTDOWN_EVENT was received because
osmo_stream_srv_recv() tried to resemble the interface of regular
recv(); let's hence check for that explicitly and destroy the conn
object (and the related hnb context in the process) when we receive
that event.

Fixes: 1de2091515
Related: SYS#6113
Change-Id: I11b6af51a58dc250975a696b98d0c0c9ff3df9e0
2022-09-22 16:39:51 +02:00
Pau Espin 1de2091515 hnbgw: Unregister HNB if SCTP link is restarted
Sometimes an hNodeB may reconnect (SCTP INIT) using same SCTP tuple without
closing the previous conn. This is handled by the SCTP stack by means of
pushing a RESET notification up the stack to the sctp_recvmsg() user.
Let's handle this by marking the HNB as unregistered, since most
probably a HNB Register Req comes next as the upper layer state is
considered lost.

Depends: libosmo-netif.git Change-Id I0ee94846a15a23950b9d70eaaef1251267296bdd
Related: SYS#6113
Change-Id: Ib22881b1a34b1c3dd350912b3de8904917cf34ef
2022-09-19 14:58:11 +02:00
Pau Espin d046306b63 Change log level about conn becoming closed to NOTICE
Change-Id: I8973990e2cc435422e62dd2a38192e7a6da4a716
2022-09-16 10:37:00 +00:00
Pau Espin f9825cbd4a Improve logging around hnb_context and sctp conn lifecycle
Change-Id: I44c79d86924ead84246b3d4937a6becae5d29185
2022-09-14 12:16:38 +02:00
Pau Espin 930ed702b6 hnb_context_release(): Make sure assigned conn is freed
Otherwise, some paths calling hnb_context_release() (like failing to
transmit HNB-REGISTER-REJECT) would end up with a conn object alive with
no assigned hnb_context, which is something not wanted.

This way an alive conn object always has an associated hnb_context, and
they are only disassociated during synchronous release path.

Related: OS#5676
Change-Id: I44fea7ec74f14e0458861c92da4acf685ff695c1
2022-09-14 12:16:18 +02:00
Harald Welte fe7c34737d Don't process RUA messages if HNB is not registered
Related: OS#5676
Change-Id: I85442e8adfefadc3bf3ed795eaef7677eb0b36e9
2022-09-13 13:00:01 +02:00
Harald Welte c971c657c5 Abort if processing SCTP connection without HNB context
It was observed that under some circumstances (after HNBAP
HNB-De-Register) we end up crashing because a connection has no HNB
assigned to it. Let's explicitly assert if that happens, in order
clarify and avoid same sort of thing happening without clear view on
what's going on.
The issue will be fixed in a follow-up patch.

Closes: OS#5676
Change-Id: I1eedab6f3ac974e942b02eaae41556f87dd8b6ba
2022-09-13 11:31:46 +02:00
Pau Espin eadf523393 hnbgw: Log new SCTP HNB connections
Change-Id: I07b98ff4c3199eeab11a8c1cfd9ce44ab99bca85
2022-09-13 11:31:46 +02:00
Pau Espin 419e832473 cosmetic: Fix typo in log and whitespace
Change-Id: Ie2be6937bb0f44ea66397c905c5d380caa2d4cef
2022-09-13 11:31:40 +02:00
Daniel Willmann d129e0c86e hnbgw_hnbap: Fix memory leaks in HNBAP handling
* Use osmo_stream closed_cb to call hnb_context_release() in all cases
* Also call hnbap_free_hnbregisterrequesties() when sending hnb register
  reject

Related: OS#5656
Change-Id: I3ba02b0939413c67bc8088ea1a8f2252fc2bda31
2022-08-23 18:15:02 +02:00
Daniel Willmann 2dfeb1e218 Install show talloc-context VTY commands
Related: OS#5656
Change-Id: Ia4b0023028405ce065f618f536c92ea2bcd0ce15
2022-08-23 17:51:51 +02:00
Neels Hofmeyr e6201765cf build: add --enable-pfcp, make PFCP dep optional
Related: SYS#5895
Change-Id: I6d50c60bccda767910217243bdfb4a6fad1e39c1
2022-08-09 17:57:43 +02:00
Neels Hofmeyr 1496498713 add ps_rab_ass FSM to map GTP via UPF
Related: SYS#5895
Depends: If80c35c6a942bf9593781b5a6bc28ba37323ce5e (libosmo-pfcp)
Change-Id: Ic9bc30f322c4c6c6e82462d1da50cb15b336c63a
2022-08-08 20:20:34 +00:00
Neels Hofmeyr 2c91bd66a1 add option to send SCCP CR without payload
It is reported that a third-party SGSN is rejecting SCCP CR when the
SCCP message part exceeds a certain length. The solution is to first
send an SCCP CR without payload, and send the payload in a DT later.

Add config option

  hnbgw
   sccp cr max-payload-len <0-999999>

If the RANAP payload surpasses the given length, osmo-hnbgw will first
send an SCCP CR without payload, cache the RANAP payload, and put that
in an SCCP DT once the SCCP CC is received.

The original idea was to limit the size of the entire SCCP part of the
message, but I'm currently not sure how to determine that without
copying much of the osmo_sccp code. I figured using a limit on the RANAP
payload is sufficient. To avoid the error with above third-party SGSN,
the easy solution is to set max-payload-len to 0, so that we always get
a separate SCCP CR without payload.

Related: SYS#5968
Related: I827e081eaacfb8e76684ed1560603e6c8f896c38 (osmo-ttcn3-hacks)
Change-Id: If0c5c0a76e5230bf22871f527dcb2dbdf34d7328
2022-06-07 22:51:26 +02:00
Neels Hofmeyr 0ca9567fb2 use osmo_select_main_ctx(), tweak log in handle_cn_conn_conf()
Upcoming patch adds to this function. Let me first combine those four
LOGP() to a single one, use proper osmo_sccp_addr_to_str_c(OTC_SELECT).

To be able to use OTC_SELECT, switch hnbgw.c to osmo_select_main_ctx().

Related: SYS#5968
Change-Id: I1e0ea0a883e8cf65e6cfb45ed9b6f3d8fb7c59eb
2022-06-07 18:09:19 +02:00
Philipp Maier 81f1751896 mgw_fsm: add MGW support to osmo-hnbgw
osmo-hnbgw lacks support for an co-located media gateway. This makes it
virtually impossible to isolate the HNB from the core network properly.

Lets add MGCP support to osmo-hnbgw so that it can control a co-located
media gateway to relay the RTP streams between HNB and core network.

Change-Id: Ib9b62e0145184b91c56ce5d8870760bfa49cc5a4
Related: OS#5152
2022-02-24 10:51:30 +01:00
Pau Espin dce3870429 Initial structure + import code from osmo-iuh.git
Imported from osmo-iuh.git 9b4de3f401c890fc2c0dfae9e827daaaadd80db0.

Change-Id: I569d221aeb83d352c1621c44c013a0e4c82fc8a8
2022-01-04 19:48:52 +01:00