Add external dependency libnftables.
When an hNodeB registers, set up nftables rules to count GTP-U packets
(UDP port 2152) to and from that hNodeB's address -- we are assuming
that it is the same address that Iuh is connecting from.
This is a "workaround" to get performance indicators per hNodeB, without
needing a UPF that supports URR.
This patch reads counters from the nftables response using "manual"
string parsing. See also Id4e7fa017c31945388a010d8581715d71482116b which
modifies this to full JSON parsing.
Tweaked-by: osmith
Related: SYS#6773
Depends: libosmocore I0df84b4bb8cb5d8434b735fa3a38e7f95be43e91
Change-Id: I35b7e97fd039e36633dfde1317170527c82f9f68
Usually, a hnb_context still has a hnb_persistent associated at release
time. But that is not guaranteed.
See also further below, where the function tests for ctx->persistent
correctly.
Change-Id: I77ddd627ebfe96c7674c6a197af8b2c4b1a4024c
Add rate_ctr based statistics for RAB activation, deactivation and
failures. This requires us to parse RANAP in both uplink and downlink
and to iterate deep into the setup/modify/release/failure lists.
Given the way how the protocol works, the only way to distinguish an
activation from a modification is because sender and recipient know
whether a given RAB is already active at the time of the message. So we
also need to track the activation state of each RAB.
Depends: osmo-iuh.git Change-Id I328819c650fc6cefe735093a846277b4f03e6b29
Change-Id: I198fa37699e22380909764de6a0522ac79aa1d39
Adding counters for number of paging succeeded is much harder,
as we currently don't parse connection-oriented DL RANAP and/or any
L3 NAS in it.
Change-Id: I7648caa410dba8474d57121a8128579ba200b18f
Let's add a per-hnb rate_ctr and stat_item group. Only one initial
counter (number of Iuh establishments) and one initial stat_item
(uptime/downtime) is implemented.
Related: SYS#6773
Change-Id: I26d7c3657cdaf7c6ba5aa10a0677381ab099f8dd
This allows us to add a new "hnb-policy closed", which means we do not
accept any random HNB connection anymore, but only those whose identity
is pre-configured explicitly using administrative means.
Furthermore, this new data structure will allow us (in future patches)
to keep persistent data like uptime / downtime of each HNB.
Related: SYS#6773
Change-Id: Ife89a7a206836bd89334d19d3cf8c92969dd74de
We are about to introduce the stringified UMTS cell identifier to the
VTY, and for that we need to not only print but also parse the related
string.
Related: SYS#6773
Change-Id: I6da947d1f2316241e44e53bb6aaec4221cfaa2c0
Printing the PLMN 001-01 as "1-1" like the existing code is just weird,
and also doesn't differentiate between 2-digit and 3-digit MNC in the
output.
Change-Id: I015ad84a6f61b4420f6bfdaa60e8e1b53a71589c
According to 3GPP TS 25.413 8.26.2.2, "The RNC shall include the Global
RNC-ID IE in the RESET message." To be able to do that, osmo-hnbgw needs
to know the local PLMN.
Introduce explicit knowledge of the local PLMN by config, and use this
configured local PLMN in places where the local PLMN was so far derived
otherwise.
Subsequent patches will separately add the RNC-ID to the RANAP RESET
messages.
Since the PLMN is required to send correct RESET and RESET ACK, include
the new 'plmn' config item in all example configurations.
Related: SYS#6441
Change-Id: If404c3859acdfba274c719c7cb7d16f97d831a2a
As simple as it may seem, turning the RANAP CS/PS domain indicator to a
string so far is cumbersome. It is usually only CS or PS, but to be
accurate we should also expect invalid values. After the umpteenth
switch statement breaking up otherwise trivial code, I decided that yes,
after all, a value_string[] is a good idea even for just two enum values
that have only two-letter names.
Callers will follow in upcoming patches.
Related: SYS#6412
Change-Id: Ib3c5d772ccea8c854eec007de5c996d1b6640bc0
Make lots of small tweaks in logging around RUA to SCCP context maps, CN
link selection and HNBAP:
- remove duplicate log context (e.g. LOGHNB() already shows the HNB Id)
- bring more sense into logging categories and levels / reduce noise
- add and clarify the details being logged at what points in time
Related: SYS#6412
Change-Id: I41275d8c3e272177976a9302795884666c35cd06
Upcoming logging tweaks will call this from hnbgw_hnbap.c as well.
See I41275d8c3e272177976a9302795884666c35cd06
Related: SYS#6412
Change-Id: I499d18876685062d066a9a66d2def827efb77640
The Cell ID is an important part to distinguish HNB in logging, add it
to the output string.
Use OTC_SELECT buffer instead of static buffer.
Related: SYS#6412
Change-Id: Id903b8579ded8b908e644808e440d373bcca8da4
Introduce rate counter stats to osmo-hnbgw.
Add the first rate counters -- they will be fed in upcoming commit
"cnpool: select CN link from pool by NRI or round robin"
I66fba27cfbe6e2b27ee3443718846ecfbbd8a974
Related: SYS#6412
Change-Id: I0132d053223a38e5756cede74106019c47ddcd94
Implement only the VTY configuration part, applying NRI to select CN
links follows in I66fba27cfbe6e2b27ee3443718846ecfbbd8a974.
Use the osmo_nri_ranges API to manage each cnlink's NRI ranges by VTY
configuration.
Analogous to osmo-bsc
MSC pooling: make NRI mappings VTY configurable
4099f1db504c401e3d7211d9761b034d62d15f7c
I6c251f2744d7be26fc4ad74adefc96a6a3fe08b0
plus
vty: fix doc: default value for 'nri bitlen'
f15d236e3eee8e7cbdc184d401f33b062d1b1aac
I8af840a4589f47eaca6bf10e37e0859f9ae53dfa
Related: SYS#6412
Change-Id: Ifb87e01e5971962e5cfe5e127871af4a67806de1
Allow configuring multiple MSCs and SGSNs. Show this in new transcript
test cnpool.vty and in sccp.dot chart.
Still use only the first CN link; actual CN link selection from the pool
follows in I66fba27cfbe6e2b27ee3443718846ecfbbd8a974.
Config examples and VTY tests cfg files will be adjusted to the new cfg
syntax in patch If999b71a8a8237699f6ccfcaa31d1885e66c0518. Only the VTY
write() changes are visible here, to pass all transcript tests.
Related: SYS#6412
Change-Id: I5479eded786ec26062d49403a8be12967f113cdb
Prepare for CN pooling; allow using separate SS7 instances for IuCS and
IuPS.
New struct hnbgw_sccp_user describes a RANAP SCCP user, one per cs7.
Limit struct hnbgw_cnlink to describing a CN peer, using whichever SCCP
instance that is indicated by hnbgw_sccp_user.
Chart sccp.dot shows the changes made.
Related: SYS#6412
Change-Id: Iea1824f1c586723d989c80a909bae16bd2866e08
Fix the following compile error, seen on ubuntu 18.04 with GCC 7.5.0,
opensuse 15.4 and our OE builds:
hnbgw.c:549:15: error: initializer element is not constant
.copyright = hnbgw_copyright,
^~~~~~~~~~~~~~~
Fixes: 04844415 ("move main() to separate file")
Change-Id: I13b2569a369724e0298b064a0876b95d6dafd9d0
Conform to recent osmocom practice by having an osmo_hnbgw_main.c file
for main() and its immediate infrastructure.
Prepares for adding a non-installed libhnbgw.la -- which must not
contain a main() function -- so that linking regression tests is easy.
Separating to a new file seems appropriate because hnbgw.c has gathered
a bunch of stuff that is not strictly main() related / might be useful
to access in a C test.
Change-Id: I5f26a6b68f0d380617d73371f40f3b9055cac362
So far we jump through hoops everywhere to pass around osmo-hnbgw's
global singleton state to all code paths. Some files choose to spawn a
static "global" pointer. And we also have the talloc root tall_hnb_ctx.
Simplify:
- Have a single global g_hnbgw pointer. Drop all function args and
backpointers to the global state.
- Use that global g_hnbgw as talloc root context, instead of passing a
separate tall_hnb_ctx around.
(Cosmetic preparation for separate osmo_hnbgw_main.c file)
Change-Id: I3d54a5bb30c16a990c6def2a0ed207f60a95ef5d
It's really just a bloated llist_count() wrapper.
(Cosmetic preparation for separate osmo_hnbgw_main.c file)
Change-Id: Icd4100ddeb98f4dc2e2ec6de2d44a4b861a66078
Include talloc_asn1_context in 'show talloc-context application'. It
belongs there IMO.
Related: SYS#6297
Change-Id: Iffc320c64713b225b57211da4fb95a868bf1f369
Without "null tracking" enabled, the VTY 'show talloc-ctx all' shows
nothing at all.
Since the talloc_asn1_ctx is "created in NULL ctx", it was not visible
in any talloc reports before this patch.
This patch uncovers a slur of leaks, which accumulate in
talloc_asn1_ctx. Fixes follow.
Related: SYS#6297
Change-Id: I5cb4e9a3b393100877f03d68a09acf5817c5a878
Don't fall back to the legacy config if the pool is configured but no
connection to any pool member can be established.
Depends: osmo-mgw I009483ac9dfd6627e414f14d43b89f40ea4644db
Related: OS#5993
Change-Id: Icf3f3f0524c9d27f645c68851aaf9c6f33842927
When receiving SCTP_RESTART for a given HNB, directly clear the UE
Contexts.
(The HNB typically connects via HNBAP shortly after this causing a UE
Context clearing too, but UE state should always be cleared on
SCTP_RESTART, no matter what.)
Change-Id: I583922193ba73e17ab85152005535188c2762b85
Refactor the entire RUA <-> SCCP connection-oriented message forwarding:
- conquer confusion about hnbgw_context_map release behavior, and
- eradicate SCCP connection leaks.
Finer points:
== Context map state ==
So far, we had a single context map state and some flags to keep track
of both the RUA and the SCCP connections. It was easy to miss connection
cleanup steps, especially on the SCCP side.
Instead, the two FSMs clearly define the RUA and SCCP conn states
separately, and each side takes care of its own release needs for all
possible scenarios.
- When both RUA and SCCP are released, the context map is discarded.
- A context map can stay around to wait for proper SCCP release, even if
the RUA side has lost the HNB connection.
- Completely drop the async "context mapper garbage collection", because
the FSMs clarify the release and free steps, synchronously.
- We still keep a (simplified) enum for global context map state, but
this is only used so that VTY reporting remains mostly unchanged.
== Context map cleanup confusion ==
The function context_map_hnb_released() was the general cleanup function
for a context map. Instead, add separate context_map_free().
== Free context maps separately from HNB ==
When a HNB releases, talloc_steal() the context maps out of the HNB
specific hnb_ctx, so that they are not freed along with the HNB state,
possibly leaving SCCP connections afloat.
(It is still nice to normally keep context maps as talloc children of
their respective hnb_ctx, so talloc reports show which belongs to
which.)
So far, context map handling found the global hnb_gw pointer via
map->hnb_ctx->gw. But in fact, a HNB may disappear at any point in time.
Instead, use a separate hnb_gw pointer in map->gw.
== RUA procedure codes vs. SCCP prims ==
So far, the RUA rx side composed SCCP prims to pass on:
RUA rx ---SCCP-prim--> RANAP handling ---SCCP-prim--> SCCP tx
That is a source of confusion: a RUA procedure code should not translate
1:1 to SCCP prims, especially for RUA id-Disconnect (see release charts
below).
Instead, move SCCP prim composition over to the SCCP side, using FSM
events to forward:
RUA rx --event--> RUA FSM --event--> SCCP FSM --SCCP-prim--> SCCP tx
+RANAP +RANAP +RANAP
RUA tx <--RUA---- RUA FSM <--event-- SCCP FSM <--event-- SCCP rx
+RANAP +RANAP +RANAP
Hence choose the correct prim according to the SCCP FSM state.
- in hnbgw_rua.c, use RUA procedure codes, not prim types.
- via the new FSM events' data args, pass msgb containing RANAP PDUs.
== Fix SCCP Release behavior ==
So far, the normal conn release behavior was
HNB HNBGW CN
| --id-Disconnect--> | ---SCCP-Released--> | Iu-ReleaseComplete
| | <--SCCP-RLC-------- | (no data)
Instead, the SCCP release is now in accordance with 3GPP TS 48.006 9.2
'Connection release':
The MSC sends a SCCP released message. This message shall not contain
any user data field.
i.e.:
HNB HNBGW CN
| --id-Disconnect--> | ---Data-Form-1(!)--> | Iu-ReleaseComplete
| | <--SCCP-Released---- | (no data)
| | ---SCCP-RLC--------> | (no data)
(Side note, the final SCCP Release Confirm step is taken care of
implicitly by libosmo-sigtran's sccp_scoc.c FSM.)
If the CN fails to respond with SCCP-Released, on new X31 timeout,
osmo-hnbgw will send an SCCP Released to the CN as fallback.
== Memory model for message dispatch ==
So far, an osmo_scu_prim aka "oph" was passed between RUA and SCCP
handling code, and the final dispatch freed it. Every error path had to
take care not to leak any oph.
Instead, use a much easier and much more leakage proof memory model,
inspired by fixeria:
- on rx, dispatch RANAP msgb that live in OTC_SELECT.
- no code path needs to msgb_free() -- the msgb is discarded via
OTC_SELECT when handling is done, error or no error.
- any code path may also choose to store the msgb for async dispatch,
using talloc_steal(). The user plane mapping via MGW and UPF do that.
- if any code path does msgb_free(), that would be no problem either
(but none do so now, for simplicity).
== Layer separation ==
Dispatch *all* connection-oriented RUA tx via the RUA FSM and SCCP tx
via the SCCP FSM, do not call rua_tx_dt() or osmo_sccp_user_sap_down()
directly.
== Memory model for decoded ranap_message IEs ==
Use a talloc destructor to make sure that the ranap_message IEs are
always implicitly freed upon talloc_free(), so that no code path can
possibly forget to do so.
== Implicit cleanup by talloc ==
Use talloc scoping to remove a bunch of explicit cleanup code. For
example, make a chached message a talloc child of its handler:
talloc_steal(mgw_fsm_priv, message);
mgw_fsm_priv->ranap_rab_ass_req_message = message;
and later implicitly free 'message' by only freeing the handler:
talloc_free(mgw_fsm_priv)
Related: SYS#6297
Change-Id: I6ff7e36532ff57c6f2d3e7e419dd22ef27dafd19
Rename context_map_deactivate() to context_map_hnb_released().
This function is called only when the HNB is released / lost.
Change-Id: I6dcb26c94558fff28faf8f823e490967a9baf2ec
The SCCP CR payload length limit is now implemented in libosmo-sigtran
v1.7.0. The limit no longer needs to be enforced in osmo-hnbgw.
This reverts commit 2c91bd66a1,
except for keeping the cfg option, marked deprecated, and not doing
anything.
Fixes: OS#5906
Related: SYS#5968
Related: OS#5579
Depends: I174b2ce06a31daa5a129c8a39099fe8962092df8 (osmo-ttcn3-hacks)
Change-Id: I18dece84b33bbefce8617fbb0b2d79a7e5adb263
User reports show leaked UE contexts over time, in a scenario where HNB
regularly disconnect and reconnect.
So far, when we receive a HNB REGISTER REQ, we log as "duplicated" and
continue to use the hnb_context unchanged. But it seems obvious that a
HNB that registers does not expect any UE state to remain. It will
obviously not tear down UE contexts (HNBAP or RUA) that have been
established before the HNB REGISTER REQUEST. These hence remain in
osmo-hnbgw indefinitely.
When receiving a HNB REGISTER REQUEST, release all its previous UE
state. Allow the HNB to register with a clean slate.
The aim is to alleviate the observed build-up of apparently orphaned UE
contexts, in order to avoid gradual memory exhaustion.
Related: SYS#6297
Change-Id: I7fa8a04cc3b2dfba263bda5b410961c77fbed332
Large RAN installations may benefit from distributing the RTP voice
stream load over multiple media gateways.
libosmo-mgcp-client supports MGW pooling since version 1.8.0 (more than
one year ago). OsmoBSC has already been making use of it since then (see
osmo-bsc.git 8d22e6870637ed6d392a8a77aeaebc51b23a8a50); lets use this
feature in osmo-hngw too.
This commit is also part of a series of patches cleaning up
libosmo-mgcp-client and slowly getting rid of the old non-mgw-pooled VTY
configuration, in order to keep only 1 way to configure
libosmo-mgcp-client through VTY.
Related: SYS#5091
Related: SYS#5987
Change-Id: I371dc773b58788ee21037dc25d77f556c89c6b61
Otherwise the libosmo-netif stream API may continue accessing the conn
after returning if the socket has the WRITE flag active in the same main
loop iteration.
Change-Id: I628c59a88d94d299f432f405b37fbe602381d47e
It was seen on a real pcap trace (sctp & gsmtap_log) that the kernel
stack may decide to kill the connection (sending an ABORT) if it fails
to transmit some data after a while:
ABORT Cause code: "Protocol violation (0x000d)",
Cause Information: "Association exceeded its max_retrans count".
When this occurs, the kernel sends the
MSG_NOTIFICATION,SCTP_ASSOC_CHANGE,SCTP_COMM_LOST notification when
reading from the socket with sctp_recvmsg(). This basically signals that
the socket conn is dead, and subsequent writes to it will result in
send() failures (and receive SCTP_SEND_FAILED notification upon follow
up reads).
It's important to notice that after those events, there's no other sort
of different event like SHUTDOWN coming in, so that's the time at which
we must tell the user to close the socket.
Hence, let's signal the caller that the socket is dead by returning 0,
to comply with usual recv() API.
Related: SYS#6113
Change-Id: If35efd404405f926a4a6cc45862eeadd1b04e08c
SCTP_SHUTDOWN_EVENT is a first class event, and not a subtype of
SCTP_ASSOC_CHANGE (such as SCTP_SHUTDOWN_COMP).
Related: SYS#6113
Change-Id: I7fa648142c07f63c55091d2a15b9d7312bcd4cec
Before handling of OSMO_STREAM_SCTP_MSG_FLAGS_NOTIFICATION in recent
commit (see Fixes: below), osmo_stream_srv_recv() and
internal _sctp_recvmsg_wrapper() in libosmo-netif would return either
-EAGAIN or 0 when an sctp notification was received from the kernel.
After adding handling of OSMO_STREAM_SCTP_MSG_FLAGS_NOTIFICATION, the
code paths for "rc == -EAGAIN" and "rc == 0" would not be executed
anymore since the first branch takes preference in the if-else
tree. For "rc == -EAGAIN" it's fine because the new branch superseeds
what's done on the "rc == -EAGAIN" branch. However, for the "rc == 0",
we forgot to actually destroy the connection. The "rc == 0" branch was
basically reached when SCTP_SHUTDOWN_EVENT was received because
osmo_stream_srv_recv() tried to resemble the interface of regular
recv(); let's hence check for that explicitly and destroy the conn
object (and the related hnb context in the process) when we receive
that event.
Fixes: 1de2091515
Related: SYS#6113
Change-Id: I11b6af51a58dc250975a696b98d0c0c9ff3df9e0
Sometimes an hNodeB may reconnect (SCTP INIT) using same SCTP tuple without
closing the previous conn. This is handled by the SCTP stack by means of
pushing a RESET notification up the stack to the sctp_recvmsg() user.
Let's handle this by marking the HNB as unregistered, since most
probably a HNB Register Req comes next as the upper layer state is
considered lost.
Depends: libosmo-netif.git Change-Id I0ee94846a15a23950b9d70eaaef1251267296bdd
Related: SYS#6113
Change-Id: Ib22881b1a34b1c3dd350912b3de8904917cf34ef
Otherwise, some paths calling hnb_context_release() (like failing to
transmit HNB-REGISTER-REJECT) would end up with a conn object alive with
no assigned hnb_context, which is something not wanted.
This way an alive conn object always has an associated hnb_context, and
they are only disassociated during synchronous release path.
Related: OS#5676
Change-Id: I44fea7ec74f14e0458861c92da4acf685ff695c1
It was observed that under some circumstances (after HNBAP
HNB-De-Register) we end up crashing because a connection has no HNB
assigned to it. Let's explicitly assert if that happens, in order
clarify and avoid same sort of thing happening without clear view on
what's going on.
The issue will be fixed in a follow-up patch.
Closes: OS#5676
Change-Id: I1eedab6f3ac974e942b02eaae41556f87dd8b6ba