This fixes TBF objects leaking and ending up alive when the MS object is
explicitly freed through talloc_free (and sporadically
crashing TbfTest once a timeout for them occur).
This mostly affects unit tests, where most of the explicit free()
happens.
In osmo-pcu, in general, the GprsMs object only gets _free() called when
its resource count reaches 0, aka no more TBFs are attached to it. Hence
in general GprsMs object is freed() only when no TBFs (to be leaked) are
present.
However, in the unit tests it's usual that we want to wipe the entire
context by eg. feeing the PCU, the BTS or MS object, which should also
free the related TBFs.
When running osmo-pcu this may only be an issue when the MS object is
freed explicitly, which could happen for instance when a BTS is torn down,
ie. PCUIF going down, moment at which all GprsMs of that BTS are freed.
But in there actually it iterates over PDCHs to free all TBFs, so it's
fine.
If we iterated over MS, this could have ended up in a crash, like
it happened in TbfTest sporadically, but it's not a bit problem if we
crash + restart at that time since anyway the BTS is gone ore just
getting up around that time.
Related: OS#6359
Change-Id: Ibbdec94acb8132be20508d3178d88da44bfaf91d
Scenario: A DL TBF is assigned over PCH (CCCH) and we start transmitting
DL data blocks blindly after X2002, but at the same time the MS start
packet-access-procedure to request an UL TBF.
Right now osmo-pcu correctly detects the MS is available in PDCH and
re-assigns a DL TBF using PACCH, but the LLC frames it transmitted in
the old PCH-assigned DL TBF get lost when that older TBF is freed
(because the DL blocks were removed from the GprsMs llc_queue).
This issue is now more frequent since X2002 timer was added recently to
delay starting requesting USF for a UL TBF, hence the contention
resolution in general takes more time and hence the PACCH assignment of
the DL TBF takes more time too, so more DL data blocks are transmitted
to the DL TBF assigned over PCH during that time.
This patch improves the situation to at least recover the DL blocks
transmitted if the DL TBF is freed (due to MS merge trigger by scenario
mentioned above), where no DL ACK/NACK was ever received by the MS.
Ideal solution would be to have complete tracking of which LLC PDUs from
the llc_queues were completely ACKed at RLC/MAC level, but that really
requires a lot of work and major refactoring, which are left as a future
improvement.
Change-Id: I9be4035fb2cf2b3ee56e91dcc12cc8c24028b4aa
Otherwise the scheduler selects the UL TBF for USF during that time, and
of course no one has that USF assigned yet, so no answer and that ends
up triggering MAX_N3101 on the UL TBF.
This is specially important when PCU is connected to the BSC, since the
amount of time for the ImmAss to be scheduled on the BTS and reach the
MS can be bigger.
Change-Id: I48babd70ca44f11110240cbcfbab43d0e3a0fb59
T3192 on the MS side (boadcasted by the network on SI13) is used for the
MS to stay monitoring the PDCH of a DL TBF after sending PKT DL ACK/NACK
for the last DL data block (FinalAck=1).
T3193 is the counterpart in the network, defined as >T3192, which is
used to make sure the TFI is no longer used by the MS and hence can be
reused/re-assigned again.
Hence, we want to differentiate between those 2 timers, since the first
one gives us information on how late can we still tx PktDlAss on PACCH
on that DL TBF, while the later only provides info on when to free the
DL TBF in order to reuse the TFI.
Change-Id: I7421342461bf159d945651037e6fe690f88c6fae
When sending the last DL block, the FSM moves FLOW->FINISHED. Hence, it
is impossible to receive the FINAL_ACK in state flow, when we didn't yet
sent the last DL block (it's only possible if there's a bug in the MS).
TbfTest need to be adapted since they where acting wrong.
Change-Id: I3288743ff01ff84c3d345b6efb7c3fb6ca934791
Dispatching TBF_EV_CONTENTION_RESOLUTION_MS_SUCCESS means the UL TBF is
able to be used to assign DL TBFs over PACCH.
However, there's an extra implicit restriction: if the PCU already sent
the final UL ACK/NACK to that UL TBF, then whatever message sent
after it cannot be reliably answered, since the MS will go back to idle
state after issues the PKT CTRL ACK for that final UL ACK/NACK.
This condition is already being checked in
contention_resolution_success():
"""
if (ms_need_dl_tbf(ms()) && !tbf_ul_ack_waiting_cnf_final_ack(this))
ms_new_dl_tbf_assigned_on_pacch(ms(), this);
"""
Since we are considering the UL TBF to have done contention resolution
when we transmit the *first* UL ACK/NACK, it mans we are doing both things
on the same code path iif the *first* UL ACK/NACK is at the same time
the *final* UL ACK for that UL TBF. This can happen if the UL TBF is
only sending 1 block, which will have CV=0. This can usually happen
during GMM Attach Request.
In this scenario, with current code goes into a situation where first
the TBF_EV_CONTENTION_RESOLUTION_MS_SUCCESS is triggered and *afterwards*
the UL_ACK/NACK state is updated. As a result, the code snippet check
above (!tbf_ul_ack_waiting_cnf_final_ack()) will be true and the DL TBF
be assigned, but afterwards in the same code path it figures out the
final ack happens.
In order to fix this, first update the ACK/NACK state and only
afterwards trigger the Contention Resolution Success event.
Change-Id: I62ae91b494e4fd0ade3f4a3ba3817bcaedbdebf5
First print what is going to be attempted, later alarm about the
possible error, finally early return.
Change-Id: I417e9689f60e7f5d3c8ef67543e56fea87c8eebd
It makes no sense to continue trying to assign the UL TBF over PACCH
after T3168 * 4 retrans time out.
This helps in releasing the TBF after we got rid of incorrect use of
N3015 in UL TBFs.
While at it, update tbf_ul_fsm to use T3168 instead of X2001, since it
really needs to match T3168. Ideally it would not even have a timer
itself and receive an event from tbf_ul_ass_fsm, but that's left as a
TODO (it was already before) and simply the timer is updated.
Change-Id: I87dff68dedd06b60501e7586d20faf02bb1f0c93
T3195 is triggered as consequence of N3105 reaching MAX, which only
occurs in DL TBFs. Hence, T3195 also only applies to DL TBFs.
The references to T3195 were there as a remains from time
where we had a single tbf_fsm for both UL and DL TBFs. It can now be
further simplified.
Change-Id: I07a43c13289d50707115187a3f22df21443a7b4a
This counter is only used in DL TBFs as per TS 44.060.
section 13.4 "N3105":
"When the network after sending a RRBP field in the _downlink RLC data block_ ..."
(DL data blocks are only sent in DL TBFs).
section 8.1.2.1:
"If N3105 = N3105max, the network shall release the downlink TBF internally
and start timer T3195 for that TBF. When T3195 expires, the network may reuse
the TFI"
Change-Id: I4d4f4d4d3e6e0539ea8ec2395bed00d059b84e04
The FN time counter is not really PDCH specific, but to the whole BTS,
and all other calls t the bts_set_current_frame_number() are already
laced in pcu_l1_if.cpp; move it there.
Change-Id: If36f22a1067c904fa7fda87bed5062b6738f0dd1
The previous code was really confusing, passing full FNs as RFNs under
certain external conditions, and then assuming the RFN input of
rfn_to_fn() function could actually be a FN.
As a result, we had a lot of code behaving slightly different depending
on whether the incomding FN from pcuif was filled in by a BSC or a BTS.
Avoid this b ehavior differentiation and always assume the most
restricted one, aka RFN.
Change-Id: Ib3b5702168195b595711cd0ff32c211b9aba429d
Change format to print the state at the end, to resemble more the same
format used by FSMs.
Furthermore, by moving it at the end, print it only when "enclousure" is
requested, aka when not requested by FSM to update its internal name.
The consequence of this logc is that log lines printed from FSM don't
end up with the same state string printed twice in different places.
While at it, shorten the EGPRS/GPRS indicator to one character, which
should be understandable enough since it matches what's usually seen in
mobile phones to signal one or another.
Change-Id: I86b5f042fae77721b22fc026228677bd56768ba9
The functions l1if_open_pdch and l1if_close_pdch have a misleading
naming since what they actually do is opening and closing the TRX since
they return and accept a context (obj) that is valid for a whole TRX.
This also explains why the other functions accept a timeslot as
parameter in addition to the context. Let's rename those functions so
that it is clear what they do.
Related: OS#6022
Change-Id: I395a60b2fba39bac4facec78989bac20f0cef0d3
This patch finally decouples TBF allocation from resource allocation.
This will allow in the future reserving resources without having to
require a TBF object to exist.
Change-Id: I2856c946cb62d6e5372a1099b60e5f3456eb8fd4
This way the alloc_algo() becomes idempotent, simplifying implementation
of new alloc_algos as well as rolling back if allocation fails (for
instance because some resource is exhausted at the time).
For now the code applying the results is moved to tbf::alloc_algo(), but
it will eventually get out of tbf code, so that the MS object is
responsible for running it. As a result, there's no even need to create
TBF object before trying to allocate resources, which will help furher in
rollback operations described above.
Change-Id: I5ffd00f5f80bde4b73b78db44896f65e70e12b20
Simply apply the content of the configured timer when the MS goes idle.
Having that field is convenient to do tricky stuff in unit tests, but
makes the main osmo-pcu app more complex for no good enough reason.
Change-Id: I8d44318b37b6605afd84db8ccec0d75e6db293b9
This is useful to track the lifecycle of MS objects. The RLCMAC cateogry
used so far in those log messages is too broad.
Change-Id: Ib4ce88d0f7309ac77c064a94bb0d667e8dbc33dd
Make the caller hold a reference to the MS object just allocated, so
that it hs to explicitly unref it and, in turn, if no new references
were added during its use, trigger release of the MS object.
This is useful to avoid leaking MS object if it was allocated and then
no TBF is attached to it because allocation of TBF failed.
Related: OS#6002
Change-Id: I2088a7ddd76fe9157b6626ef96ae4315e88779ea
Whether the TBF is GPRS or EGPRS is known at allocation time since it
comes from the information known in the MS object used to create it.
Hence, no need to delay calling it to later steps such as setup().
So far it was probably left in setup() due to the constrains about
requiring the subclass to be constructed (use of window() virtual API).
Change-Id: I2e9d2a98c666a930333d52fb6c0463d7593c2615
This commit changes lots of stuff in the MS release lifecycle, but
there's no really good way to split this into patches which make sense,
since all the chaos is intensively entangled.
Get rid of the ms_callback complex mess, it is not needed at all.
Previous MS release was strange due to the existance of previous
ms_callback.idle concept and MS storage: the MS signalled when it went
idle (no TBFs attached) and waited for somebody outside to free it,
while then arming itself the release timer to release itself if it was
not released by whoever.
The new lifecycle follows an easier (expected) approach: Whenever all
TBFs become detached from the MS and it becomes idle (use_count becomes
0), then it frees its reserved resources (TFI, etc.) and either:
* frees itself immediatelly under certain conditions (release timeout
configured = 0 or MS garbage with TLLI=GSM_RESERVED_TMSI)
* Arms release_timer and frees itself when it triggers.
If during release_timer the MS is required again (for instance because a
new TBF with TLLI/IMSI of the MS is observed), then a TBF is attached to
the MS and it is considered to become active again, hence the release_timer
is stopped.
OS#6002
Change-Id: Ibe5115bc15bb4d76026918adc1be79469c2f4839
It is interesting to log that a tbf is being detached *before* it
actually happens, so that the reader can see which TBF is being
detached.
Change-Id: I2008beb9ab8f97f7ea5ed7b45cfb3f23dfe7b27f
gprs_default_cb_ms_idle() is changed to have the same implementation as
previous bts_ms_idle_cb(), since that's the only one being used in
osmo-pcu code. It makes no sense to use different callback logic in unit
tests.
This is another step towards simplifying the code and getting rid of the
idle/active_cb().
Change-Id: I2a06d17588572a21dc5a14ddbde83766076b446d
That class is mostly a C++ class holding a llist plus some callbacks.
Having that in a separate class makes code more complex for no good
reason. This patch moves the llist into bts and allocates stuff directly
from within bts.
This will allow further cleanup of MS lieficyle in future patches.
Change-Id: I627f5db5073189c23ddf2b7f09c90abb24846f62
That information is not required during allocation of the object, and
most times it is not known.
Defer setting it only to meaningul values in paths obtaining the
information from peers.
Change-Id: I36f07dc389f7abe205fc4bcddbde93735f5d5cfc
Inside osmo-pcu code, the code path is to always call ms_alloc with
tlli=GSM_RESERVED_TMSI; a different value is only passed during unit
tests.
It makes no sense to have unit tests using differnet code paths than
ther app itself, since in the app it always desired to go through the
ms_set_tlli() and ms_confirm_tlli() which does more stuff.
Hence, drop the tlli param in ms_alloc and change the unit tests to use
the available APIs used by the application.
Change-Id: I730ec911a43b0f4e78faee4eeffb3ca8601564f8
When debugging frame number offset problems betwen l1 and below and the
upper layers of the PCU, it may be helpful do know which blocks/frame
numbers got reserverd for uplink transmissions.
Change-Id: I4277c572a4cc6cbbf3ac4e67442c9036be687627
Related: OS#5198
The previous gprs/protocol/gsm_04_60.h header from libosmogsm was
misleading because it was placed in the subdirectory used by libosmogb,
and hence was recently deprecated in favour of gsm/protocol/gsm_04_60.h.
Let's follow the pragma message advising to move to the new header file.
Depends: libosmocore 0158b05337a825352d9fd7f074170b686e9fd1e5
Change-Id: I027abbf3ed4c71331000565af1ef4f08f10cfafc
There is a function l1if_connect_pdch, but no complementary function
like we have it with l1if_open_pdch and l1if_close_pdch. The reason for
this is that the PHY implementations that rely on a femtocell DSP do not
need to disconnect the pdch explcitly. However, the planned support for
the E1 based Ercisson RBS CCU will require an explicit disconnect. So
lets add a function call for this.
Change-Id: Ied88f3289bda87c48f5f9255c4591470633cc805
Related: OS#5198
In case of an FN jump the expected value is logged. Lets also log the
delta between the expected and the current FN as it may give a better
clue what goes wrong
Change-Id: Ie361df30852570fe8a47347a42e962db869ccf82
The DL-TBF assigned to another MS object may have a totally different
set of reserved resources (TS set, TRX, etc.), so one cannot simply move
those to the new MS. To start with, if the 2 MS are on different TRX it
is clear that one of them will not be really in operation. That's most
probably the DL-TBF being in ASSIGN state on CCCH waiting for PCUIF_CNF
and later X2002 to trigger to start sending DL blocks, but without
confirmation whether the MS is really there. Since the other new MS
object probably has a UL-TBF, that's the one probably operative, and
hence a new DL-TBF can be created at that same time and assigned through
UL-TBF's PACCH.
Unit test test_ms_merge_dl_tbf_different_trx showcases the above
scenario.
Related: SYS#6231
Related: OS#5700
Related: 677bebbe5c
Change-Id: Ie8cb49d5492cfc4cbf8340f3f376b0e6105e8c82
Add a test which showcases a scenario where the PCU ends up with 1
GprsMs object holding a DL-TBF in a weird condition half in one TRX and
half in other due to ms_merge().
This test (slightly adapted) used to cause a crash in osmo-pcu.git
586ddda9bc (a few versions behind current
master).
Related: SYS#6231
Change-Id: Ic16b5e96debf91e72684833cd64253687857f3aa
This allows having full TS information, not only ts_no, which will be
needed later on followup patches.
Change-Id: I1efccb32c5afa4fe62233bf114ea95b6fbbe1a06
The DL-TBF in the test gets its TS allocated on TS4 (only that one is
enabled through setup_bts(bts, ts_no=4)), and hence it makes no sense to
ask it to send data on TS7.
Change-Id: Ibe6fc073b7438b8024c4d3ddb0e60c6112752351
This allows having full information on the control TS easily reachable
(like TRX ofthe PDCH), and makes it easy to compare TS by simply
matching the pointer address.
Change-Id: I6a97b6528b2f9d78dfbca8fb97ab7c621f777fc7
First commit towards trying to have alloc algorithm as isolated as
possible from others parts of the code trying to avoid state changes on
data structures.
Change name also because the alloc_algo not only allocated TS, but TFIs
and USFs.
Change-Id: I33a6c178c64a769f05d3880a69c38acb154afa62
* Make it similar to the already existing TBF allocation procedures
* Pass pdch pointer instead of trx and ts numbers
Change-Id: I04b3b65942732cc652adeaa507529b849292ff61
The triggering of the assignment for the new TBF allocation is part of
the "upgrade to multislot" process, hence move that inside the function.
Change-Id: Ic2b959f2476b900cb263ccd8f6698ed843bafd29
* Make clear the code relates to DL-TBF and not UL-TBF.
* Change wording to "upgrade" to match the existing field and API
"tbf_can_upgrade_to_multislot()".
* Free the TBF if we cannot allocate new resources.
Change-Id: I0e4f8d7e46235a471b2124b280c81ff07b6967a4