397 lines
19 KiB
Plaintext
397 lines
19 KiB
Plaintext
[[osmux]]
|
|
= OSmux: reduce of SAT uplink costs by protocol optimizations
|
|
|
|
== Problem
|
|
|
|
In case of satellite based GSM systems, the transmission cost on the back-haul
|
|
is relatively expensive. The billing for such SAT uplink is usually done in a
|
|
pay-per-byte basis. Thus, reducing the amount of bytes transfered would
|
|
significantly reduce the cost of such uplinks. In such environment, even
|
|
seemingly small protocol optimizations, eg. message batching and trunking, can
|
|
result in significant cost reduction.
|
|
|
|
This is true not only for speech codec frames, but also for the constant
|
|
background load caused by the signalling link (A protocol). Optimizations in
|
|
this protocol are applicable to both VSAT back-haul (best-effort background IP)
|
|
as well as Inmarsat based links (QoS with guaranteed bandwidth).
|
|
|
|
== Proposed solution
|
|
|
|
In order to reduce the bandwidth consumption, this document proposes to develop
|
|
a multiplex protocol that will be used to proxy voice and signalling traffic
|
|
through the SAT links.
|
|
|
|
=== Voice
|
|
|
|
For the voice case, we propose a protocol that provides:
|
|
|
|
* Batching: that consists of putting multiple codec frames on the sender side
|
|
into one single packet to reduce the protocol header overhead. This batch
|
|
is then sent as one RTP/UDP/IP packet at the same time. Currently, AMR 5.9
|
|
codec frames are transported in a RTP/UDP/IP protocol stacking. This means
|
|
there are 15 bytes of speech codec frame, plus a 2 byte RTP payload header,
|
|
plus the RTP (12 bytes), UDP (8 bytes) and IP (20 bytes) overhead. This means
|
|
we have 40 byte overhead for 17 byte payload.
|
|
|
|
* Trunking: in case of multiple concurrent voice calls, each of them will
|
|
generate one speech codec frame every 20ms. Instead of sending only codec
|
|
frames of one voice call in a given IP packet, we can 'interleave' or trunk
|
|
the codec frames of multiple calls into one IP. This further increases the
|
|
IP packet size and thus improves the payload/overhead ratio.
|
|
|
|
Both techniques should be applied without noticeable impact in terms of user
|
|
experience. As the satellite back-haul has very high round trip time (several
|
|
hundred milliseconds), adding some more delay is not going to make things
|
|
significantly worse.
|
|
|
|
For the batching, the idea consists of batching multiple codec frames on the
|
|
sender side, A batching factor (B) of '4' means that we will send 4 codec
|
|
frames in one underlying protocol packet. The additional delay of the batching
|
|
can be computed as (B-1)*20ms as 20ms is the duration of one codec frame.
|
|
Existing experimentation has shown that a batching factor of 4 to 8 (causing a
|
|
delay of 60ms to 140ms) is acceptable and does not cause significant quality
|
|
degradation.
|
|
|
|
The main requirements for such voice RTP proxy are:
|
|
|
|
* Always batch codec frames of multiple simultaneous calls into single UDP
|
|
message.
|
|
|
|
* Batch configurable number codec frames of the same call into one UDP
|
|
message.
|
|
|
|
* Make sure to properly reconstruct timing at receiver (non-bursty but
|
|
one codec frame every 20ms).
|
|
|
|
* Implementation in libosmo-netif to make sure it can be used
|
|
in osmo-bts (towards osmo-bsc), osmo-bsc (towards osmo-bts and
|
|
osmo-bsc_nat) and osmo-bsc_nat (towards osmo-bsc)
|
|
|
|
* Primary application will be with osmo-bsc connected via satellite link to
|
|
osmo-bsc_nat.
|
|
|
|
* Make sure to properly deal with SID (silence detection) frames in case
|
|
of DTX.
|
|
|
|
* Make sure to transmit and properly re-construct the M (marker) bit of
|
|
the RTP header, as it is used in AMR.
|
|
|
|
* Primary use case for AMR codec, probably not worth to waste extra
|
|
payload byte on indicating codec type (amr/hr/fr/efr). If we can add
|
|
the codec type somewhere without growing the packet, we'll do it.
|
|
Otherwise, we'll skip this.
|
|
|
|
=== Signalling
|
|
|
|
Signalling uses SCCP/IPA/TCP/IP stacking. Considering SCCP as payload, this
|
|
adds 3 (IPA) + 20 (TCP) + 20 (IP) = 43 bytes overhead for every signalling
|
|
message, plus of course the 40-byte-sized TCP ACK sent in the opposite
|
|
direction.
|
|
|
|
While trying to look for alternatives, we consider that none of the standard IP
|
|
layer 4 protocols are suitable for this application. We detail the reasons
|
|
why:
|
|
|
|
* TCP is a streaming protocol aimed at maximizing the throughput of a stream
|
|
withing the constraints of the underlying transport layer. This feature is
|
|
not really required for the low-bandwidth and low-pps GSM signalling.
|
|
Moreover, TCP is stream oriented and does not conserve message boundaries.
|
|
As such, the IPA header has to serve as a boundary between messages in the
|
|
stream. Moreover, assuming a generally quite idle signalling link, the
|
|
assumption of a pure TCP ACK (without any data segment) is very likely to
|
|
happen.
|
|
|
|
* Raw IP or UDP as alternative is not a real option, as it does not recover
|
|
lost packets.
|
|
|
|
* SCTP preserves message boundaries and allows for multiple streams
|
|
(multiplexing) within one connection, but it has too much overhead.
|
|
|
|
For that reason, we propose the use of LAPD for this task. This protocol was
|
|
originally specified to be used on top of E1 links for the A interface, who
|
|
do not expose any kind of noticeable latency. LAPD resolves (albeit not as
|
|
good as TCP does) packet loss and copes with packet re-ordering.
|
|
|
|
LAPD has a very small header (3-5 octets) compared to TCPs 20 bytes. Even if
|
|
LAPD is put inside UDP, the combination of 11 to 13 octets still saves a
|
|
noticable number of bytes per packet. Moreover, LAPD has been modified for less
|
|
reliable interfaces such as the GSM Um interface (LAPDm), as well as for the
|
|
use in satellite systems (LAPsat in ETSI GMR).
|
|
|
|
== OSmux protocol
|
|
|
|
The OSmux protocol is the core of our proposed solution. This protocol operates
|
|
over UDP or, alternatively, over raw IP. The designated default UDP port number
|
|
and IP protocol type have not been yet decided.
|
|
|
|
Every OSmux message starts with a control octet. The control octet contains a
|
|
2-bit Field Type (FT) and its location starts on the 2nd bit for backward
|
|
compatibility with older versions (used to be 3 bits). The FT defines the
|
|
structure of the remaining header as well as the payload.
|
|
|
|
The following FT values are assigned:
|
|
|
|
* FT == 0: LAPD Signalling
|
|
* FT == 1: AMR Codec
|
|
* FT == 2: Dummy
|
|
* FT == 3: Reserved for Fture Use
|
|
|
|
There can be any number of OSmux messages batched up in one underlaying packet.
|
|
In this case, the multiple OSmux messages are simply concatenated, i.e. the
|
|
OSmux header control octet directly follows the last octet of the payload of the
|
|
previous OSmux message.
|
|
|
|
|
|
=== LAPD Signalling (0)
|
|
|
|
0 1 2 3
|
|
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
|X|FT |X X X X X| PL-LENGTH | LAPD header + payload |
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
|
|
Field Type (FT): 2 bits::
|
|
The Field Type allocated for AMR codec is "0".
|
|
|
|
This frame type is not yet supported inside OsmoCom and may be subject to
|
|
change in future versions of the protocol.
|
|
|
|
|
|
=== AMR Codec (1)
|
|
|
|
This OSmux packet header is used to transport one or more RTP-AMR packets for a
|
|
specific RTP stream identified by the Circuit ID field.
|
|
|
|
0 1 2 3
|
|
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
|M|FT | CTR |F|Q| Red. TS/SeqNR | Circuit ID |AMR FT |AMR CMR|
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
|
|
Marker (M): 1 bit::
|
|
This is a 1:1 mapping from the RTP Marker (M) bit as specified in RFC3550
|
|
Section 5.1 (RTP) as well as RFC3267 Section 4.1 (RTP-AMR). In AMR, the Marker
|
|
is used to indicate the beginning of a talk-spurt, i.e. the end of a silence
|
|
period. In case more than one AMR frame from the specific stream is batched into
|
|
this OSmux header, it is guaranteed that the first AMR frame is the first in the
|
|
talkspurt.
|
|
|
|
Field Type (FT): 2 bits::
|
|
The Field Type allocated for AMR codec is "1".
|
|
|
|
Frame Counter (CTR): 2 bits::
|
|
Provides the number of batched AMR payloads (starting 0) after the header. For
|
|
instance, if there are 2 AMR payloads batched, CTR will be "1".
|
|
|
|
AMR-F (F): 1 bit::
|
|
This is a 1:1 mapping from the AMR F field in RFC3267 Section 4.3.2. In case
|
|
there are multiple AMR codec frames with different F bit batched together, we
|
|
only use the last F and ignore any previous F.
|
|
|
|
AMR-Q (Q): 1 bit::
|
|
This is a 1:1 mapping from the AMR Q field (Frame quality indicator) in RFC3267
|
|
Section 4.3.2. In case there are multiple AMR codec frames with different Q bit
|
|
batched together, we only use the last Q and ignore any previous Q.
|
|
|
|
Circuit ID Code (CIC): 8 bits::
|
|
Identifies the Circuit (Voice call), which in RTP is identified by {srcip,
|
|
srcport, dstip, dstport, ssrc}.
|
|
|
|
Reduced/Combined Timestamp and Sequence Number (RCTS): 8 bits::
|
|
Resembles a combination of the RTP timestamp and sequence number. In the GSM
|
|
system, speech codec frames are generated at a rate of 20ms. Thus, there is no
|
|
need to have independent timestamp and sequence numbers (related to a 8kHz
|
|
clock) as specified in AMR-RTP.
|
|
|
|
AMR Codec Mode Request (AMR-FT): 4 bits::
|
|
This is a mapping from te AMR FT field (Frame type index) in RFC3267 Section
|
|
4.3.2. The length of each codec frame needs to be determined from this field. It
|
|
is thus guaranteed that all frames for a specific stream in an OSmux batch are
|
|
of the same AMR type.
|
|
|
|
AMR Codec Mode Request (AMR-CMR): 4 bits::
|
|
The RTP AMR payload header as specified in RFC3267 contains a 4-bit CMR field.
|
|
Rather than transporting it in a separate octet, we squeeze it in the lower four
|
|
bits of the clast octet. In case there are multiple AMR codec frames with
|
|
different CMR, we only use the last CMR and ignore any previous CMR.
|
|
|
|
==== Additional considerations
|
|
|
|
* It can be assumed that all OSmux frames of type AMR Codec contain at least 1
|
|
AMR frame.
|
|
* Given a batch factor of N frames (N>1), it can not be assumed that the amount
|
|
of AMR frames in any OSmux frame will always be N, due to some restrictions
|
|
mentioned above. For instance, a sender can decide to send before queueing the
|
|
expected N frames due to timing issues, or to conform with the restriction
|
|
that the first AMR frame in the batch must be the first in the talkspurt
|
|
(Marker M bit).
|
|
|
|
|
|
=== Dummy (2)
|
|
|
|
This kind of frame is used for NAT traversal. If a peer is behind a NAT, its
|
|
source port specified in SDP will be a private port not accessible from the
|
|
outside. Before other peers are able to send any packet to it, they require the
|
|
mapping between the private and the public port to be set by the firewall,
|
|
otherwise the firewall will most probably drop the incoming messages or send it
|
|
to a wrong destination. The firewall in most cases won't create a mapping until
|
|
the peer behind the NAT sends a packet to the peer residing outside.
|
|
|
|
In this scenario, if the peer behind the nat is expecting to receive but never
|
|
transmit audio, no packets will ever reach him. To solve this, the peer sends
|
|
dummy packets to let the firewall create the port mapping. When the other peers
|
|
receive this dummy packet, they can infer the relation between the original
|
|
private port and the public port and start sending packets to it.
|
|
|
|
When opening a connection, the peer is expected to send dummy packets until it
|
|
starts sending real audio, at which point dummy packets are not needed anymore.
|
|
|
|
0 1 2 3
|
|
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
|X|FT | CTR |X X|X X X X X X X X X| Circuit ID |AMR FT |X X X X|
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
|
|
Field Type (FT): 2 bits::
|
|
The Field Type allocated for AMR codec is "2".
|
|
|
|
Frame Counter (CTR): 2 bits::
|
|
Provides the number of dummy batched AMR payloads (starting 0) after the header.
|
|
For instance, if there are 2 AMR payloads batched, CTR will be "1".
|
|
|
|
Circuit ID Code (CIC): 8 bits::
|
|
Identifies the Circuit (Voice call), which in RTP is identified by {srcip,
|
|
srcport, dstip, dstport, ssrc}.
|
|
|
|
AMR Codec Mode Request (AMR-FT): 4 bits::
|
|
This field must contain any valid value described in the AMR FT field (Frame
|
|
type index) in RFC3267 Section 4.3.2.
|
|
|
|
==== Additional considerations
|
|
|
|
* After the header, additional padding needs to be allocated to conform with CTR
|
|
and AMR FT fields. For instance, if CTR is 0 and AMR FT is AMR 6.9, a padding
|
|
of 17 bytes is to be allocated after the header.
|
|
|
|
* On receival of this kind of OSmux frame, it's usually enough for the reader to
|
|
discard the header plus the calculated padding and keep operating.
|
|
|
|
|
|
== Evaluation: Expected traffic savings
|
|
|
|
The following figure shows the traffic saving (in %) depending on the number
|
|
of concurrent numbers of callings (asumming trunking but no batching at all):
|
|
----
|
|
Traffic savings (%)
|
|
100 ++-------+-------+--------+--------+-------+--------+-------+-------++
|
|
+ + + + + + batch factor 1 **E*** +
|
|
| |
|
|
80 ++ ++
|
|
| |
|
|
| |
|
|
| ****E********E
|
|
60 ++ ****E*******E********E*** ++
|
|
| **E**** |
|
|
| **** |
|
|
40 ++ *E** ++
|
|
| ** |
|
|
| ** |
|
|
| ** |
|
|
20 ++ E ++
|
|
| |
|
|
+ + + + + + + + +
|
|
0 ++-------+-------+--------+--------+-------+--------+-------+-------++
|
|
0 1 2 3 4 5 6 7 8
|
|
Concurrent calls
|
|
----
|
|
|
|
The results shows a saving of 15.79% with only one concurrent call, that
|
|
quickly improves with more concurrent calls (due to trunking).
|
|
|
|
We also provide the expected results by batching 4 messages for a single call:
|
|
----
|
|
Traffic savings (%)
|
|
100 ++-------+-------+--------+--------+-------+--------+-------+-------++
|
|
+ + + + + + batch factor 4 **E*** +
|
|
| |
|
|
80 ++ ++
|
|
| |
|
|
| |
|
|
| ****E********E*******E********E*******E********E
|
|
60 ++ ****E**** ++
|
|
| E*** |
|
|
| |
|
|
40 ++ ++
|
|
| |
|
|
| |
|
|
| |
|
|
20 ++ ++
|
|
| |
|
|
+ + + + + + + + +
|
|
0 ++-------+-------+--------+--------+-------+--------+-------+-------++
|
|
0 1 2 3 4 5 6 7 8
|
|
Concurrent calls
|
|
----
|
|
|
|
The results show a saving of 56.68% with only one concurrent call. Trunking
|
|
slightly improves the situation with more concurrent calls.
|
|
|
|
We also provide the figure with batching factor of 8:
|
|
----
|
|
Traffic savings (%)
|
|
100 ++-------+-------+--------+--------+-------+--------+-------+-------++
|
|
+ + + + + + batch factor 8 **E*** +
|
|
| |
|
|
80 ++ ++
|
|
| |
|
|
| ****E*******E********E
|
|
| ****E********E********E*******E**** |
|
|
60 ++ E*** ++
|
|
| |
|
|
| |
|
|
40 ++ ++
|
|
| |
|
|
| |
|
|
| |
|
|
20 ++ ++
|
|
| |
|
|
+ + + + + + + + +
|
|
0 ++-------+-------+--------+--------+-------+--------+-------+-------++
|
|
0 1 2 3 4 5 6 7 8
|
|
Concurrent calls
|
|
----
|
|
|
|
That shows very little improvement with regards to batching 4 messages.
|
|
Still, we risk to degrade user experience. Thus, we consider a batching factor
|
|
of 3 and 4 is adecuate.
|
|
|
|
== Other proposed follow-up works
|
|
|
|
The following sections describe features that can be considered in the mid-run
|
|
to be included in the OSmux infrastructure. They will be considered for future
|
|
proposals as extensions to this work. Therefore, they are NOT included in
|
|
this proposal.
|
|
|
|
=== Encryption
|
|
|
|
Voice streams within OSmux can be encrypted in a similar manner to SRTP
|
|
(RFC3711). The only potential problem is the use of a reduced sequence number,
|
|
as it wraps in (20ms * 2^256 * B), i.e. 5.12s to 40.96s. However, as the
|
|
receiver knows at which rate the codec frames are generated at the sender, he
|
|
should be able to compute how much time has passed using his own timebase.
|
|
|
|
Another alternative can be the use of DTLS (RFC 6347) that can be used to
|
|
secure datagram traffic using TLS facilities (libraries like openssl and
|
|
gnutls already support this).
|
|
|
|
=== Multiple OSmux messages in one packet
|
|
|
|
In case there is already at least one active voice call, there will be
|
|
regular transmissions of voice codec frames. Depending on the batching
|
|
factor, they will be sent every 70ms to 140ms. The size even of a
|
|
batched (and/or trunked) codec message is still much lower than the MTU.
|
|
|
|
Thus, any signalling (related or unrelated to the call causing the codec
|
|
stream) can just be piggy-backed to the packets containing the voice
|
|
codec frames.
|