Fix critical bug in default TCP keepalive user timeout

It turns out that our calculation of the TCP_USER_TIMEOUT value was
flawed in several ways:

* there should have been parenthesis around the + operator
  (line->keepalive_probe_interval + line->keepalive_idle_timeout) as the
  keepalive_idle_timeout is in seconds, not milli-seconds.

* in the default case, all those values are configured to -1
  (E1INP_USE_DEFAULT). This means we're using
	1000 * -1 * -1 + -1 = 999
  i.e. just below a second which clearly is not enough for a lossy
  satellite or wifi back-haul.

This fixes a regression introduced in Ia7659c209aea0d26eb37d31e771adc91b17ae668
(libosmo-abis >= 1.4.0) when TCP keepalive user timeouts became enabled
by default.

The initial support for TCP_USER_TIMEOUT was merged in
I5e7425958472aa5d758e09bfbefc7d7d37bf6f5f (libosmo-abis >= 0.7.0) but
since TCP keepalives were not yet enabled by default, only users with
explicit TCP keepalive configuration in their config files would be
affected - and then only of the second part of the bug (operator
precedence).

In addition, let's print the actually-used values to the log, helping to
spot unintended values.

Change-Id: Idca24d3e676a45d860d9eec60dc2097d8d87f3bf
Related: OS#5785, OS#6375, SYS#6801
Fixes: Ia7659c209aea0d26eb37d31e771adc91b17ae668
(cherry picked from commit 12fae9aeeb)
This commit is contained in:
Harald Welte 2024-02-24 06:51:04 +01:00 committed by Vadim Yanitskiy
parent e32dd31555
commit b8a44773fb
1 changed files with 18 additions and 21 deletions

View File

@ -599,7 +599,7 @@ struct e1inp_driver ipaccess_driver = {
static void update_fd_settings(struct e1inp_line *line, int fd)
{
int ret;
int val;
int val, idle_val, interval_val, retry_count_val, user_timeout_val;
if (line->keepalive_num_probes) {
/* Enable TCP keepalive to find out if the connection is gone */
@ -611,42 +611,39 @@ static void update_fd_settings(struct e1inp_line *line, int fd)
else
LOGP(DLINP, LOGL_NOTICE, "TCP Keepalive is enabled\n");
idle_val = line->keepalive_idle_timeout > 0 ?
line->keepalive_idle_timeout :
DEFAULT_TCP_KEEPALIVE_IDLE_TIMEOUT;
interval_val = line->keepalive_probe_interval > -1 ?
line->keepalive_probe_interval :
DEFAULT_TCP_KEEPALIVE_INTERVAL;
retry_count_val = line->keepalive_num_probes > 0 ?
line->keepalive_num_probes :
DEFAULT_TCP_KEEPALIVE_RETRY_COUNT;
user_timeout_val = 1000 * retry_count_val * (interval_val + idle_val);
LOGP(DLINP, LOGL_NOTICE,
"TCP keepalive idle_timeout=%us, interval=%us, retry_count=%u user_timeout=%ums\n",
idle_val, interval_val, retry_count_val, user_timeout_val);
/* The following options are not portable! */
val = line->keepalive_idle_timeout > 0 ?
line->keepalive_idle_timeout :
DEFAULT_TCP_KEEPALIVE_IDLE_TIMEOUT;
ret = setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
&val, sizeof(val));
ret = setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_val, sizeof(idle_val));
if (ret < 0) {
LOGP(DLINP, LOGL_ERROR,
"Failed to set TCP keepalive idle time: %s\n",
strerror(errno));
}
val = line->keepalive_probe_interval > -1 ?
line->keepalive_probe_interval :
DEFAULT_TCP_KEEPALIVE_INTERVAL;
ret = setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
&val, sizeof(val));
ret = setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_val, sizeof(interval_val));
if (ret < 0) {
LOGP(DLINP, LOGL_ERROR,
"Failed to set TCP keepalive interval: %s\n",
strerror(errno));
}
val = line->keepalive_num_probes > 0 ?
line->keepalive_num_probes :
DEFAULT_TCP_KEEPALIVE_RETRY_COUNT;
ret = setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,
&val, sizeof(val));
ret = setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &retry_count_val, sizeof(retry_count_val));
if (ret < 0) {
LOGP(DLINP, LOGL_ERROR,
"Failed to set TCP keepalive count: %s\n",
strerror(errno));
}
val = 1000 * line->keepalive_num_probes *
line->keepalive_probe_interval +
line->keepalive_idle_timeout;
ret = setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
&val, sizeof(val));
ret = setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &user_timeout_val, sizeof(user_timeout_val));
if (ret < 0) {
LOGP(DLINP, LOGL_ERROR,
"Failed to set TCP user timeout: %s\n",