Retransmission jobs for old requests for which we already received a
response previously left the impression that messages were sent more
recently than was actually the case.
task_manager_t always defined INVALID_STATE as possible return value if
no retransmit was sent, this just was never actually returned.
I guess we could further differentiate between actual invalid states
(e.g. if we already received the response) and when we don't send a
retransmit for other reasons e.g. because the IKE_SA became stale.
Due to the exponential backoff a high number of retransmits only
makes sense if retransmit_limit is set. However, even with that there
was a problem.
We first calculated the timeout for the next retransmit and only then
compared that to the configured limit. Depending on the configured
base and timeout the calculation overflowed the range of uint32_t after
a relatively low number of retransmits (with the default values after 23)
causing the timeout to first get lower (on a high level) before constantly
resulting in 0 (with the default settings after 60 retransmits).
Since that's obviously lower than any configured limit, all remaining
retransmits were then sent without any delay, causing a lot of concurrent
messages if the number of retransmits was high.
This change determines the maximum number of retransmits until an
overflow occurs based on the configuration and defaults to UINT32_MAX
if that value is exceeded. Note that since the timeout is in milliseconds
UINT32_MAX equals nearly 50 days.
The calculation in task_manager_total_retransmit_timeout() uses a double
variable and the result is in seconds so the maximum number would be higher
there (with the default settings 1205). However, we want its result to
be based on the actual IKE retransmission behavior.
If a lot of QUICK_MODE tasks are queued and the other side
sends a DPD request, there is a good chance for timeouts.
Observed this in cases where other side is quite slow in responding
QUICK_MODE requests (e.g. Cisco ASA v8.x) and about 100 CHILD_SAs
are to be spawned.
Closesstrongswan/strongswan#115.
The task manager for IKEv1 issues a retransmit send alert in the
retransmit_packet() function. The corresponding retransmit cleared alert
however is only issued for exchanges we initiated after processing the
response in process_response().
For quick mode exchanges we may retransmit the second packet if the peer
(the initiator) does not send the third message in a timely manner. In
this case the retransmit send alert may never be cleared.
With this patch the retransmit cleared alert is issued for packets that
were retransmitted also when we are the responding party when we receive
the outstanding response.
Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com>
We don't retransmit DPD requests like we do requests for proper exchanges,
so increasing the number with each sent DPD could result in the peer's state
getting out of sync if DPDs are lost. Because according to RFC 3706, DPDs
with an unexpected sequence number SHOULD be rejected (it does mention the
possibility of maintaining a window of acceptable numbers, but we currently
don't implement that). We partially ignore such messages (i.e. we don't
update the expected sequence number and the inbound message stats, so we
might send a DPD when none is required). However, we always send a response,
so a peer won't really notice this (it also ensures a reply for "retransmits"
caused by this change, i.e. multiple DPDs with the same number - hopefully,
other implementations behave similarly when receiving such messages).
Fixes#2714.
This is mainly for HA where a passive SA was already created when the
IKE keys were derived. If e.g. an authentication error occurs later that
SA wouldn't get cleaned up.
If we find a redundant CHILD_SA (the peer probably rekeyed the SA before
us) we might not want to delete the old SA because the peer might still
use it (same applies to old CHILD_SAs after rekeyings). So only delete
them if configured to do so.
Fixes#2358.
Some devices always use the oldest IKE_SA to send DPDs and will delete
all IKE_SAs when there is no response. If uniqueness is not enforced
rekeyed IKE_SAs might not get deleted until they expire so we should
respond to DPDs.
References #2090.
By aborting the active task we don't have to wait for potential
retransmits if the other peer does not respond to the current task.
Since IKEv1 has no sequential message IDs and INFORMATIONALs are no real
exchanges this should not be a problem.
Fixes#1537
References #429, #1410Closesstrongswan/strongswan#48
Such a task is not initiated unless a certain time has passed. This
allows delaying certain tasks but avoids problems if we'd do this
via a scheduled job (e.g. if the IKE_SA is rekeyed in the meantime).
If the IKE_SA is rekeyed the delay of such tasks is reset when the
tasks are adopted i.e. they get executed immediately on the new IKE_SA.
This hasn't been implemented for IKEv1 yet.
Some peers send an INITIAL_CONTACT notify after they received our XAuth
username. The XAuth task waiting for the third XAuth message handles
this incorrectly and closes the IKE_SA as no configuration payloads are
contained in the message. We queue the INFORMATIONAL until the XAuth
exchange is complete to avoid this issue.
Fixes#1434.
If we haven't received the third QM message for multiple exchanges the
return value of NEED_MORE for passive tasks that are not responsible for
a specific exchange would trigger a fourth empty QM message.
Fixes: 4de361d92c ("ikev1: Fix handling of overlapping Quick Mode exchanges")
References #1076.
In some cases the third message of a Quick Mode exchange might arrive
after the first message of a subsequent Quick Mode exchange. Previously
these messages were handled incorrectly and the second Quick Mode
exchange failed.
Some implementations might even try to establish multiple Quick Modes
simultaneously, which is explicitly allowed in RFC 2409. We don't fully
support that, though, in particular in case of retransmits.
Fixes#1076.
As we now use the same reqid for multiple CHILD_SAs with the same selectors,
having marks based on the reqid makes not that much sense anymore. Instead we
use unique marks that use a custom identifier. This identifier is reused during
rekeying, keeping the marks constant for any rule relying on it (for example
installed by updown).
This also simplifies handling of reqid allocation, as we do not have to query
the marks that is not yet assigned for an unknown reqid.
The message() hook on bus_t is now called exactly once before (plain) and
once after fragmenting (!plain), not twice for the complete message and again
for each individual fragment, as was the case in earlier iterations.
For inbound messages the hook is called once for each fragment (!plain)
and twice for the reassembled message.
At the time we reset an IKE_SA (e.g. when re-authenticating a not yet
established SA due to a roaming event) such tasks might already be queued
by one of the phase 1 tasks. If the SA is initiated again another task will
get queued by the phase 1 task. This results in e.g. multiple mode config
requests, which most gateways will have problems with.
This patch exports the task manager's flush to allow flushing of all
queues with one function call from ike_sa->destroy. It allows the
access of intact children during task destructoin (see git-commit
e44ebdcf) and allows the access of the task manager in
child_state_change hook.
Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com>