The task manager for IKEv1 issues a retransmit send alert in the
retransmit_packet() function. The corresponding retransmit cleared alert
however is only issued for exchanges we initiated after processing the
response in process_response().
For quick mode exchanges we may retransmit the second packet if the peer
(the initiator) does not send the third message in a timely manner. In
this case the retransmit send alert may never be cleared.
With this patch the retransmit cleared alert is also issued when we are
the responder and receive the outstanding message for a packet that we
retransmitted.
Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com>
For inbound processing, it can be rather useful to apply the mark to the
packet in the SA, so the associated policy with that mark implicitly matches.
When using %unique as match mark, we don't know the mark beforehand, so
we most likely want to set the mark we match against.
%unique (and the upcoming %same key) are usable in specific contexts only.
To restrict the user from using it in other places where it does not get the
expected results, reject such keywords unless explicitly allowed.
We don't retransmit DPD requests like we do requests for proper exchanges,
so increasing the number with each sent DPD could result in the peer's state
getting out of sync if DPDs are lost, because according to RFC 3706, DPDs
with an unexpected sequence number SHOULD be rejected (it does mention the
possibility of maintaining a window of acceptable numbers, but we currently
don't implement that). We partially ignore such messages (i.e. we don't
update the expected sequence number and the inbound message stats, so we
might send a DPD when none is required). However, we always send a response,
so a peer won't really notice this (it also ensures a reply for "retransmits"
caused by this change, i.e. multiple DPDs with the same number - hopefully,
other implementations behave similarly when receiving such messages).
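The sequence number handling described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the dpd_state_t type and both function names are hypothetical. The number only advances once a request was acknowledged, so a lost DPD is followed by another one with the same number.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-SA DPD state, for illustration only. */
typedef struct {
	uint32_t seqnr;    /* number used for the next R_U_THERE */
	bool outstanding;  /* true while a request is unacknowledged */
} dpd_state_t;

/* Return the sequence number to put into the next R_U_THERE. */
static uint32_t dpd_next_request(dpd_state_t *dpd)
{
	dpd->outstanding = true;
	return dpd->seqnr;
}

/* Process an R_U_THERE_ACK; returns true if it acknowledged our request. */
static bool dpd_process_ack(dpd_state_t *dpd, uint32_t seqnr)
{
	if (!dpd->outstanding || seqnr != dpd->seqnr)
	{
		return false;
	}
	dpd->outstanding = false;
	dpd->seqnr++;  /* only now move on to the next number */
	return true;
}
```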
Fixes #2714.
This is mainly for HA where a passive SA was already created when the
IKE keys were derived. If e.g. an authentication error occurs later that
SA wouldn't get cleaned up.
The reload of the configuration of the loggers so far only included
the log levels. In order to support the reload of all other options,
a reload function may be implemented.
Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com>
The options control whether the DF and ECN header bits/fields are copied
from the unencrypted packets to the encrypted packets in tunnel mode (DF only
for IPv4), and for ECN whether the same is done for inbound packets.
Note: This implementation only works with Linux/Netlink/XFRM.
Based on a patch by Markus Sattler.
During a test with ~12000 established SAs it was noted that vici
related operations hung.
The operations took over 16 minutes to finish. The time was spent in
the vici message parser, which was assigning the message over and over
again, to get rid of the already parsed portions.
First fixed by cutting the consumed parts off without copying the message.
Runtime for ~12000 SAs is now around 20 seconds.
Further optimization brought the runtime down to roughly 1-2 seconds
by using an fd to read through the message variable.
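Reassigning the remainder after every parsed element copies the rest of the message each time, so the cost grows quadratically with the message size. The idea of reading through the message with a moving position instead can be sketched as below. This is illustrated in C with a hypothetical cursor_t type and a made-up length-prefixed format; it is neither the vici wire format nor the code changed by this patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical read cursor over an immutable message buffer. */
typedef struct {
	const uint8_t *ptr;  /* next unread byte */
	size_t len;          /* unread bytes remaining */
} cursor_t;

/*
 * Read one element encoded as a 1-byte length followed by that many bytes
 * (an illustrative format). The cursor advances in constant time and the
 * unparsed remainder is never copied. Returns 1 on success, 0 otherwise.
 */
static int read_element(cursor_t *cur, const uint8_t **data, size_t *data_len)
{
	size_t n;

	if (cur->len < 1)
	{
		return 0;
	}
	n = cur->ptr[0];
	if (cur->len - 1 < n)
	{
		return 0;  /* truncated element */
	}
	*data = cur->ptr + 1;
	*data_len = n;
	cur->ptr += 1 + n;
	cur->len -= 1 + n;
	return 1;
}

/* Count the complete elements in a message in a single pass. */
static size_t count_elements(const uint8_t *msg, size_t len)
{
	cursor_t cur = { msg, len };
	const uint8_t *data;
	size_t data_len, count = 0;

	while (read_element(&cur, &data, &data_len))
	{
		count++;
	}
	return count;
}
```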
Closes strongswan/strongswan#103.
The code to support parallel Netlink queries (commit 3c7193f) made use
of nlmsg_len member from struct nlmsghdr to allocate and copy the
responses. Since NLMSG_NEXT is later used to parse these responses, they
must be aligned, or the results are undefined.
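A minimal sketch of the alignment requirement, assuming a hypothetical helper copy_aligned(). ALIGN4() mirrors what NLMSG_ALIGN() from <linux/netlink.h> does (rounding up to a multiple of 4 bytes) and is defined locally only to keep the example self-contained.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Round up to a multiple of 4 bytes, like NLMSG_ALIGN() does. */
#define ALIGN4(len) (((len) + 3U) & ~3U)

/*
 * Copy a response of msg_len bytes (the nlmsg_len value) into a zero-padded
 * buffer whose size is aligned, so that stepping to the next message never
 * reads beyond the allocation. The caller frees the result.
 */
static void *copy_aligned(const void *msg, uint32_t msg_len,
						  uint32_t *alloc_len)
{
	void *copy;

	*alloc_len = ALIGN4(msg_len);
	copy = calloc(1, *alloc_len);
	if (copy)
	{
		memcpy(copy, msg, msg_len);
	}
	return copy;
}
```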
Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com>
macOS supports AES_GCM_ICV16 natively using PF_KEYv2.
This change enables AES_GCM if the corresponding definition is detected
in the headers.
With this change it is no longer necessary to use the libipsec module to
use AES_GCM on macOS.
Closes strongswan/strongswan#107.
Removing and readding the entry to a potentially different row/segment,
while driving out waiting and new threads, could prevent threads from
acquiring the SA even if they were waiting to check it out by unique
ID (which doesn't change), or if they were just trying to enumerate it.
With this change the row and segment don't change anymore and waiting
threads may acquire the SA. However, those looking for an IKE_SA by SPIs
might get one back that has a different SPI (but that's probably not
something that happens very often this early).
This was noticed because we check out SAs by unique ID in the Android
app to terminate them after failed retransmits if we are not reestablishing
the SA (otherwise we continue), and this sometimes failed.
Fixes: eaedcf8c00 ("ike-sa-manager: Add method to change the initiator SPI of an IKE_SA")
Instead of logging the search parameters for IKE configs (which were already
logged before starting the lookup) we log the configured settings.
The peer config lookup is also changed slightly by doing the IKE config
match first and skipping some checks if that or the local peer identity
doesn't match.
Although being already logged on level 2, these messages are usually just
confusing if they pop up randomly in the log when e.g. querying the configs
or installing traps. So after this the log messages will only be logged when
actually proposing or selecting traffic selectors during IKE.
This way we don't rely on the order of equally matching configs as
heavily anymore (which is actually tricky in vici) and this also doesn't
require repeating weak algorithms in all configs that might potentially be
selected if there are some clients that require them.
There is currently no ordering, so an explicitly configured exactly matching
proposal isn't a better match than e.g. the default proposal that also
contains the proposed algorithms.
In some scenarios we might find multiple usable peer configs with different
IKE proposals. This is a problem if we use a config with non-matching
proposals that later causes IKE rekeying to fail. It might even be a problem
already when creating the CHILD_SA if the proposals of IKE and CHILD_SA
are consistent.
This allows switching to probing mode if the client is on a public IP
and this is the active task and connectivity gets restored. We only add
NAT-D payloads if we are currently behind a NAT (to detect changed NAT
mappings), a MOBIKE update that might follow will add them in case we
move behind a NAT.
In case the PRF's set_key() or allocate_bytes() method failed, skeyseed
was not initialized and the chunk_clear() call later caused a crash.
This could have happened with OpenSSL in FIPS mode when MD5 was
negotiated (and test vectors were not checked, in which case the PRF
couldn't be instantiated as the test vectors would have failed).
MD5 is not included in the default proposal anymore since 5.6.1, so
with recent versions this could only happen with configs that are not
valid in FIPS mode anyway.
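The pattern behind the fix can be sketched as follows: the buffer is initialized before any call that may fail, so the cleanup is safe on every path. All names here (buf_t, buf_clear(), prf_allocate(), derive_keys()) are hypothetical stand-ins, not the libstrongswan API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a pointer/length chunk type. */
typedef struct {
	unsigned char *ptr;
	size_t len;
} buf_t;

/*
 * Wipe and free a buffer; safe for an empty (NULL, 0) buffer. A real
 * implementation would use a wipe the compiler can't optimize away.
 */
static void buf_clear(buf_t *buf)
{
	if (buf->ptr)
	{
		memset(buf->ptr, 0, buf->len);
		free(buf->ptr);
	}
	buf->ptr = NULL;
	buf->len = 0;
}

/* Hypothetical PRF step that fails if the algorithm is unavailable. */
static bool prf_allocate(bool available, buf_t *out)
{
	if (!available)
	{
		return false;  /* out is left untouched on failure */
	}
	out->ptr = calloc(1, 16);
	out->len = out->ptr ? 16 : 0;
	return out->ptr != NULL;
}

static bool derive_keys(bool prf_available)
{
	buf_t skeyseed = { NULL, 0 };  /* initialized before any fallible call */
	bool ok;

	ok = prf_allocate(prf_available, &skeyseed);
	/* ... further derivation steps would go here ... */
	buf_clear(&skeyseed);          /* safe on success and on failure */
	return ok;
}
```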
Fixes: CVE-2018-10811
We now check if there are other routes tracked for the same destination
and replace the installed route instead of just removing it. Same during
installation, where we previously didn't replace existing routes due to
NLM_F_EXCL. Routes with virtual IPs as source address are preferred over
routes without.
This should allow using trap policies with virtual IPs on Linux.
Fixes #85, #2162.
The client identifier serves as unique identifier just like a unique MAC
address would, so even with identity_leases disabled some DHCP servers
might assign unique leases per identity.
This increases the chances that subject DNs that might have been cut
off with the arbitrary previous limit of 64 bytes might now be sent
successfully.
The REQUEST message has the most static overhead in terms of other
options (17 bytes) as compared to DISCOVER (5) and RELEASE (7).
Added to that are 3 bytes for the DHCP message type, which means we have
288 bytes left for the two options based on the client identity (host
name and client identification). Since both contain the same value, a
FQDN identity, which causes a host name option to get added, may be
142 bytes long, other identities like subject DNs may be 255 bytes
long (the maximum for a DHCP option).
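The arithmetic above can be reproduced with a short sketch. The constant and function names are made up for illustration; the 2-byte header per option is the code and length byte that precede every DHCP option.

```c
#include <stddef.h>

enum {
	OPTIONS_MIN      = 312, /* minimum 'options' size per RFC 2131 */
	MAGIC_COOKIE     = 4,   /* included in the 312 bytes */
	REQUEST_OVERHEAD = 17,  /* other static options in a REQUEST */
	MESSAGE_TYPE     = 3,   /* DHCP message type option */
	OPTION_HEADER    = 2,   /* code and length byte of each option */
	OPTION_MAX       = 255, /* maximum length of a single option */
};

/* Bytes left for the identity-based options in a REQUEST. */
static size_t identity_budget(void)
{
	return OPTIONS_MIN - MAGIC_COOKIE - REQUEST_OVERHEAD - MESSAGE_TYPE;
}

/*
 * Maximum identity length if option_count options (1 or 2) carry the
 * same value: 2 for FQDNs (host name and client identifier), 1 otherwise.
 */
static size_t max_identity_len(size_t option_count)
{
	size_t len = identity_budget() / option_count - OPTION_HEADER;

	return len > OPTION_MAX ? OPTION_MAX : len;
}
```

With two options this yields 288 / 2 - 2 = 142 bytes; with one option the 286 bytes that would fit are capped at the 255-byte option maximum.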
According to RFC 2131, the minimum size of the 'options' field is 312
bytes, including the 4 byte magic cookie. There also does not seem to
be any restriction regarding the message length. Previously, the length
was rounded to a multiple of 64 bytes. The latter might have been
because in BOOTP the options field (or rather vendor-specific area as it
was called back then) had a fixed length of 64 bytes (so max(optlen+4, 64)
might actually have been what was intended), but for DHCP the field is
explicitly variable length, so I don't think it's necessary to pad it.
Since we won't read from the socket, reducing the receive buffer saves
some memory and it should also minimize the impact on other processes that
bind the same port (Linux distributes packets to the sockets round-robin).
DHCP servers will respond to port 67 if giaddr is non-zero, which we set
if we are not broadcasting. While such messages are received fine via
RAW socket the kernel will respond with an ICMP port unreachable if no
socket is bound to that port. Instead of opening a dummy socket on port
67 just to avoid the ICMPs we can also just operate with a single
socket, bind it to port 67 and send our requests from that port.
Since SO_REUSEADDR behaves on Linux like SO_REUSEPORT does on other
systems we can bind that port even if a DHCP server is running on the
same host as the daemon (this might have to be adapted to make this work
on other systems, but due to the raw socket the plugin is not that portable
anyway).
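Binding a single UDP socket with SO_REUSEADDR could look roughly like this. It is a sketch only: the function name is hypothetical, the port is a parameter because binding to port 67 requires privileges, and the raw socket handling of the plugin is not shown.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Open a UDP socket bound to the given port on all addresses with
 * SO_REUSEADDR set. Returns the socket, or -1 on failure.
 */
static int open_shared_udp_socket(uint16_t port)
{
	struct sockaddr_in addr;
	int on = 1, fd;

	fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
	if (fd == -1)
	{
		return -1;
	}
	/* has to be set before bind() to have an effect */
	if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) == -1)
	{
		close(fd);
		return -1;
	}
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_port = htons(port);
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	if (bind(fd, (struct sockaddr*)&addr, sizeof(addr)) == -1)
	{
		close(fd);
		return -1;
	}
	return fd;
}
```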
The previous code compared the port in the packet to the client port and, if
successful, checked it also against the server port, which, therefore, never
matched, but due to incorrect offsets did skip the BPF_JA. If the client port
didn't match the code also skipped to the instruction after the BPF_JA.
However, the latter was incorrect also and processing would have continued at
the next instruction anyway. Basically, DHCP packets to any port were accepted.
What's not fixed with this is that the kernel returns an ICMP Port
unreachable for packets sent to the server port (67) because we don't
have a socket bound to it.
Fixes: f0212e8837 ("Accept DHCP replies on bootps port, as we act as a relay agent if server address configured")
We don't have MOBIKE and the fallback to reauthentication also does not
make much sense as that doesn't affect the CHILD_SAs for IKEv1. So
instead of complicating the code we just ignore roam events for IKEv1
for now.
Closes strongswan/strongswan#100.
In very early versions routed CHILD_SAs were attached to IKE_SAs, since
that's not the case anymore (they are handled via trap manager), we can
remove this special handling.
This algorithm uses a fixed-length key and we MUST NOT send a key length
attribute when proposing such algorithms.
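As a rough sketch of this rule, assuming hypothetical algorithm identifiers (not the IANA transform IDs) and helper names: the key length attribute is only encoded for algorithms that support several key sizes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical algorithm identifiers, for illustration only. */
typedef enum {
	ALG_AES_CBC,
	ALG_AES_GCM_16,
	ALG_CHACHA20_POLY1305,
} alg_t;

/* True if the algorithm supports several key sizes. */
static bool has_variable_key_length(alg_t alg)
{
	switch (alg)
	{
		case ALG_AES_CBC:
		case ALG_AES_GCM_16:
			return true;
		case ALG_CHACHA20_POLY1305:
			return false;  /* fixed-length key */
	}
	return false;
}

/* Key length in bits to encode, or 0 if the attribute must be omitted. */
static uint16_t key_length_attribute(alg_t alg, uint16_t key_bits)
{
	return has_variable_key_length(alg) ? key_bits : 0;
}
```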
While we could accept transforms with key length this would only work as
responder, as original initiator it wouldn't because we won't know if a
peer requires the key length. And as exchange initiator (e.g. for
rekeyings), while being original responder, we'd have to go to great
lengths to store the condition and modify the sent proposal to patch in
the key length. This doesn't seem worth it for only a partial fix.
This means, however, that ChaCha20/Poly1305 can't be used with previous
releases (5.3.3 and newer) that don't contain this fix.
Fixes #2614.
Fixes: 3232c0e64e ("Merge branch 'chapoly'")
Since these are installed overlapping (like during a rekeying) we have to use
the same (unique) marks (and possibly reqid) that were used previously,
otherwise, the policy installation will fail.
Fixes #2610.
If the responder is behind a NAT that remaps the response from the
statically forwarded port 500 to a new external port (as Azure seems to be
doing) we should still switch to port 4500 if we used port 500 so far as
it would not have been possible to send any messages to it if it wasn't
really port 500 (we only add a non-ESP marker if neither port is 500).
If a Quick mode is initiated for a CHILD_SA that is already installed
we can identify this situation and rekey the already installed CHILD_SA.
Otherwise we end up with several CHILD_SAs in state INSTALLED which
means multiple calls of child_updown are done. Unfortunately,
the deduplication code later does not call child_updown() (so the up and
down events were not balanced).
Closes strongswan/strongswan#95.
Until now the configuration options available to users for HW offload were:
hw_offload = no
hw_offload = yes
With this commit users will be able to configure auto mode using:
hw_offload = auto
Signed-off-by: Adi Nissim <adin@mellanox.com>
Reviewed-by: Aviv Heller <avivh@mellanox.com>
Until now there were 2 hw_offload modes: no/yes
* hw_offload = no : Configure the SA without HW offload.
* hw_offload = yes : Configure the SA with HW offload.
In this case, if the device does not support
offloading, SA creation will fail.
This commit introduces a new mode: hw_offload = auto
----------------------------------------------------
If the device and kernel support HW offload, configure
the SA with HW offload, but do not fail SA creation otherwise.
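The three modes can be summarized with the following sketch. The types and the install_sa() function are hypothetical and only model the outcome, not the kernel interface: whether the SA gets installed and whether it is offloaded.

```c
#include <stdbool.h>

typedef enum {
	HW_OFFLOAD_NO,
	HW_OFFLOAD_YES,
	HW_OFFLOAD_AUTO,
} hw_offload_t;

/* Hypothetical outcome of an SA installation. */
typedef struct {
	bool installed;
	bool offloaded;
} sa_result_t;

/* Model the installation for a mode, given device/kernel support. */
static sa_result_t install_sa(hw_offload_t mode, bool offload_supported)
{
	sa_result_t res = { false, false };

	switch (mode)
	{
		case HW_OFFLOAD_NO:
			res.installed = true;
			break;
		case HW_OFFLOAD_YES:
			/* fails if offloading is not supported */
			res.installed = res.offloaded = offload_supported;
			break;
		case HW_OFFLOAD_AUTO:
			/* offload if possible, otherwise install without it */
			res.installed = true;
			res.offloaded = offload_supported;
			break;
	}
	return res;
}
```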
Signed-off-by: Adi Nissim <adin@mellanox.com>
Reviewed-by: Aviv Heller <avivh@mellanox.com>
The previous code was obviously incorrect and caused strange side effects
depending on the compiler and its optimization flags (infinite looping seen
with GCC 4.8.4, segfault when destroying the private key in build() seen
with clang 4.0.0 on FreeBSD).
Fixes #2579.
If we receive an INVALID_KE_PAYLOAD notify we should not just retry
with the requested DH group without checking first if we actually propose
the group (or any at all).
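The check can be sketched like this, assuming the proposed DH groups are available as a plain array of IANA group numbers (the function name and this representation are hypothetical). With an empty list, i.e. if no group is proposed at all, the requested group is never accepted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if the requested group is among the groups we propose. */
static bool proposes_dh_group(const uint16_t *groups, size_t count,
							  uint16_t requested)
{
	size_t i;

	for (i = 0; i < count; i++)
	{
		if (groups[i] == requested)
		{
			return true;
		}
	}
	return false;
}
```

A caller would only retry with the group from the notify if this returns true and otherwise treat the notify as a failure.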
After a rekeying we keep the inbound SA and policies installed for a
while, but the outbound SA and policies are already removed. Attempting
to update them could get the refcount in the kernel interface out of sync
as the additional policy won't be removed when the CHILD_SA object is
eventually destroyed.
When initiating a trap policy we explicitly pass the reqid along. I guess
the lookup was useful to get the same reqid if a trapped CHILD_SA is manually
initiated. However, we now get the same reqid anyway if there is no
narrowing. And if the traffic selectors do get narrowed the reqid will be
different but that shouldn't be a problem as that doesn't cause an issue with
any temporary SAs in the kernel (this is why we pass the reqid to the
triggered CHILD_SA, otherwise, no new acquire would get triggered for
traffic that doesn't match the wider trap policy).
Reqids for the same traffic selectors are now stable so we don't have to
pass reqids of previously installed CHILD_SAs. Likewise, we don't need
to know the reqid of the newly installed trap policy as we now uninstall
by name.
With IKEv1 we transmit both public DH factors (used to derive the initial
IV) besides the shared secret. So these messages could get significantly
larger than 1024 bytes, depending on the DH group (modp2048 just about
fits into it). The new default of 2048 bytes should be fine up to modp4096
and for larger groups the buffer size may be increased (an error is
logged should this happen).
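A back-of-the-envelope estimate of these sizes, counting only the two public values and the shared secret, each as long as the modulus. The function name is made up and the other fields in such a message are deliberately not modeled, which is why modp2048 is closer to the old limit than the number below suggests.

```c
#include <stddef.h>

/* Bytes for both public DH values plus the shared secret. */
static size_t dh_material_len(size_t modulus_bits)
{
	return 3 * (modulus_bits / 8);
}
```

This gives 768 bytes for modp2048 and 1536 bytes for modp4096, while modp6144 already requires 2304 bytes and therefore a larger buffer.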
This is really only needed for other exchanges like DPDs, not when we
just updated the addresses. The NAT-D payloads are only used here to
detect whether UDP encapsulation has to be enabled/disabled.