For inbound processing, it can be rather useful to apply the mark to the
packet in the SA, so the associated policy with that mark implicitly matches.
When using %unique as match mark, we don't know the mark beforehand, so
we most likely want to set the mark we match against.
%unique (and the upcoming %same key) are usable in specific contexts only.
To restrict the user from using it in other places where it does not get the
expected results, reject such keywords unless explicitly allowed.
The options control whether the DF and ECN header bits/fields are copied
from the unencrypted packets to the encrypted packets in tunnel mode (DF only
for IPv4), and for ECN whether the same is done for inbound packets.
Note: This implementation only works with Linux/Netlink/XFRM.
Based on a patch by Markus Sattler.
During a test with ~12000 established SAs it was noted that vici
related operations hung.
The operations took over 16 minutes to finish. The time was spent in
the vici message parser, which was assigning the message over and over
again, to get rid of the already parsed portions.
First fixed by cutting the consumed parts off without copying the message.
Runtime for ~12000 SAs is now around 20 seconds.
Further optimization brought the runtime down to roughly 1-2 seconds
by using an fd to read through the message variable.
Closesstrongswan/strongswan#103.
The code to support parallel Netlink queries (commit 3c7193f) made use
of nlmsg_len member from struct nlmsghdr to allocate and copy the
responses. Since NLMSG_NEXT is later used to parse these responses, they
must be aligned, or the results are undefined.
Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com>
macOS supports AES_GCM_ICV16 natively using PF_KEYv2.
This change enables AES_GCM if the corresponding definition is detected
in the headers.
With this change it is no longer necessary to use the libipsec module to
use AES_GCM on macOS.
Closesstrongswan/strongswan#107.
Although being already logged on level 2, these messages are usually just
confusing if they pop up randomly in the log when e.g. querying the configs
or installing traps. So after this the log messages will only be logged when
actually proposing or selecting traffic selectors during IKE.
We now check if there are other routes tracked for the same destination
and replace the installed route instead of just removing it. Same during
installation, where we previously didn't replace existing routes due to
NLM_F_EXCL. Routes with virtual IPs as source address are preferred over
routes without.
This should allow using trap policies with virtual IPs on Linux.
Fixes#85, #2162.
The client identifier serves as unique identifier just like a unique MAC
address would, so even with identity_leases disabled some DHCP servers
might assign unique leases per identity.
This increases the chances that subject DNs that might have been cut
off with the arbitrary previous limit of 64 bytes might now be sent
successfully.
The REQUEST message has the most static overhead in terms of other
options (17 bytes) as compared to DISCOVER (5) and RELEASE (7).
Added to that are 3 bytes for the DHCP message type, which means we have
288 bytes left for the two options based on the client identity (host
name and client identification). Since both contain the same value, a
FQDN identity, which causes a host name option to get added, may be
142 bytes long, other identities like subject DNs may be 255 bytes
long (the maximum for a DHCP option).
According to RFC 2131, the minimum size of the 'options' field is 312
bytes, including the 4 byte magic cookie. There also does not seem to
be any restriction regarding the message length, previously the length
was rounded to a multiple of 64 bytes. The latter might have been
because in BOOTP the options field (or rather vendor-specific area as it
was called back then) had a fixed length of 64 bytes (so max(optlen+4, 64)
might actually have been what was intended), but for DHCP the field is
explicitly variable length, so I don't think it's necessary to pad it.
Since we won't read from the socket reducing the receive buffer saves
some memory and it should also minimize the impact on other processes that
bind the same port (Linux distributes packets to the sockets round-robin).
DHCP servers will respond to port 67 if giaddr is non-zero, which we set
if we are not broadcasting. While such messages are received fine via
RAW socket the kernel will respond with an ICMP port unreachable if no
socket is bound to that port. Instead of opening a dummy socket on port
67 just to avoid the ICMPs we can also just operate with a single
socket, bind it to port 67 and send our requests from that port.
Since SO_REUSEADDR behaves on Linux like SO_REUSEPORT does on other
systems we can bind that port even if a DHCP server is running on the
same host as the daemon (this might have to be adapted to make this work
on other systems, but due to the raw socket the plugin is not that portable
anyway).
The previous code compared the port in the packet to the client port and, if
successful, checked it also against the server port, which, therefore, never
matched, but due to incorrect offsets did skip the BPF_JA. If the client port
didn't match the code also skipped to the instruction after the BPF_JA.
However, the latter was incorrect also and processing would have continued at
the next instruction anyway. Basically, DHCP packets to any port were accepted.
What's not fixed with this is that the kernel returns an ICMP Port
unreachable for packets sent to the server port (67) because we don't
have a socket bound to it.
Fixes: f0212e8837 ("Accept DHCP replies on bootps port, as we act as a relay agent if server address configured")
Until now the configuration available to user for HW offload were:
hw_offload = no
hw_offload = yes
With this commit users will be able to configure auto mode using:
hw_offload = auto
Signed-off-by: Adi Nissim <adin@mellanox.com>
Reviewed-by: Aviv Heller <avivh@mellanox.com>
Until now there were 2 hw_offload modes: no/yes
* hw_offload = no : Configure the SA without HW offload.
* hw_offload = yes : Configure the SA with HW offload.
In this case, if the device does not support
offloading, SA creation will fail.
This commit introduces a new mode: hw_offload = auto
----------------------------------------------------
If the device and kernel support HW offload, configure
the SA with HW offload, but do not fail SA creation otherwise.
Signed-off-by: Adi Nissim <adin@mellanox.com>
Reviewed-by: Aviv Heller <avivh@mellanox.com>
Reqids for the same traffic selectors are now stable so we don't have to
pass reqids of previously installed CHILD_SAs. Likewise, we don't need
to know the reqid of the newly installed trap policy as we now uninstall
by name.
With IKEv1 we transmit both public DH factors (used to derive the initial
IV) besides the shared secret. So these messages could get significantly
larger than 1024 bytes, depending on the DH group (modp2048 just about
fits into it). The new default of 2048 bytes should be fine up to modp4096
and for larger groups the buffer size may be increased (an error is
logged should this happen).
This can be useful if routing rules (instead of e.g. route metrics) are used
to switch from one to another interface (i.e. from one to another
routing table). Since we currently don't evaluate routing rules when
doing the route lookup this is only useful if the kernel-based route
lookup is used.
Resolvesstrongswan/strongswan#88.
If users want to associate secrets with any identity, let 'em. This is
also possible with vici and might help if e.g. the remote identity is
actually %any as that would match a PSK with local IP and %any better
than one with local and different remote IP.
Fixes#2497.
libstrongswan and kernel-netlink are the only two components which do
not adhere to the naming scheme used for all other tests. If the tests
are run by an external application this imposes problems due to clashing
names.
Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com>
If enabled, add the RADIUS Class attributes received in Access-Accept messages
to RADIUS accounting messages as suggested by RFC 2865 section 5.25.
Fixes#2451.
It seems that there is a race, at least in 10.13, that lets
if_indextoname() fail for the new TUN device. So we delay the call a bit,
which seems to "fix" the issue. It's strange anyway that the previous
delay was only applied when an iface entry was already found.
The value of DHCP_OPTEND is 255. When it is assigned this result in a
sign change as the positive int constant is cast to a signed char and -1
results. Clang 4.0 complains about this.
If an interface is renamed we already have an entry (based on the
ifindex) allocated but previously only set the usable state once
based on the original name.
Fixes#2403.
When querying SAs the keys will end up in this buffer (the allocated
messages that are returned are already wiped). The kernel also returns
XFRM_MSG_NEWSA as response to XFRM_MSG_ALLOCSPI but we can't distinguish
this here as we only see the response.
References #2388.
This prevented new listeners from receiving notifies if they joined
after another listener disconnected previously, and if they themselves
disconnected their old connection would prevent them again from getting
notifies.
Multiple CHILD_SAs sharing the same traffic selectors (e.g. during
make-before-break reauthentication) also have the same reqid assigned.
If all matching entries are removed we could end up without entry even
though an SA exists that still uses these traffic selectors.
Fixes#2373.
This is similar to the eap-aka-3gpp2 plugin. K (optionally concatenated
with OPc) may be configured as binary EAP secret in ipsec.secrets or
swanctl.conf.
Based on a patch by Thomas Strangert.
Fixes#2326.
Interestingly, this doesn't show up in the regression tests because the
compiler removes the first assignment (and thus the allocation) due to
-O2 that's included in our default CFLAGS.
By using the total retransmit timeout, modifications of timeout settings
automatically reflect on the value of xfrm_acq_expires. If set, the
value of xfrm_acq_expires configured by the user takes precedence over
the calculated value.
When establishing a traffic-triggered CHILD_SA involves the setup of an
IKE_SA more than one exchange is required. As a result the temporary
acquire state may have expired -- even if the acquire expiration
(xfrm_acq_expires) time is set properly (165 by default). The expire
message sent by the kernel is not processed in charon since no trap can
be found by the trap manager.
A possible solution could be to track allocated SPIs. But since this is
a corner case and the tracking introduces quite a bit of overhead, it
seems much more sensible to add a new state if the update of a state
fails with NOT_FOUND.
Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com>
Upcoming FreeBSD kernels will support updating the addresses of existing
SAs with new SADB_X_EXT_NEW_ADDRESS_SRC|DST extensions for the SADB_UPDATE
message.
During initialization of the plugins the thread pool is not yet
initialized so there is no watcher thread that could handle the queued
Netlink message and the main thread will wait indefinitely for a
response.
Fixes#2199.
On Linux, setting the source address is insufficient to force a packet to be
sent over a certain path. The kernel uses the best route to select the outgoing
interface, even if we set a source address of a lower priority interface. This
is not only true for interfaces attaching to the same subnet, but also for
unrelated interfaces; the kernel (at least on 4.7) sends out the packet on
whatever interface it sees fit, even if that network does not expect packets
from the source address we force to.
When a better interface becomes available, strongSwan sends its MOBIKE address
list update using the old source address. But the kernel sends that packet over
the new best interface. If that network drops packets having the unexpected
source address from the old path, the MOBIKE update fails and the SA finally
times out.
To enforce a specific interface for our packet, we explicitly set the interface
index from the interface where the source address is installed. According to
ip(7), this overrules the specified source address to the primary interface
address. As this could have side effects to installations using multiple
addresses on a single interface, we disable the option by default for now.
This also allows using IPv6 link-local addresses, which won't work if
the outbound interface is not set explicitly.
Line 66 yields "TypeError: can't concat bytes to str" using Python 3.4.
"requestdata" was introduced in 22f08609f1 but is not actually used.
Since the original "request" is not used anywhere else this can be changed
to be similar to the other UTF-8 encoding changes in that commit.
Fixes: 22f08609f1 ("vici: Explicitly set the Python encoding type").
Closesstrongswan/strongswan#66.
When constructing the result, all responses from Netlink were concatenated
iteratively, i.e. for each response, the previously acquired result was
copied to newly allocated memory and the current response appended to it.
This results in O(n^2) copy operations. Instead, we now check for the
total final length of the result and copy the individual responses to it
in one pass, i.e. in O(n) copy operations. In particular, this issue caused
very high CPU usage in memcpy() function as the result is copied over and
over. Common way how to hit the issue is when having 1000+ routes and 5+
connecting clients a second. In that case, the memcpy() function can
take 50%+ of one CPU thread on a decent CPU and the whole charon daemon
is stuck just reading routes and concatenating them together (connecting
clients are blocked in that particular case as this is done under mutex).
Closes strongswan/strongswan#65.
References #2055.
If a the original responder narrows the selectors of its peer in addrblock,
the peer gets a subset of that selectors. However, once the original responder
initiates rekeying of that CHILD_SA, it sends the full selectors to the peer,
and then narrows the received selectors locally for the installation, only.
This is insufficient, as the peer ends up with wider selectors, sending traffic
that the original responder will reject to the stricter IPsec policy. So
additionally narrow the selectors when rekeying CHILD_SAs before sending the
TS list to the peer.
This is different if `ike` and `child` are provided and uninstall()
fails as we call that without knowing whether a matching shunt exists.
But if `ike` is not provided we explicitly search for a matching shunt
and if found don't need to look for a trap policy.
Previously, the client had to propose no wider selectors than the certificate
permits, otherwise the complete CHILD_SA was rejected. However, with IKEv2
we can dynamically narrow the selectors to what the certificate allows. This
makes client and gateway configurations very simple by just proposing 0.0.0.0/0,
narrowed to selectors the client is permitted to route into the network.
This allows a gateway to enforce the addrblock policy on certificates that
actually have the extension only. For (legacy) certificates not having the
extension, traffic selectors are validated/narrowed by other means, most
likely by the configuration.
This way updates to the mediation config are respected and the order in
which configs are configured/loaded does not matter.
The SQL plugin currently maintains the strong relationship between
mediated and mediation connection (we could theoretically change that to a
string too).
The original name is returned in the new "name" attribute.
This fixes an issue with bindings that map VICI messages to
dictionaries. For instance, in roadwarrior scenarios where every
CHILD_SA has the same name only the information of the last CHILD_SA
would end up in the dictionary for that name.
PINs are stored in a "hidden" credential set, so that its shared
secrets are not exposed via VICI. Since they are not explicitly loaded as
shared secrets via VICI a client might consider them as removed secrets and
remove them.
After an interface disappeared we can't remove the policies correctly as
the name doesn't resolve to the previous index anymore.
And making the policies so specific might not provide that much benefit.
To handle the interfaces on the policies correctly would require some
changes to the child-cfg, kernel-interface etc. so they'd take interface
indices directly so we could target the policies correctly even if an
interface disappeared (or reappeared and got a new index).
For table dumps the kernel accepts RTA_PREFSRC to filter the routes, which is
what we do when doing userspace route calculations. For kernel-based route
lookups, however, the RTA_PREFSRC attribute is ignored and we must specify
RTA_SRC for policy based route lookups.
For gateways with many connections, installing routes is often disabled,
as we can use a static route configuration to achieve proper routing with
a single rule. If this is the case, there is no need to dump all routes and
do userspace route lookups, as there is no need to exclude routes we installed
ourself.
Doing kernel-based route lookups is not only faster with may routes, but also
can use the full power of Linux policy based routing; something we can hardly
rebuild in userspace when calculating routes.
When using vici over RPyC and its (awesome) splitbrain, encoding and decoding
strings fails in vici, most likely because of the Monkey-Patch magic splitbrain
uses.
When specifying the implicit UTF-8 as encoding scheme explicitly, Python uses
the correct method to encode/decode the string, making vici useable in
splitbrain contexts.
While trap and regular policies now often look the same (mainly because
reqids are kept constant) trap policies still need to have a lower priority
than regular policies to handle unroute/route correctly if e.g. IPComp
is used or the mode changes. But if we use a completely different
priority range that's lower than that of regular policies it is not possible
to install overlapping trap policies. By differentiating trap from
regular policies via the priority's LSB this issue is avoided while
still maintaining the proper ordering of trap and regular policies.
Fixes#1243.
The Optimistic Duplicate Address Detection (DAD) seems to fail in some
cases (`dadfailed` in `ip addr`) rendering the virtual IP address unusable.
Fixes#2183.
This implements rule 6 of RFC 6724 using the default priority table,
so that e.g. global addresses are preferred over ULAs (which also have
global scope) when the destination is a global address.
Fixes#2138.
By default, the kernel incorrectly uses an 8 byte alignment, which is
mandatory for IPv6 but prohibited for IPv4. For many algorithms this
doesn't matter but that's not the case for HMAC_SHA2_256_128.
Since 2.6.39 the kernel can be explicitly configured to use a 4 byte
alignment.
Otherwise, we'd end up with an empty TS list, which is not valid.
Because end->tohost is set to !end->subnets in starter the removed branch was
never used.
The Python VICI library does not check if the socket is closed.
If the daemon closes the connection, _recvall() spins forever.
Closesstrongswan/strongswan#56.
Fix for "Permission denied (you must be root)" error when calling
iptc_init(), which opens a RAW socket to communicate with the kernel,
when built with "--with-capabilities=libcap".
Closes strongswan/strongswan#53.
Fixes#2157.
A wrong variable is used (route instead of best), so much that the
returned interface belongs to the last seen route instead of the best
choice route.
get_route() may therefore return mismatching interface and gateway.
Fixes: 66e9165bc6 ("kernel-netlink: Return outbound interface in get_nexthop()")
Signed-off-by: Christophe Gouault <christophe.gouault@6wind.com>