Until now the configuration available to user for HW offload were:
hw_offload = no
hw_offload = yes
With this commit users will be able to configure auto mode using:
hw_offload = auto
Signed-off-by: Adi Nissim <adin@mellanox.com>
Reviewed-by: Aviv Heller <avivh@mellanox.com>
Until now there were 2 hw_offload modes: no/yes
* hw_offload = no : Configure the SA without HW offload.
* hw_offload = yes : Configure the SA with HW offload.
In this case, if the device does not support
offloading, SA creation will fail.
This commit introduces a new mode: hw_offload = auto
----------------------------------------------------
If the device and kernel support HW offload, configure
the SA with HW offload, but do not fail SA creation otherwise.
Signed-off-by: Adi Nissim <adin@mellanox.com>
Reviewed-by: Aviv Heller <avivh@mellanox.com>
Reqids for the same traffic selectors are now stable so we don't have to
pass reqids of previously installed CHILD_SAs. Likewise, we don't need
to know the reqid of the newly installed trap policy as we now uninstall
by name.
With IKEv1 we transmit both public DH factors (used to derive the initial
IV) besides the shared secret. So these messages could get significantly
larger than 1024 bytes, depending on the DH group (modp2048 just about
fits into it). The new default of 2048 bytes should be fine up to modp4096
and for larger groups the buffer size may be increased (an error is
logged should this happen).
This can be useful if routing rules (instead of e.g. route metrics) are used
to switch from one to another interface (i.e. from one to another
routing table). Since we currently don't evaluate routing rules when
doing the route lookup this is only useful if the kernel-based route
lookup is used.
Resolvesstrongswan/strongswan#88.
If users want to associate secrets with any identity, let 'em. This is
also possible with vici and might help if e.g. the remote identity is
actually %any as that would match a PSK with local IP and %any better
than one with local and different remote IP.
Fixes#2497.
libstrongswan and kernel-netlink are the only two components which do
not adhere to the naming scheme used for all other tests. If the tests
are run by an external application this imposes problems due to clashing
names.
Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com>
If enabled, add the RADIUS Class attributes received in Access-Accept messages
to RADIUS accounting messages as suggested by RFC 2865 section 5.25.
Fixes#2451.
It seems that there is a race, at least in 10.13, that lets
if_indextoname() fail for the new TUN device. So we delay the call a bit,
which seems to "fix" the issue. It's strange anyway that the previous
delay was only applied when an iface entry was already found.
The value of DHCP_OPTEND is 255. When it is assigned this result in a
sign change as the positive int constant is cast to a signed char and -1
results. Clang 4.0 complains about this.
If an interface is renamed we already have an entry (based on the
ifindex) allocated but previously only set the usable state once
based on the original name.
Fixes#2403.
When querying SAs the keys will end up in this buffer (the allocated
messages that are returned are already wiped). The kernel also returns
XFRM_MSG_NEWSA as response to XFRM_MSG_ALLOCSPI but we can't distinguish
this here as we only see the response.
References #2388.
This prevented new listeners from receiving notifies if they joined
after another listener disconnected previously, and if they themselves
disconnected their old connection would prevent them again from getting
notifies.
Multiple CHILD_SAs sharing the same traffic selectors (e.g. during
make-before-break reauthentication) also have the same reqid assigned.
If all matching entries are removed we could end up without entry even
though an SA exists that still uses these traffic selectors.
Fixes#2373.
This is similar to the eap-aka-3gpp2 plugin. K (optionally concatenated
with OPc) may be configured as binary EAP secret in ipsec.secrets or
swanctl.conf.
Based on a patch by Thomas Strangert.
Fixes#2326.
Interestingly, this doesn't show up in the regression tests because the
compiler removes the first assignment (and thus the allocation) due to
-O2 that's included in our default CFLAGS.
By using the total retransmit timeout, modifications of timeout settings
automatically reflect on the value of xfrm_acq_expires. If set, the
value of xfrm_acq_expires configured by the user takes precedence over
the calculated value.
When establishing a traffic-triggered CHILD_SA involves the setup of an
IKE_SA more than one exchange is required. As a result the temporary
acquire state may have expired -- even if the acquire expiration
(xfrm_acq_expires) time is set properly (165 by default). The expire
message sent by the kernel is not processed in charon since no trap can
be found by the trap manager.
A possible solution could be to track allocated SPIs. But since this is
a corner case and the tracking introduces quite a bit of overhead, it
seems much more sensible to add a new state if the update of a state
fails with NOT_FOUND.
Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com>
Upcoming FreeBSD kernels will support updating the addresses of existing
SAs with new SADB_X_EXT_NEW_ADDRESS_SRC|DST extensions for the SADB_UPDATE
message.
During initialization of the plugins the thread pool is not yet
initialized so there is no watcher thread that could handle the queued
Netlink message and the main thread will wait indefinitely for a
response.
Fixes#2199.
On Linux, setting the source address is insufficient to force a packet to be
sent over a certain path. The kernel uses the best route to select the outgoing
interface, even if we set a source address of a lower priority interface. This
is not only true for interfaces attaching to the same subnet, but also for
unrelated interfaces; the kernel (at least on 4.7) sends out the packet on
whatever interface it sees fit, even if that network does not expect packets
from the source address we force to.
When a better interface becomes available, strongSwan sends its MOBIKE address
list update using the old source address. But the kernel sends that packet over
the new best interface. If that network drops packets having the unexpected
source address from the old path, the MOBIKE update fails and the SA finally
times out.
To enforce a specific interface for our packet, we explicitly set the interface
index from the interface where the source address is installed. According to
ip(7), this overrules the specified source address to the primary interface
address. As this could have side effects to installations using multiple
addresses on a single interface, we disable the option by default for now.
This also allows using IPv6 link-local addresses, which won't work if
the outbound interface is not set explicitly.
Line 66 yields "TypeError: can't concat bytes to str" using Python 3.4.
"requestdata" was introduced in 22f08609f1 but is not actually used.
Since the original "request" is not used anywhere else this can be changed
to be similar to the other UTF-8 encoding changes in that commit.
Fixes: 22f08609f1 ("vici: Explicitly set the Python encoding type").
Closesstrongswan/strongswan#66.
When constructing the result, all responses from Netlink were concatenated
iteratively, i.e. for each response, the previously acquired result was
copied to newly allocated memory and the current response appended to it.
This results in O(n^2) copy operations. Instead, we now check for the
total final length of the result and copy the individual responses to it
in one pass, i.e. in O(n) copy operations. In particular, this issue caused
very high CPU usage in memcpy() function as the result is copied over and
over. Common way how to hit the issue is when having 1000+ routes and 5+
connecting clients a second. In that case, the memcpy() function can
take 50%+ of one CPU thread on a decent CPU and the whole charon daemon
is stuck just reading routes and concatenating them together (connecting
clients are blocked in that particular case as this is done under mutex).
Closes strongswan/strongswan#65.
References #2055.
If a the original responder narrows the selectors of its peer in addrblock,
the peer gets a subset of that selectors. However, once the original responder
initiates rekeying of that CHILD_SA, it sends the full selectors to the peer,
and then narrows the received selectors locally for the installation, only.
This is insufficient, as the peer ends up with wider selectors, sending traffic
that the original responder will reject to the stricter IPsec policy. So
additionally narrow the selectors when rekeying CHILD_SAs before sending the
TS list to the peer.
This is different if `ike` and `child` are provided and uninstall()
fails as we call that without knowing whether a matching shunt exists.
But if `ike` is not provided we explicitly search for a matching shunt
and if found don't need to look for a trap policy.
Previously, the client had to propose no wider selectors than the certificate
permits, otherwise the complete CHILD_SA was rejected. However, with IKEv2
we can dynamically narrow the selectors to what the certificate allows. This
makes client and gateway configurations very simple by just proposing 0.0.0.0/0,
narrowed to selectors the client is permitted to route into the network.
This allows a gateway to enforce the addrblock policy on certificates that
actually have the extension only. For (legacy) certificates not having the
extension, traffic selectors are validated/narrowed by other means, most
likely by the configuration.
This way updates to the mediation config are respected and the order in
which configs are configured/loaded does not matter.
The SQL plugin currently maintains the strong relationship between
mediated and mediation connection (we could theoretically change that to a
string too).
The original name is returned in the new "name" attribute.
This fixes an issue with bindings that map VICI messages to
dictionaries. For instance, in roadwarrior scenarios where every
CHILD_SA has the same name only the information of the last CHILD_SA
would end up in the dictionary for that name.
PINs are stored in a "hidden" credential set, so that its shared
secrets are not exposed via VICI. Since they are not explicitly loaded as
shared secrets via VICI a client might consider them as removed secrets and
remove them.
After an interface disappeared we can't remove the policies correctly as
the name doesn't resolve to the previous index anymore.
And making the policies so specific might not provide that much benefit.
To handle the interfaces on the policies correctly would require some
changes to the child-cfg, kernel-interface etc. so they'd take interface
indices directly so we could target the policies correctly even if an
interface disappeared (or reappeared and got a new index).
For table dumps the kernel accepts RTA_PREFSRC to filter the routes, which is
what we do when doing userspace route calculations. For kernel-based route
lookups, however, the RTA_PREFSRC attribute is ignored and we must specify
RTA_SRC for policy based route lookups.
For gateways with many connections, installing routes is often disabled,
as we can use a static route configuration to achieve proper routing with
a single rule. If this is the case, there is no need to dump all routes and
do userspace route lookups, as there is no need to exclude routes we installed
ourself.
Doing kernel-based route lookups is not only faster with may routes, but also
can use the full power of Linux policy based routing; something we can hardly
rebuild in userspace when calculating routes.
When using vici over RPyC and its (awesome) splitbrain, encoding and decoding
strings fails in vici, most likely because of the Monkey-Patch magic splitbrain
uses.
When specifying the implicit UTF-8 as encoding scheme explicitly, Python uses
the correct method to encode/decode the string, making vici useable in
splitbrain contexts.
While trap and regular policies now often look the same (mainly because
reqids are kept constant) trap policies still need to have a lower priority
than regular policies to handle unroute/route correctly if e.g. IPComp
is used or the mode changes. But if we use a completely different
priority range that's lower than that of regular policies it is not possible
to install overlapping trap policies. By differentiating trap from
regular policies via the priority's LSB this issue is avoided while
still maintaining the proper ordering of trap and regular policies.
Fixes#1243.
The Optimistic Duplicate Address Detection (DAD) seems to fail in some
cases (`dadfailed` in `ip addr`) rendering the virtual IP address unusable.
Fixes#2183.
This implements rule 6 of RFC 6724 using the default priority table,
so that e.g. global addresses are preferred over ULAs (which also have
global scope) when the destination is a global address.
Fixes#2138.
By default, the kernel incorrectly uses an 8 byte alignment, which is
mandatory for IPv6 but prohibited for IPv4. For many algorithms this
doesn't matter but that's not the case for HMAC_SHA2_256_128.
Since 2.6.39 the kernel can be explicitly configured to use a 4 byte
alignment.
Otherwise, we'd end up with an empty TS list, which is not valid.
Because end->tohost is set to !end->subnets in starter the removed branch was
never used.
The Python VICI library does not check if the socket is closed.
If the daemon closes the connection, _recvall() spins forever.
Closesstrongswan/strongswan#56.
Fix for "Permission denied (you must be root)" error when calling
iptc_init(), which opens a RAW socket to communicate with the kernel,
when built with "--with-capabilities=libcap".
Closes strongswan/strongswan#53.
Fixes#2157.
A wrong variable is used (route instead of best), so much that the
returned interface belongs to the last seen route instead of the best
choice route.
get_route() may therefore return mismatching interface and gateway.
Fixes: 66e9165bc6 ("kernel-netlink: Return outbound interface in get_nexthop()")
Signed-off-by: Christophe Gouault <christophe.gouault@6wind.com>
The kernel will apply the mask to the mark on the packet and then
compare it to the configured mark. So to match only unmarked packets we
have to be able to set 0/0xffffffff.
If the number of flows over a gateway exceeds the flow cache size of the Linux
kernel, policy lookup gets very expensive. Policies covering more than a single
address don't get hash-indexed by default, which results in wasting most of
the cycles in xfrm_policy_lookup_bytype() and its xfrm_policy_match() use.
Starting with several hundred policies the overhead gets inacceptable.
Starting with Linux 3.18, Linux can hash the first n-bit of a policy subnet
to perform indexed lookup. With correctly chosen netbits, this can completely
eliminate the performance impact of policy lookups, freeing the resources
for ESP crypto.
WARNING: Due to a bug in kernels 3.19 through 4.7, the kernel crashes with a
NULL pointer dereference if a socket policy is installed while hash thresholds
are changed. And because the hashtable rebuild triggered by the threshold
change that causes this is scheduled it might also happen if the socket
policies are seemingly installed after setting the thresholds.
The fix for this bug - 6916fb3b10b3 ("xfrm: Ignore socket policies when
rebuilding hash tables") - is included since 4.8 (and might get backported).
As a workaround `charon.plugins.kernel-netlink.port_bypass` may be enabled
to replace the socket policies that allow IKE traffic with port specific
bypass policies.
When fresh CRLs are released with a high update frequency (e.g.
every 24 hours) or OCSP is used then the certificate cache gets
quickly filled with stale CRLs or OCSP responses. The new VICI
flush-certs command allows to flush e.g. cached CRLs or OCSP
responses only. Without the type argument all kind of certificates
(e.g. also received end entity and intermediate CA certificates)
are purged.
fgetc() returns an int and EOF is usually -1 so when this gets casted to
a char the result depends on whether `char` means `signed char` or
`unsigned char` (the C standard does not specify it). If it is unsigned
then its value is 0xff so the comparison with EOF will fail as that is an
implicit signed int.
This fixes DNS server installation if make-before-break reauthentication
is used as there the new SA and DNS server is installed before it then
is removed again when the old IKE_SA is torn down.
If running resolvconf fails handle() fails release() is not called, which
might leave an interface file on the system (or depending on which script
called by resolvconf actually failed even the installed DNS server).
We don't need them for drop policies and they might even mess with other
routes we install. Routes for policies with protocol/ports in the
selector will always be too broad and might conflict with other routes
we install.
Using the source address to determine the interface is not correct for
net-to-net shunts between two interfaces on which the host has IP addresses
for each subnet.
Other threads are free to add/update/delete other policies.
This tries to prevent race conditions caused by releasing the mutex while
sending messages to the kernel. For instance, if break-before-make
reauthentication is used and one thread on the responder is delayed in
deleting the policies that another thread is concurrently adding for the
new SA. This could have resulted in no policies being installed
eventually.
Fixes#1400.
If a pseudonym changed a new entry was added to the table storing
permanent identity objects (that are used as keys in the other table).
However, the old mapping was not removed while replacing the mapping in
the pseudonym table caused the old pseudonym to get destroyed. This
eventually caused crashes when a new pseudonym had the same hash value as
such a defunct entry and keys had to be compared.
Fixesstrongswan/strongswan#46.
The versioning scheme used by Python (PEP 440) supports the rcN suffix
but development releases have to be named devN, not drN, which are
not supported and considered legacy versions.
After adding the read callback the state is WATCHER_QUEUED and it is
switched to WATCHER_RUNNING only later by an asynchronous job. This means
that a thread that sent a Netlink message shortly after registration
might see the state as WATCHER_QUEUED. If it then tries to read the
response and the watcher thread is quicker to actually read the message
from the socket, it could block on recv() while still holding the lock.
And the asynchronous job that actually read the message and tries to queue
it will block while trying to acquire the lock, so we'd end up in a deadlock.
This is probably mostly a problem in the unit tests.
Metrics are basically defined to order routes with equal prefix, so ordering
routes by metric first makes not much sense as that could prefer totally
unspecific routes over very specific ones.
For instance, the previous code did break installation of routes for
passthrough policies with two routes like these in the main routing table:
default via 192.168.2.1 dev eth0 proto static
192.168.2.0/24 dev eth0 proto kernel scope link src 192.168.2.10 metric 1
Because the default route has no metric set (0) it was used, instead of the
more specific other one, to determine src and next hop when installing a route
for a passthrough policy for 192.168.2.0/24. Therefore, the installed route
in table 220 did then incorrectly redirect all local traffic to "next hop"
192.168.2.1.
The same issue occurred when determining the source address while
installing trap policies.
Fixes 6b57790270 ("kernel-netlink: Respect kernel routing priorities for IKE routes").
Fixes#1416.
This allows using manual priorities for traps, which have a lower
base priority than the resulting IPsec policies. This could otherwise
be problematic if, for example, swanctl --install/uninstall is used while
an SA is established combined with e.g. IPComp, where the trap policy does
not look the same as the IPsec policy (which is now otherwise often the case
as the reqids stay the same).
It also orders policies by selector size if manual priorities are configured
and narrowing occurs.
The kernel policy now considers src and dst port masks as well as
restictions to a given network interface. The base priority is
100'000 for passthrough shunts, 200'000 for IPsec policies,
300'000 for IPsec policy traps and 400'000 for fallback drop shunts.
The values 1..30'000 can be used for manually set priorities.
This allows two CHILD_SAs with reversed subnets to install two FWD
policies each. Since the outbound policy won't have a reqid set we will
end up with the two inbound FWD policies installed in the kernel, with
the correct templates to allow decrypted traffic.
We can't set a template on the outbound FWD policy (or we'd have to make
it optional). Because if the traffic does not come from another (matching)
IPsec tunnel it would get dropped due to the template mismatch.
This allows us to install more than one FWD policy. We already do this
in the kernel-pfkey plugin (there the original reason was that not all
kernels support FWD policies).
Running or undoing start actions might require enumerating IKE_SAs,
which in turn might have to enumerate peer configs concurrently, which
requires acquiring a read lock. So if we keep holding the write lock while
enumerating the SAs we provoke a deadlock.
By preventing other threads from acquiring the write lock while handling
actions, and thus preventing the modification of the configs, we largely
maintain the current synchronous behavior. This way we also don't need to
acquire additional refs for config objects as they won't get modified/removed.
Fixes#1185.
This allows e.g. modified versions of xl2tpd to set the mark in
situations where two clients are using the same source port behind the
same NAT, which CONNMARK can't restore properly as only one conntrack entry
will exist with the mark set to that of the client that sent the last packet.
Fixes#1230.
By settings a matchmask that covers the complete rule we ensure that the
correct rule is deleted (i.e. matches and targets with potentially different
marks are also compared).
Since data after the passed pointer is actually dereferenced when
comparing we definitely have to pass an array that is at least as long as
the ipt_entry.
Fixes#1229.
This also restores the charon.signature_authentication_constraints
functionality, that is, if no explicit IKE signature schemes are
configured we apply all regular signature constraints as IKE constraints.