%include "default.mgp"
|
|
%default 1 bgrad
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
%page
|
|
%nodefault
|
|
%back "blue"
|
|
|
|
%center
|
|
%size 7
|
|
|
|
|
|
The future of Linux packet filtering
|
|
targeted for kernel 2.6 and beyond
|
|
|
|
|
|
%center
|
|
%size 4
|
|
by
|
|
|
|
Harald Welte <laforge@gnumonks.org>
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
%page
|
|
Future of Linux packet filtering
|
|
Contents
|
|
|
|
|
|
Problems with current 2.4.x netfilter/iptables
|
|
Solution to code replication
|
|
Solution for dynamic rulesets
|
|
Solution for an API for GUIs and other management programs
|
|
|
|
HA for stateful firewalling
|
|
What's special about firewalling HA
|
|
Poor man's failover
|
|
Real state replication
|
|
|
|
Other current work
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
%page
|
|
Future of Linux packet filtering
|
|
Problems with 2.4.x netfilter/iptables
|
|
|
|
code replication between iptables/ip6tables/arptables
|
|
iptables was never meant for other protocols, but people did copy+paste 'ports'
|
|
replication of
|
|
core kernel code
|
|
layer 3 independent matches (mac, interface, ...)
|
|
userspace library (libiptc)
|
|
userspace tool (iptables)
|
|
userspace plugins (libipt_xxx.so)
|
|
doesn't suit the needs for dynamically changing rulesets
|
|
dynamic rulesets are becoming more common (service selection, IDS)
|
|
a whole table is created in userspace and sent as blob to kernel
|
|
for every ruleset change the table needs to be copied to userspace and back
|
|
inside kernel consistency checks on whole table, loop detection
|
|
too extensible for writing any forward-compatible GUI
|
|
new extensions showing up all the time
|
|
a frontend would need to know about the options and use of a new extension
|
|
thus frontends are always incomplete and out-of-date
|
|
no high-level API other than piping to iptables-restore
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
%page
|
|
Future of Linux packet filtering
|
|
Reducing code replication
|
|
|
|
code replication is a real problem: unclean, bugfixes missed
|
|
we need a layer 3 independent layer for
|
|
submitting rules to the kernel
|
|
traversing packet-rulesets supporting match/target modules
|
|
registering matches/targets
|
|
layer 3 specific (like matching ipv4 address)
|
|
layer 3 independent (like matching MAC address)
|
|
|
|
solution
|
|
pkt_tables inside kernel
|
|
pkt_tables_ipv4 registers layer 3 handler with pkt_tables
|
|
pkt_tables_ipv6 registers layer 3 handler with pkt_tables
|
|
everybody registering a pkt_table (like iptable_filter) needs to specify the l3 protocol
|
|
libraries in userspace (see later)
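
The registration scheme above can be modelled in a few lines of userspace Python. This is only a sketch of the idea, not kernel code: PktTables, register_l3, register_table, traverse and the two match helpers are hypothetical names invented for illustration. It shows a layer 3 independent core that refuses tables for unregistered protocols and lets one rule mix a layer 3 specific match with a layer 3 independent one.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical userspace model; none of these names are kernel symbols.
Packet = Dict[str, str]
Match = Callable[[Packet], bool]


@dataclass
class Rule:
    matches: List[Match]
    verdict: str


@dataclass
class Table:
    name: str
    l3proto: str  # every table has to name its layer 3 protocol
    rules: List[Rule] = field(default_factory=list)
    policy: str = "ACCEPT"


class PktTables:
    """Layer 3 independent core: owns handlers, tables and rule traversal."""

    def __init__(self) -> None:
        self.l3_handlers: Set[str] = set()
        self.tables: Dict[Tuple[str, str], Table] = {}

    def register_l3(self, l3proto: str) -> None:
        self.l3_handlers.add(l3proto)

    def register_table(self, table: Table) -> None:
        if table.l3proto not in self.l3_handlers:
            raise ValueError("no layer 3 handler for " + table.l3proto)
        self.tables[(table.l3proto, table.name)] = table

    def traverse(self, l3proto: str, name: str, pkt: Packet) -> str:
        table = self.tables[(l3proto, name)]
        for rule in table.rules:
            if all(match(pkt) for match in rule.matches):
                return rule.verdict
        return table.policy


def match_mac(mac: str) -> Match:
    """Layer 3 independent match: written once, usable from any table."""
    return lambda pkt: pkt.get("mac") == mac


def match_ipv4_src(addr: str) -> Match:
    """Layer 3 specific match: only meaningful for ipv4 tables."""
    return lambda pkt: pkt.get("src") == addr


core = PktTables()
core.register_l3("ipv4")
core.register_l3("ipv6")
ipv4_filter = Table(name="filter", l3proto="ipv4")
core.register_table(ipv4_filter)
ipv4_filter.rules.append(
    Rule([match_ipv4_src("10.0.0.1"), match_mac("00:11:22:33:44:55")], "DROP")
)
```

In this model the traversal loop and match_mac exist exactly once, which is the point of moving them out of the per-protocol copies.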
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
%page
|
|
Future of Linux packet filtering
|
|
Supporting dynamic rulesets
|
|
|
|
atomic table-replacement turned out to be a bad idea
|
|
need new interface for sending individual rules to kernel
|
|
policy routing has the same problem and good solution: rtnetlink
|
|
solution: nfnetlink
|
|
multicast-netlink based packet-oriented socket between kernel and userspace
|
|
has extra benefit that other userspace processes get notified of rule changes [just like routing daemons]
|
|
nfnetlink is a low-layer below all kernel/userspace communication
|
|
pkttnetlink [aka iptnetlink]
|
|
ctnetlink
|
|
ulog
|
|
ip_queue
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
%page
|
|
Future of Linux packet filtering
|
|
Communication with other programs
|
|
|
|
whole set of libraries
|
|
libnfnetlink for low-layer communication
|
|
libpkttnetlink for rule modifications
|
|
will handle all plugins [which are currently part of iptables]
|
|
query functions about available matches/targets
|
|
query functions about parameters
|
|
query functions for help messages about specific match/parameter of a match
|
|
generic structure from which rules can be built
|
|
conversion functions to parse generic structure into in-kernel structure
|
|
conversion functions to parse kernel structure into generic structure
|
|
functions to convert generic structure into plain text
|
|
libipq will stay API-compatible to current version
|
|
libipulog will stay API-compatible to current version
|
|
libiptc will go away [compatibility layer extremely difficult]
|
|
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
%page
|
|
HA for netfilter/iptables
|
|
Introduction
|
|
|
|
What is special about firewall failover?
|
|
|
|
Nothing, in case of the stateless packet filter
|
|
Common IP takeover solutions can be used
|
|
VRRP
|
|
Heartbeat
|
|
|
|
Distribution of packet filtering ruleset no problem
|
|
can be done manually
|
|
or implemented with simple userspace process
|
|
|
|
Problems arise with stateful packet filters
|
|
Connection state only on active node
|
|
NAT mappings only on active node
|
|
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
%page
|
|
HA for netfilter/iptables
|
|
Connection Tracking Subsystem
|
|
|
|
Connection tracking...
|
|
implemented separately from NAT
|
|
enables stateful filtering
|
|
implementation
|
|
hooks into NF_IP_PRE_ROUTING to track packets
|
|
hooks into NF_IP_POST_ROUTING and NF_IP_LOCAL_IN to see if packet passed filtering rules
|
|
protocol modules (currently TCP/UDP/ICMP)
|
|
application helpers (currently FTP, IRC, H.323, talk, SNMP)
|
|
divides packets into the following four categories
|
|
NEW - would establish new connection
|
|
ESTABLISHED - part of already established connection
|
|
RELATED - is related to established connection
|
|
INVALID - (multicast, errors...)
|
|
does _NOT_ filter packets itself
|
|
can be utilized by iptables using the 'state' match
|
|
is used by NAT Subsystem
|
|
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
%page
|
|
HA for netfilter/iptables
|
|
Connection Tracking Subsystem
|
|
|
|
Common structures
|
|
struct ip_conntrack_tuple, representing unidirectional flow
|
|
layer 3 src + dst
|
|
layer 4 protocol
|
|
layer 4 src + dst
|
|
|
|
|
|
connections represented as struct ip_conntrack
|
|
original tuple
|
|
reply tuple
|
|
timeout
|
|
l4 state private data
|
|
app helper
|
|
app helper private data
|
|
expected connections
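
The two structures above map naturally onto a pair of record types. The following is a userspace sketch only: the field names are chosen for readability and are assumptions, not the members of the real struct ip_conntrack_tuple or struct ip_conntrack. The tuple is immutable and hashable so that it can serve as a hash table key, and inverted() derives the reply direction.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ConntrackTuple:
    """Unidirectional flow: layer 3 src/dst, layer 4 protocol, layer 4 src/dst."""
    l3_src: str
    l3_dst: str
    l4_proto: str
    l4_src: int
    l4_dst: int

    def inverted(self) -> "ConntrackTuple":
        # The reply direction swaps source and destination on both layers.
        return ConntrackTuple(
            self.l3_dst, self.l3_src, self.l4_proto, self.l4_dst, self.l4_src
        )


@dataclass
class Conntrack:
    """One tracked connection (illustrative field names, not the kernel's)."""
    original: ConntrackTuple
    reply: ConntrackTuple
    timeout: float
    l4_state: Dict[str, object] = field(default_factory=dict)
    helper: Optional[str] = None
    helper_data: Dict[str, object] = field(default_factory=dict)
    expected: List[ConntrackTuple] = field(default_factory=list)


orig = ConntrackTuple("10.0.0.1", "192.0.2.7", "tcp", 1024, 21)
ct = Conntrack(original=orig, reply=orig.inverted(), timeout=30.0, helper="ftp")
# An application helper would register the data connection it expects to see.
ct.expected.append(ConntrackTuple("192.0.2.7", "10.0.0.1", "tcp", 20, 1025))
```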
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
%page
|
|
HA for netfilter/iptables
|
|
Connection Tracking Subsystem
|
|
|
|
Flow of events for new packet
|
|
packet enters NF_IP_PRE_ROUTING
|
|
tuple is derived from packet
|
|
lookup conntrack hash table with hash(tuple) -> fails
|
|
new ip_conntrack is allocated
|
|
fill in original and reply == inverted(original) tuple
|
|
initialize timer
|
|
assign app helper if applicable
|
|
see if we've been expected -> fails
|
|
call layer 4 helper 'new' function
|
|
|
|
...
|
|
|
|
packet enters NF_IP_POST_ROUTING
|
|
do hashtable lookup for packet -> fails
|
|
place struct ip_conntrack in hashtable
|
|
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
%page
|
|
HA for netfilter/iptables
|
|
Connection Tracking Subsystem
|
|
|
|
Flow of events for packet part of existing connection
|
|
packet enters NF_IP_PRE_ROUTING
|
|
tuple is derived from packet
|
|
lookup conntrack hash table with hash(tuple)
|
|
associate conntrack entry with skb->nfct
|
|
call l4 protocol helper 'packet' function
|
|
do l4 state tracking
|
|
update timeouts as needed [e.g. TCP TIME_WAIT, ...]
|
|
|
|
...
|
|
|
|
packet enters NF_IP_POST_ROUTING
|
|
do hashtable lookup for packet -> succeeds
|
|
do nothing else
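
The two flows (new packet, packet of an existing connection) can be traced with a small userspace model. This is a sketch under simplifying assumptions, not the kernel implementation: the class and method names are hypothetical, packets are plain dicts, and the state is reduced to NEW on a failed lookup and ESTABLISHED otherwise. It does reproduce the key property that an entry only reaches the hash table once the packet arrives at the post-routing step, so a packet dropped by the filter in between never creates state.

```python
from collections import namedtuple

# Hypothetical model of the flows above; names are not kernel symbols.
FlowTuple = namedtuple("FlowTuple", "src dst proto sport dport")


def invert(t):
    """Reply direction of a tuple: swap addresses and ports."""
    return FlowTuple(t.dst, t.src, t.proto, t.dport, t.sport)


class ConntrackTable:
    def __init__(self, timeout=30.0):
        self.entries = {}  # tuple -> conntrack dict, keyed in both directions
        self.timeout = timeout

    def pre_routing(self, pkt, now):
        t = FlowTuple(*pkt["tuple"])
        ct = self.entries.get(t)
        if ct is None:
            # Lookup failed: allocate a new, not yet inserted conntrack.
            ct = {"original": t, "reply": invert(t), "expires": now + self.timeout}
            state = "NEW"
        else:
            # Lookup succeeded: refresh the timeout of the existing entry.
            ct["expires"] = now + self.timeout
            state = "ESTABLISHED"
        pkt["nfct"] = ct  # plays the role of skb->nfct
        return state

    def post_routing(self, pkt):
        ct = pkt["nfct"]
        if ct["original"] not in self.entries:
            # The packet survived filtering: place the entry in the table.
            self.entries[ct["original"]] = ct
            self.entries[ct["reply"]] = ct


table = ConntrackTable()
first = {"tuple": ("10.0.0.1", "10.0.0.2", "tcp", 1024, 80)}
first_state = table.pre_routing(first, now=0.0)
entries_before_confirm = len(table.entries)
table.post_routing(first)

answer = {"tuple": ("10.0.0.2", "10.0.0.1", "tcp", 80, 1024)}
answer_state = table.pre_routing(answer, now=1.0)
table.post_routing(answer)
```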
|
|
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
%page
|
|
HA for netfilter/iptables
|
|
Poor man's failover
|
|
|
|
principle
|
|
every node does its own tracking, no state replication
|
|
two possible implementations
|
|
connect every node to shared media (i.e. real ethernet)
|
|
forwarding only turned on on active node
|
|
slave nodes use promiscuous mode to sniff packets
|
|
copy all traffic to slave nodes
|
|
active master needs to copy all traffic to other nodes
|
|
disadvantage: high load, sync traffic == payload traffic
|
|
advantages
|
|
very easy implementation
|
|
only addition of sniffing mode to conntrack needed
|
|
existing means of address takeover can be used
|
|
same load on active master and slave nodes
|
|
disadvantages
|
|
can only be used with real shared media (no switches, ...)
|
|
can not be used with NAT
|
|
remaining problem
|
|
no initial state sync after reboot of slave node!
|
|
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
%page
|
|
HA for netfilter/iptables
|
|
Real state replication
|
|
|
|
Parts needed
|
|
state replication protocol
|
|
multicast based
|
|
sequence numbers for detection of packet loss
|
|
NACK-based retransmission
|
|
no security, since private ethernet segment to be used
|
|
event interface on active node
|
|
calling out to callback function at all state changes
|
|
exported interface to manipulate conntrack hash table
|
|
kernel thread for sending conntrack state protocol messages
|
|
registers with event interface
|
|
creates and accumulates state replication packets
|
|
sends them via in-kernel sockets api
|
|
kernel thread for receiving conntrack state replication messages
|
|
receives state replication packets via in-kernel sockets
|
|
uses conntrack hashtable manipulation interface
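
How sequence numbers and NACK-based retransmission fit together can be shown with a toy sender and receiver. This is an illustrative sketch only: the slides do not define a wire format, so the classes, the (seq, payload) message shape and the unbounded retransmit history are all assumptions. The receiver applies messages strictly in order and returns the sequence numbers it is still missing.

```python
class ReplicationSender:
    """Numbers outgoing messages and keeps them for retransmission."""

    def __init__(self):
        self.seq = 0
        self.history = {}  # seq -> payload (unbounded here, for simplicity)

    def send(self, payload):
        msg = (self.seq, payload)
        self.history[self.seq] = payload
        self.seq += 1
        return msg

    def retransmit(self, seq):
        return (seq, self.history[seq])


class ReplicationReceiver:
    """Applies messages in order and reports gaps as NACKs."""

    def __init__(self):
        self.next_seq = 0
        self.pending = {}  # messages that arrived ahead of a gap
        self.applied = []

    def receive(self, seq, payload):
        """Process one message; return the sequence numbers to NACK."""
        if seq < self.next_seq:
            return []  # duplicate of something already applied
        self.pending[seq] = payload
        while self.next_seq in self.pending:
            self.applied.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        highest = max(self.pending, default=self.next_seq)
        return [s for s in range(self.next_seq, highest) if s not in self.pending]


sender = ReplicationSender()
receiver = ReplicationReceiver()

m0 = sender.send("new conn A")
m1 = sender.send("A established")  # this one gets lost on the wire
m2 = sender.send("new conn B")

nack_after_m0 = receiver.receive(*m0)
nack_after_m2 = receiver.receive(*m2)  # gap detected, seq 1 is missing
for missing in nack_after_m2:
    receiver.receive(*sender.retransmit(missing))
```

A NACK-only scheme keeps the common case cheap: as long as nothing is lost, the slave nodes never have to send anything back.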
|
|
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
%page
|
|
HA for netfilter/iptables
|
|
Real state replication
|
|
|
|
Flow of events in chronological order:
|
|
on active node, inside the network RX softirq
|
|
connection tracking code is analyzing a forwarded packet
|
|
connection tracking gathers some new state information
|
|
connection tracking updates local connection tracking database
|
|
connection tracking sends event message to event API
|
|
on active node, inside the conntrack-sync kernel thread
|
|
conntrack sync daemon receives event through event API
|
|
conntrack sync daemon aggregates multiple event messages into a state replication protocol message, removing possible redundancy
|
|
conntrack sync daemon generates state replication protocol message
|
|
conntrack sync daemon sends state replication protocol message
|
|
on slave node(s), inside network RX softirq
|
|
connection tracking code ignores packets coming from the interface attached to the private conntrack sync network
|
|
state replication protocol message is appended to socket receive queue of conntrack-sync kernel thread
|
|
on slave node(s), inside conntrack-sync kernel thread
|
|
conntrack sync daemon receives state replication message
|
|
conntrack sync daemon creates/updates conntrack entry
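
The aggregation step on the active node can be illustrated as follows. This sketch assumes one particular reading of "removing possible redundancy", namely that a later event for a connection supersedes earlier ones still waiting in the same batch; the function name and the (connection id, state) event shape are hypothetical.

```python
def aggregate(events):
    """Collapse a batch of (conn_id, state) events to the latest state per connection.

    Connections appear in the order of their most recent event, so the
    slave applies them in the same relative order as the active node.
    """
    latest = {}
    for conn_id, state in events:
        latest.pop(conn_id, None)  # drop the superseded event, if any
        latest[conn_id] = state    # re-insert at the end of the ordering
    return list(latest.items())


batch = [
    ("conn-a", "SYN_SENT"),
    ("conn-b", "SYN_SENT"),
    ("conn-a", "ESTABLISHED"),
]
message = aggregate(batch)
```

Here three events shrink to two entries in the replication message, because the intermediate state of conn-a no longer matters to the slave.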
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
%page
|
|
HA for netfilter/iptables
|
|
Necessary changes to kernel
|
|
|
|
Necessary changes to current conntrack core
|
|
|
|
event generation (callback functions) for all state changes
|
|
|
|
conntrack hashtable manipulation API
|
|
is needed (and already implemented) for 'ctnetlink' API
|
|
|
|
conntrack exemptions
|
|
needed to _not_ track conntrack state replication packets
|
|
is needed for other cases as well
|
|
currently being developed by Jozsef Kadlecsik
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
%page
|
|
HA for netfilter/iptables
|
|
Other current work
|
|
|
|
conntrack hash function optimization
|
|
current hash function not good for even hash bucket count
|
|
other hash functions in development
|
|
hash function evaluation tool [cttest] available
|
|
introduce per-system randomness to prevent hash attack
|
|
conntrack code optimization (locking/timers/...)
|
|
conntrack exemptions
|
|
not usable when NAT is active
|
|
SLOLG (session log)
|
|
maybe netflow compatible logs?
|
|
getting our work submitted into the mainstream kernel
|
|
turns out to be more difficult than expected
|
|
newnat has finally made it into 2.4.19
|
|
discussions about multiple targets/actions per rule
|
|
technical implementation easy
|
|
however, not everybody convinced that it fits into the concept
|
|
using tc for firewalling
|
|
Jamal Hadi Salim uses iptables targets from within TC
|
|
leads to discussion of generic classification engine API in kernel
|
|
netfilter for MPLS
|
|
implementation of mpls-ping-draft as netfilter module
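
The "per-system randomness" item under conntrack hash function optimization can be illustrated with a keyed hash. This is a userspace sketch, not one of the hash functions under development: it uses BLAKE2b from the Python standard library purely as a convenient keyed hash, and bucket_for and BOOT_SECRET are hypothetical names. The idea shown is that the bucket depends on a secret chosen at boot, so an attacker cannot precompute tuples that all collide in one bucket.

```python
import hashlib
import os

# Chosen once per boot; different on every system and every reboot.
BOOT_SECRET = os.urandom(16)


def bucket_for(tuple_bytes, nbuckets, secret=BOOT_SECRET):
    """Map a serialized connection tuple to a hash bucket using a keyed hash."""
    digest = hashlib.blake2b(tuple_bytes, digest_size=8, key=secret).digest()
    return int.from_bytes(digest, "big") % nbuckets


flows = [("10.0.0.%d:1024->192.0.2.1:80" % i).encode() for i in range(64)]

# The same flows land in unrelated buckets on two differently keyed systems.
system_a = [bucket_for(f, 1024, secret=b"a" * 16) for f in flows]
system_b = [bucket_for(f, 1024, secret=b"b" * 16) for f in flows]
```

Because the reduction is a plain modulo over a well-mixed digest, this sketch also does not care whether the bucket count is even or odd.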
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
%page
|
|
Future of Linux packet filtering
|
|
Thanks
|
|
The slides and an accompanying paper for this presentation are available at http://www.gnumonks.org/
|
|
The netfilter homepage http://www.netfilter.org/
|
|
Thanks to
|
|
the BBS people, Z-Netz, FIDO, ...
|
|
for heavily increasing my computer usage in 1992
|
|
KNF (http://www.franken.de/)
|
|
for bringing me in touch with the internet as early as 1994
|
|
for providing a playground for technical people
|
|
for telling me about the existence of Linux!
|
|
Alan Cox, Alexey Kuznetsov, David Miller, Andi Kleen
|
|
for implementing (one of?) the world's best TCP/IP stacks
|
|
Paul 'Rusty' Russell
|
|
for starting the netfilter/iptables project
|
|
for trusting me to maintain it today
|
|
Astaro AG
|
|
for sponsoring parts of my netfilter work
|
|
for sponsoring my flight ticket to this conference
|
|
|