laforge-slides/2004/netfilter-failover-lk2004/netfilter-failover-lk2004.mgp

%include "default.mgp"
%default 1 bgrad
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
%nodefault
%back "blue"
%center
%size 7
How to replicate the fire
HA for netfilter-based firewalls
%center
%size 4
by
Harald Welte <laforge@netfilter.org>
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Contents
Introduction
Connection Tracking Subsystem
Packet selection based on IP Tables
The Connection Tracking Subsystem
The NAT Subsystem
Poor man's failover
Real state replication
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Introduction
What is special about firewall failover?
Nothing, in the case of a stateless packet filter
Common IP takeover solutions can be used
VRRP
Heartbeat
Distribution of the packet filtering ruleset is no problem
can be done manually
or implemented with simple userspace process
Problems arise with stateful packet filters
Connection state only on active node
NAT mappings only on active node
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Connection Tracking Subsystem
Connection tracking...
enables stateful filtering
implementation
hooks into netfilter to track packets
protocol modules (currently TCP/UDP/ICMP)
application helpers (currently FTP, IRC, H.323, talk, SNMP)
divides packets into the following four categories
NEW - would establish new connection
ESTABLISHED - part of already established connection
RELATED - is related to established connection
INVALID - (multicast, errors...)
does _NOT_ filter packets itself
can be utilized by iptables using the 'state' match
is used by NAT Subsystem
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Connection Tracking Subsystem
Common structures
struct ip_conntrack_tuple, representing unidirectional flow
layer 3 src + dst
layer 4 protocol
layer 4 src + dst
connections represented as struct ip_conntrack
original tuple
reply tuple
timeout
l4 state private data
app helper
app helper private data
expected connections
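A minimal sketch of these two structures, written in Python rather than the kernel's C purely for illustration. The class names, the field names (src_ip, dst_ip, proto, src_port, dst_port) and the invert() helper are assumptions made for this sketch, not the actual members of struct ip_conntrack_tuple or struct ip_conntrack.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Tuple:
    """Unidirectional flow: layer 3 src + dst, layer 4 protocol, layer 4 src + dst."""
    src_ip: str
    dst_ip: str
    proto: str
    src_port: int
    dst_port: int

    def invert(self) -> "Tuple":
        # The reply direction swaps source and destination at both layers.
        return Tuple(self.dst_ip, self.src_ip, self.proto,
                     self.dst_port, self.src_port)


@dataclass
class Conntrack:
    """One tracked connection (illustrative subset of the fields listed above)."""
    original: Tuple
    reply: Tuple
    timeout: float                      # seconds until the entry expires
    l4_state: Optional[str] = None      # e.g. a TCP state name
    helper: Optional[str] = None        # name of the app helper, if any
    expected: list = field(default_factory=list)  # expected connections


orig = Tuple("10.0.0.1", "192.0.2.7", "tcp", 40000, 80)
ct = Conntrack(original=orig, reply=orig.invert(), timeout=120.0)
print(ct.reply)
```

Because the tuple is immutable and hashable, it can serve directly as a hash table key, which is what the lookup on the following slides relies on.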
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Connection Tracking Subsystem
Flow of events for new packet
packet enters NF_IP_PRE_ROUTING
tuple is derived from packet
lookup conntrack hash table with hash(tuple) -> fails
new ip_conntrack is allocated
fill in original and reply == inverted(original) tuple
initialize timer
assign app helper if applicable
see if we've been expected -> fails
call layer 4 helper 'new' function
...
packet enters NF_IP_POST_ROUTING
do hashtable lookup for packet -> fails
place struct ip_conntrack in hashtable
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Connection Tracking Subsystem
Flow of events for packet part of existing connection
packet enters NF_IP_PRE_ROUTING
tuple is derived from packet
lookup conntrack hash table with hash(tuple)
associate conntrack entry with skb->nfct
call l4 protocol helper 'packet' function
do l4 state tracking
update timeouts as needed [e.g. TCP TIME_WAIT, ...]
...
packet enters NF_IP_POST_ROUTING
do hashtable lookup for packet -> succeeds
do nothing else
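The two flows above (new packet, then packet of an existing connection) can be modelled roughly as follows. This is a toy sketch in Python, not kernel code: the ConntrackTable class, the dict standing in for the hash table, and the simplified (src, dst) tuples are all assumptions for illustration. Entering the entry under both the original and the reply tuple is likewise an assumption of this model, made so that reply packets find the same entry.

```python
class ConntrackTable:
    """Toy model: a dict keyed by tuple stands in for the conntrack hash table."""

    def __init__(self):
        self.table = {}

    def pre_routing(self, tup):
        """Return (entry, is_new) for a packet entering NF_IP_PRE_ROUTING."""
        entry = self.table.get(tup)
        if entry is not None:
            # Existing connection: the packet is associated with the entry.
            entry["packets"] += 1
            return entry, False
        # Lookup failed: allocate a new, not yet confirmed entry.
        src, dst = tup
        entry = {"original": tup, "reply": (dst, src),
                 "packets": 1, "confirmed": False}
        return entry, True

    def post_routing(self, entry):
        """Place the entry in the table once the packet reaches NF_IP_POST_ROUTING."""
        if not entry["confirmed"]:
            entry["confirmed"] = True
            # Both directions must find the same entry afterwards.
            self.table[entry["original"]] = entry
            self.table[entry["reply"]] = entry


ct = ConntrackTable()
first, is_new = ct.pre_routing(("10.0.0.1:40000", "192.0.2.7:80"))
ct.post_routing(first)
reply, reply_is_new = ct.pre_routing(("192.0.2.7:80", "10.0.0.1:40000"))
print(is_new, reply_is_new, reply["packets"])
```

Deferring insertion until NF_IP_POST_ROUTING means a packet that is dropped by the filter on its way through never leaves an entry behind.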
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Network Address Translation
Overview
Previous Linux kernels implemented only one special case of NAT: masquerading
Linux 2.4.x can do any kind of NAT.
NAT subsystem implemented on top of netfilter, iptables and conntrack
NAT subsystem registers with all five netfilter hooks
'nat' Table registers chains PREROUTING, POSTROUTING and OUTPUT
Following targets available within 'nat' Table
SNAT changes the packet's source while passing NF_IP_POST_ROUTING
DNAT changes the packet's destination while passing NF_IP_PRE_ROUTING
MASQUERADE is a special case of SNAT
REDIRECT is a special case of DNAT
NAT bindings determined only for NEW packet and saved in ip_conntrack
Further packets within the connection are NATed according to the saved NAT bindings
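The last two points (binding chosen once for the NEW packet, then reused) can be illustrated with a toy source NAT. This is a sketch under stated assumptions: the SnatBox class, its sequential port allocation, and the example addresses are hypothetical and do not mirror the kernel's NAT code.

```python
class SnatBox:
    """Toy SNAT: the binding is chosen for the first packet and stored per connection."""

    def __init__(self, public_ip, first_port=1024):
        self.public_ip = public_ip
        self.next_port = first_port
        self.bindings = {}   # (src_ip, src_port) -> (public_ip, public_port)

    def translate(self, src_ip, src_port):
        key = (src_ip, src_port)
        if key not in self.bindings:
            # NEW packet: determine the binding once and save it.
            self.bindings[key] = (self.public_ip, self.next_port)
            self.next_port += 1
        # Later packets are rewritten according to the saved binding.
        return self.bindings[key]


nat = SnatBox("203.0.113.1")
a = nat.translate("10.0.0.1", 40000)
b = nat.translate("10.0.0.1", 40000)
c = nat.translate("10.0.0.2", 40000)
print(a, b, c)
```

Since the binding exists only in the memory of the node that saw the first packet, a second node cannot reconstruct it on its own; this is the state that failover has to carry across.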
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Poor man's failover
Poor man's failover
principle
let every node do its own tracking rather than replicating state
two possible implementations
connect every node to shared media (e.g. real shared ethernet)
forwarding only turned on on active node
slave nodes use promiscuous mode to sniff packets
copy all traffic to slave nodes
active master needs to copy all traffic to other nodes
disadvantage: high load, sync traffic == payload traffic
IMHO stupid way of solving the problem
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Poor man's failover
Poor man's failover
advantages
very easy implementation
only addition of sniffing mode to conntrack needed
existing means of address takeover can be used
same load on active master and slave nodes
no additional load on active master
disadvantages
can only be used with real shared media (no switches, ...)
cannot be used with NAT
remaining problem
no initial state sync after reboot of slave node!
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Real state replication (ct_sync)
Real state replication (ct_sync)
characteristics
replicates state changes from active master to slave(s)
separate shared ethernet segment for sync
advantages
can be used with any network media
works with NAT
initial sync after new slave is introduced
problems
complex implementation
current limitations
no replication of connection relations (ftp/h.323/...)
current problems
bugs, bugs, bugs
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Real state replication (ct_sync)
Required parts
state replication protocol
multicast based
sequence numbers for detection of packet loss
NACK-based retransmission
no security, since a private ethernet segment is to be used
event interface on active node
calling out to callback function at all state changes
exported interface to manipulate conntrack hash table
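Loss detection by sequence numbers with NACK-based retransmission can be sketched like this. The SyncReceiver class and its fields are hypothetical; the sketch ignores sequence number wraparound and says nothing about the actual ct_sync wire format.

```python
class SyncReceiver:
    """Toy receiver: detects gaps in sequence numbers and collects NACKs."""

    def __init__(self):
        self.expected = 0      # next sequence number we want to apply
        self.pending = {}      # messages that arrived ahead of a gap
        self.applied = []      # payloads applied in order
        self.nacks = []        # sequence numbers we asked to be resent

    def receive(self, seq, payload):
        if seq < self.expected:
            return             # duplicate of something already applied
        self.pending[seq] = payload
        # Request every missing sequence number below seq exactly once.
        for missing in range(self.expected, seq):
            if missing not in self.pending and missing not in self.nacks:
                self.nacks.append(missing)
        # Apply whatever is now contiguous.
        while self.expected in self.pending:
            self.applied.append(self.pending.pop(self.expected))
            self.expected += 1


rx = SyncReceiver()
rx.receive(0, "a")
rx.receive(2, "c")          # message 1 was lost on the way
nacks_after_gap = list(rx.nacks)
rx.receive(1, "b")          # retransmission arrives
print(nacks_after_gap, rx.applied)
```

With NACKs the receiver stays silent as long as nothing is lost, which suits a multicast protocol where positive acknowledgements from every slave would add traffic on the sync segment.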
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Real state replication (ct_sync)
Required parts
kernel thread for sending conntrack state protocol messages
registers with event interface
creates and accumulates state replication packets
sends them via in-kernel sockets api
kernel thread for receiving conntrack state replication messages
receives state replication packets via in-kernel sockets
uses conntrack hashtable manipulation interface
kernel thread for initial or full re-sync
sends full conntrack table with fixed speed
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Real state replication
Flow of events in chronological order:
on active node, inside the network RX softirq
connection tracking code is analyzing a forwarded packet
connection tracking gathers some new state information
connection tracking updates local connection tracking database
connection tracking sends event message to event API
function registered at event API enqueues message to send ring
on active node, inside the conntrack-sync kernel thread
conntrack sync daemon dequeues event messages from the ring
conntrack sync daemon aggregates multiple event messages into a state replication protocol message, removing possible redundancy
conntrack sync daemon sends state replication protocol packet via in-kernel sockets
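One plausible reading of "removing possible redundancy" is that only the most recent state per connection needs to be sent. The sketch below shows that idea; the function name drain_and_aggregate, the (conn_id, state) event shape, and the max_events limit are assumptions for illustration, not the ct_sync implementation.

```python
from collections import OrderedDict, deque


def drain_and_aggregate(ring, max_events=64):
    """Dequeue up to max_events events and keep only the latest state per connection."""
    latest = OrderedDict()
    for _ in range(min(max_events, len(ring))):
        conn_id, state = ring.popleft()
        # A later event for the same connection supersedes the earlier one.
        latest.pop(conn_id, None)
        latest[conn_id] = state
    return list(latest.items())


ring = deque([(1, "SYN_SENT"), (2, "SYN_SENT"), (1, "ESTABLISHED")])
message = drain_and_aggregate(ring)
print(message)
```

Decoupling the softirq (which only enqueues) from the thread (which aggregates and sends) keeps the per-packet cost on the forwarding path small.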
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Real state replication
Flow of events in chronological order:
on slave node(s), inside network RX softirq
connection tracking code ignores packets coming from the interface attached to the private conntrack sync network
state replication protocol messages are appended to the socket receive queue of the conntrack-sync kernel thread
on slave node(s), inside conntrack-sync kernel thread
conntrack sync daemon receives state replication message
conntrack sync daemon creates/updates conntrack entry
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Real state replication
Necessary changes to conntrack core
event generation (callback functions) for all state changes
is needed (and already implemented) for 'ctnetlink' API
conntrack hashtable manipulation API
is needed (and already implemented) for 'ctnetlink' API
conntrack exemptions
needed to _not_ track conntrack state replication packets
is needed for other cases as well (raw table / NOTRACK target)
works by
layer two packet drop (l2netfilter hooks)
disables any incoming or outgoing packets on other than the sync device on slave nodes
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Usage
To set up a conntrack cluster you need
hardware
two firewalls with identical iptables rulesets
all ethernet interfaces (internal, dmz, external) connected to both nodes
separate network segment for conntrack sync device
software
assign any working IP address range/subnet to the sync device
assign every node a unique node id (0..255)
decide which of the nodes is master, which slave
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Usage
To set up a conntrack cluster you need
configuration on master
first: modprobe ct_sync syncdev=ethX state=1 id=1 l2drop=1
second: configure your 'real' devices (internal, external)
configuration on slave
first: modprobe ct_sync syncdev=ethX state=0 id=2 l2drop=1
second: configure your 'real' devices (internal, external)
after loading ct_sync with l2drop=1, a slave node will be invisible on the 'real' networks; ssh access is only possible via the sync device
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Usage
Cluster manager
set up a cluster manager with some heartbeat mechanism
configure it to run the following command on a slave that is to be promoted to master:
echo "1" > /proc/net/ct_sync
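If the cluster manager calls a script rather than a shell one-liner, the same promotion step could look roughly like this in Python. The function name promote_to_master is hypothetical; only the path /proc/net/ct_sync and the value "1" come from the slide. The demo() helper writes to a temporary file so the sketch runs on a machine without ct_sync loaded.

```python
import os
import tempfile


def promote_to_master(proc_path="/proc/net/ct_sync"):
    """Write "1" to the ct_sync control file, like the echo command above."""
    with open(proc_path, "w") as f:
        f.write("1\n")


def demo():
    """Exercise promote_to_master against a temporary file and return its contents."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        promote_to_master(path)
        with open(path) as f:
            return f.read()
    finally:
        os.remove(path)


print(repr(demo()))
```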
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Thanks
Thanks to
the BBS scene, Z-Netz, FIDO, ...
for heavily increasing my computer usage in 1992
KNF
for bringing me in touch with the internet as early as 1994
for providing a playground for technical people
for introducing me to the existence of Linux!
Alan Cox, Alexey Kuznetsov, David Miller, Andi Kleen
for implementing (one of?) the world's best TCP/IP stacks
Paul 'Rusty' Russell
for starting the netfilter/iptables project
for trusting me to maintain it today
Astaro AG
for sponsoring my netfilter failover work
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Availability of slides / Links
The code
http://cvs.netfilter.org/netfilter-ha/ct_sync
The slides
http://www.gnumonks.org/
The netfilter homepage
http://www.netfilter.org/
Astaro AG
http://www.astaro.com/