--author Harald Welte <laforge@netfilter.org>
|
|
--title What's been happening in the netfilter world
|
|
--date 16 Jul 2005
|
|
This is an overview of what has been going on in the netfilter world recently. The main purpose is to keep the rest of the Linux kernel networking crowd informed.
|
|
--footer This presentation is made with tpp http://synflood.at/tpp.html
|
|
|
|
--newpage
|
|
--footer netconf'05 - netfilter update
|
|
--header Overview
|
|
rustynat
|
|
nfnetlink
|
|
ctnetlink
|
|
flow-based accounting
|
|
conntrack tool
|
|
helpers (pptp, h.323, sip)
|
|
pkttables
|
|
ipset
|
|
ct_sync
|
|
transparent proxies
|
|
misc
|
|
|
|
--newpage
|
|
--footer netconf'05 - netfilter update
|
|
--header rustynat
|
|
Three years ago, the "newnat" design was adopted as the architecture and API for conntrack/NAT helpers. This is what most people are using, and what's in kernel 2.4.x and 2.6.x (for x < 11).
|
|
|
|
In 2.6.11, a new scheme (which I call "rustynat") was integrated.
|
|
|
|
Fundamental changes:
|
|
struct ip_conntrack no longer has sibling_list
|
|
struct ip_conntrack_expect is killed when expected conntrack comes in
|
|
NAT helpers are now called by callback functions from conntrack helpers
|
|
cleanup of NAT manip data structures to reduce size of ip_conntrack
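The one-shot behaviour of expectations described above can be illustrated with a small userspace sketch. This is not the kernel code: `struct expect`, `struct expect_table`, `expect_add()` and `expect_consume()` are hypothetical names invented for illustration, and matching on destination address and port only is a simplifying assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_EXPECT 8

/* Hypothetical, heavily simplified stand-in for an expectation. */
struct expect {
    uint32_t dst_ip;
    uint16_t dst_port;
    bool     in_use;
};

struct expect_table {
    struct expect slot[MAX_EXPECT];
};

/* Register an expectation; returns false if the table is full. */
static bool expect_add(struct expect_table *t, uint32_t dst_ip, uint16_t dst_port)
{
    assert(t != NULL);
    for (size_t i = 0; i < MAX_EXPECT; i++) {
        if (!t->slot[i].in_use) {
            t->slot[i].dst_ip = dst_ip;
            t->slot[i].dst_port = dst_port;
            t->slot[i].in_use = true;
            return true;
        }
    }
    return false;
}

/* Called for a new connection. If it matches an expectation, the
 * expectation is removed immediately (one-shot) and true is returned. */
static bool expect_consume(struct expect_table *t, uint32_t dst_ip, uint16_t dst_port)
{
    assert(t != NULL);
    for (size_t i = 0; i < MAX_EXPECT; i++) {
        struct expect *e = &t->slot[i];
        if (e->in_use && e->dst_ip == dst_ip && e->dst_port == dst_port) {
            e->in_use = false;
            return true;
        }
    }
    return false;
}
```

In this sketch a second connection to the same address and port no longer matches, because the first match already removed the expectation.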
|
|
|
|
Problems:
|
|
All existing helpers need to be ported (non-trivial port)
|
|
Some fallout related to sequence number updates in NAT helper case
|
|
|
|
--newpage
|
|
--footer netconf'05 - netfilter update
|
|
--header nfnetlink
|
|
Fundamental idea is to have a generic layer for all netfilter related netlink messages. It basically adds another layer of abstraction/multiplexing on top of netlink. Is it really needed?
|
|
|
|
Looking at the real users, they are extremely different:
|
|
|
|
ctnetlink
|
|
dump/read/flush/update connection tracking table
|
|
dump/read/flush/update connection tracking expectation table
|
|
ulog-ng
|
|
log arbitrary (even non-ip) packets to userspace
|
|
nf_queue
|
|
queue arbitrary (even non-ip) packets to userspace
|
|
pkttnetlink
|
|
ruleset management
|
|
|
|
--newpage
|
|
--footer netconf'05 - netfilter update
|
|
--header ctnetlink
|
|
Purpose of ctnetlink is to have a userspace interface to the conntrack table
|
|
|
|
message types
|
|
IPCTNL_MSG_CT_NEW - create a new conntrack
|
|
IPCTNL_MSG_CT_DELETE - delete a conntrack, flush table
|
|
IPCTNL_MSG_CT_GET - read one or more conntracks
|
|
IPCTNL_MSG_CT_GET_CTRZERO - read conntrack and zero counters
|
|
|
|
IPCTNL_MSG_EXP_NEW - create a new expect
|
|
IPCTNL_MSG_EXP_DELETE - delete an expect
|
|
IPCTNL_MSG_EXP_GET - read one or more expects
|
|
|
|
IPCTNL_MSG_CONFIG - configuration of masks (see later)
|
|
|
|
--newpage
|
|
--footer netconf'05 - netfilter update
|
|
--header conntrack event cache
|
|
ctnetlink also wants to have events, i.e. inform userspace about updates
|
|
|
|
ip_conntrack was extended to build an 'event cache', i.e. a list of events that have happened while one specific packet passes through the stack:
|
|
|
|
IPCT_DESTROY
|
|
IPCT_NEW
|
|
IPCT_RELATED
|
|
IPCT_STATUS
|
|
IPCT_PROTOINFO
|
|
IPCT_HELPER
|
|
IPCT_HELPINFO
|
|
IPCT_NATINFO
|
|
|
|
When packet traversal finishes, a notifier is called with the bitmask of accumulated events for this packet (skb->nfcache)
|
|
Event API is used by ct_sync and ctnetlink
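The accumulate-then-notify idea can be sketched in plain C as follows. The event names are taken from the list above, but the bit positions, `struct event_cache`, and all function names are assumptions made for illustration; the real kernel code is organised differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Event names from the slide; the bit positions are assumed. */
enum ipct_event {
    IPCT_NEW       = 1 << 0,
    IPCT_RELATED   = 1 << 1,
    IPCT_DESTROY   = 1 << 2,
    IPCT_STATUS    = 1 << 3,
    IPCT_PROTOINFO = 1 << 4,
    IPCT_HELPER    = 1 << 5,
    IPCT_HELPINFO  = 1 << 6,
    IPCT_NATINFO   = 1 << 7
};

/* Hypothetical per-packet cache: just a bitmask of pending events. */
struct event_cache {
    uint32_t events;
};

typedef void (*event_notifier)(uint32_t events, void *ctx);

/* Record an event while the packet is still traversing the stack. */
static void event_cache_add(struct event_cache *c, enum ipct_event ev)
{
    assert(c != NULL);
    c->events |= (uint32_t)ev;
}

/* At the end of traversal: call the notifier once with the whole
 * accumulated mask, then reset the cache for the next packet. */
static void event_cache_deliver(struct event_cache *c, event_notifier fn, void *ctx)
{
    assert(c != NULL);
    if (c->events != 0 && fn != NULL)
        fn(c->events, ctx);
    c->events = 0;
}

/* Example notifier: stores the delivered mask where ctx points. */
static void record_events(uint32_t events, void *ctx)
{
    *(uint32_t *)ctx = events;
}
```

The point of the design is that a listener is called once per packet with a combined mask, not once per individual state change.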
|
|
|
|
--newpage
|
|
--footer netconf'05 - netfilter update
|
|
--header ctnetlink
|
|
ctnetlink registers with the event API and sends ctnetlink multicast msgs
|
|
|
|
ctnetlink event messages are either NEW, NEW with F_UPDATE or DELETE
|
|
|
|
Problem:
|
|
There can be lots of events.
|
|
We can easily see 200,000 NEW conntracks per second
|
|
|
|
Interim Solution:
|
|
Have userspace app specify the bitmask of interesting events via
|
|
IPCTNL_MSG_CONFIG. This defeats use by multiple uncooperative apps.
|
|
|
|
--newpage
|
|
--footer netconf'05 - netfilter update
|
|
--header ctnetlink
|
|
Proposed Real Solution:
|
|
Have generic netlink event message filters.
|
|
- Every socket can set its local bitmask of events using setsockopt()
|
|
- netlink core maintains ORed event mask that is used by ctnetlink
|
|
- Whenever a socket disappears (or changes its mask), we recalculate
|
|
the global mask
|
|
|
|
This scheme should really be generic, since other subsystems with potentially many messages can profit from it.
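A minimal userspace sketch of the proposed bookkeeping: each socket carries its own event mask, and the ORed global mask is recomputed whenever a socket changes its mask or goes away. Everything here (`struct nl_sock`, the fixed-size socket array, the function names) is hypothetical; it models the proposal and does not reflect any existing netlink API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SOCKETS 16

/* Hypothetical per-socket state. */
struct nl_sock {
    int      open;
    uint32_t event_mask;
};

static struct nl_sock sockets[MAX_SOCKETS];
static uint32_t global_mask;   /* OR of the masks of all open sockets */

static void recalc_global_mask(void)
{
    uint32_t mask = 0;
    for (size_t i = 0; i < MAX_SOCKETS; i++) {
        if (sockets[i].open)
            mask |= sockets[i].event_mask;
    }
    global_mask = mask;
}

/* Models a socket setting its local mask (via setsockopt() in the proposal). */
static void sock_set_mask(size_t idx, uint32_t mask)
{
    assert(idx < MAX_SOCKETS);
    sockets[idx].open = 1;
    sockets[idx].event_mask = mask;
    recalc_global_mask();
}

/* Models a socket disappearing. */
static void sock_close(size_t idx)
{
    assert(idx < MAX_SOCKETS);
    sockets[idx].open = 0;
    sockets[idx].event_mask = 0;
    recalc_global_mask();
}

/* Producer side: skip building a message nobody has asked for. */
static int event_wanted(uint32_t event)
{
    return (global_mask & event) != 0;
}
```

The recalculation is linear in the number of sockets, but it only runs on mask changes and socket teardown, while the per-event check on the hot path is a single AND.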
|
|
|
|
--newpage
|
|
--footer netconf'05 - netfilter update
|
|
--header conntrack tool
|
|
|
|
To test and use ctnetlink, Pablo Neira wrote the "conntrack" tool
|
|
Basically "iproute2" for conntrack:
|
|
|
|
-L [table] [-z] List conntrack or expect table
|
|
-G [table] params Show conntrack or expect
|
|
-D [table] params Delete conntrack or expect
|
|
-I [table] params Create conntrack or expect
|
|
-E [table] [options] Show events (equals "ip route monitor")
|
|
|
|
--newpage
|
|
--footer netconf'05 - netfilter update
|
|
--header flow-based accounting
|
|
Linux lacks a good accounting solution.
|
|
Lots of people use inefficient net-acct/nacctd, ip-acct, ulog-acct, ...
|
|
Specialized solutions exist (ipt_ACCOUNT, ...) but are limited in scope
|
|
Most people want to have flow-based instead of packet-based logs
|
|
NETFLOW (or now IPFIX) format can be used by standard tools for analysis
|
|
|
|
Idea: We already have a flow cache in the kernel
|
|
Problem: It's read-only per packet
|
|
But: ip_conntrack already has per-packet write access
|
|
So: We can put counters in same already-written-to ip_conntrack cache line
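As a rough illustration of that idea, the sketch below keeps per-direction packet and byte counters inside the connection entry itself, next to a field that is written for every packet anyway. The struct layout, field names, and `ct_account()` are assumptions for illustration and do not match the real `struct ip_conntrack`.

```c
#include <assert.h>
#include <stdint.h>

enum ct_dir { CT_DIR_ORIGINAL = 0, CT_DIR_REPLY = 1, CT_DIR_MAX };

struct ct_counter {
    uint64_t packets;
    uint64_t bytes;
};

/* Hypothetical, simplified connection entry. */
struct conntrack_entry {
    uint32_t          timeout;              /* stand-in for state updated per packet */
    struct ct_counter counters[CT_DIR_MAX]; /* accounting data placed right next to it */
};

/* Per-packet accounting: one packet of len bytes seen in direction dir. */
static void ct_account(struct conntrack_entry *ct, enum ct_dir dir, uint32_t len)
{
    assert(dir < CT_DIR_MAX);
    ct->counters[dir].packets++;
    ct->counters[dir].bytes += len;
}
```

A flow record is then just the entry's tuple plus these counters, read out when the connection is polled or destroyed.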
|
|
|
|
Userspace interface is ctnetlink (either polling or event-based)
|
|
Simplistic implementation can use "conntrack" tool and pipe to perl script
|
|
Fully-featured logging daemon (ulogd2) is in the final implementation stage
|
|
See my OLS 2005 paper for more details
|
|
|
|
--newpage
|
|
--footer netconf'05 - netfilter update
|
|
--header helpers
|
|
PPTP
|
|
helper is now finally ported to rustynat
|
|
will be merged soon since I'm tired of syncing it with core changes
|
|
|
|
H.323
|
|
now has a simplified ASN.1 parser instead of brute-force replace
|
|
needs more testing but could probably be merged soon, too
|
|
|
|
SIP
|
|
first development version showed up
|
|
extremely complex protocol, helper can only cover common cases
|
|
some features (like host names in SDP) cannot be solved in-kernel
|
|
|
|
|
|
--newpage
|
|
--footer netconf'05 - netfilter update
|
|
--header pkttables
|
|
|
|
Sorry, no real progress since last year. Too much other work :(
|
|
|
|
We'll have to wait a bit longer until we see the next linux packet filter..
|
|
|
|
--newpage
|
|
--footer netconf'05 - netfilter update
|
|
--header nf_conntrack
|
|
|
|
nf_conntrack is the layer3-independent connection tracking code (ipv4+ipv6)
|
|
- Code is still kept in-sync with ip_conntrack changes
|
|
- We still don't have IPv4-NAT on top of it
|
|
- Should already have been submitted a long time ago
|
|
- Problem: you can only have ip_conntrack or nf_conntrack loaded at once
|
|
- All the existing users ('state' and 'conntrack' iptables match, ..)
|
|
can't deal with it transparently.
|
|
- Should get fixed up, but like many ipv6 issues it has low prio :(
|
|
|
|
--newpage
|
|
--footer netconf'05 - netfilter update
|
|
--header ipset
|
|
http://ipset.netfilter.org/
|
|
- Supersedes old ippool code
|
|
- Idea is to have certain groups of addresses (called "sets")
|
|
- Instead of having 100 iptables rules to match on 100 addresses, you have
|
|
1 iptables rule and an ipset with 100 addresses
|
|
- It's more efficient since it uses compact data types (such as a 256-bit
|
|
long bitmask for any N addresses out of a /24)
|
|
- Should IMHO get merged soon, too.
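The /24 bitmask mentioned above can be sketched as follows: one bit per host address, so any subset of the 256 addresses fits in 32 bytes and membership is a single bit test. `struct set24` and its functions are hypothetical names for illustration, not ipset's actual data structures; addresses are assumed to be IPv4 in host byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical set covering one /24 network. */
struct set24 {
    uint32_t network;   /* network address, low 8 bits zero */
    uint32_t bits[8];   /* 8 * 32 = 256 bits, one per host */
};

static bool set24_covers(const struct set24 *s, uint32_t ip)
{
    assert(s != NULL);
    return (ip & 0xffffff00u) == s->network;
}

/* Add an address; returns false if it lies outside the /24. */
static bool set24_add(struct set24 *s, uint32_t ip)
{
    if (!set24_covers(s, ip))
        return false;
    uint32_t host = ip & 0xffu;
    s->bits[host / 32] |= 1u << (host % 32);
    return true;
}

/* Membership test: constant time, independent of the set size. */
static bool set24_test(const struct set24 *s, uint32_t ip)
{
    if (!set24_covers(s, ip))
        return false;
    uint32_t host = ip & 0xffu;
    return ((s->bits[host / 32] >> (host % 32)) & 1u) != 0;
}
```

Compared with 100 separate rules, the lookup cost here does not grow with the number of addresses in the set.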
|
|
|
|
--newpage
|
|
--footer netconf'05 - netfilter update
|
|
--header ct_sync
|
|
|
|
- Development of 2.6.x port seems to have stabilized now
|
|
- We haven't seen any oopses for quite some time
|
|
- Still doesn't support working failover for 'helped' connections
|
|
- 2.6.x branch allows one node to participate in multiple virtual clusters
|
|
- Currently working on real active-active failover
|
|
- Current code based on 2.6.10, so no "rustynat" port yet
|
|
|
|
--newpage
|
|
--footer netconf'05 - netfilter update
|
|
--header transparent proxying
|
|
In 2.2.x we had the kludgy bind-to-foreign-address code
|
|
In 2.4.x it was removed because netfilter had to clean up core networking code
|
|
Now we have huge bloaty TPROXY patches out-of-tree instead:
|
|
- they do DNAT of incoming connection
|
|
- SNAT on outgoing connection
|
|
- use SO_GETORIGDST on incoming connection to retrieve un-nat'ed addr
|
|
While the code is working fine, I think it's just not worth the effort:
|
|
- NATing _twice_ just to route packets to local sockets, plus
|
|
- kludgy socket options and other nasty stuff....
|
|
All we need is
|
|
- route certain packets to local sockets (based on destip/destport)
|
|
- bind local processes to foreign addresses (already works)
|
|
- send packets from sockets bound to foreign addresses
|
|
Transparent proxies with ctnetlink-issued expectations are what you want in order to enable conntrack helpers in userspace!
|
|
|
|
--newpage
|
|
--footer netconf'05 - netfilter update
|
|
--header misc
|
|
|
|
- new source code directory structure: /net/netfilter/* for core stuff
|
|
- ipsec interaction -> Patrick
|
|
- conntrack reference issue (rmmod ip_conntrack vs. nf_reset() vs.
|
|
local nat vs. GETORIGDST)
|
|
|
|
not netfilter-related
|
|
- would somebody mind 'alias' devices that had their own mac address?
|