%include "default.mgp"
%default 1 bgrad
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
%nodefault
%back "blue"



%center
%size 7
A tour of the
Linux 2.6 network stack


%center
%size 4
by

Harald Welte <laforge@hmw-consulting.de>

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Linux 2.6 Network Tour
Contents


Introduction
Hardirq Context
Hard Interrupt Handler
Softirq Context
Network RX Softirq
IPv4 Packet Handler
IPv4 Packet Forwarding
IPv4 Packet Output
Driver TX routine

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Linux 2.6 Network Tour
Introduction


Who is speaking to you?
an independent Free Software developer
who has earned his living from Free Software since 1997
who is one of the authors of netfilter/iptables, the Linux kernel firewall system

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Linux 2.6 Network Tour
Interrupt context

Also called 'hardirq'
Triggered by an external interrupt to the CPU
Is not reentrant, because the irq is disabled before the handler is called
Should do only a minimum of work and return as fast as possible

hardirq handler is registered via request_irq()

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Linux 2.6 Network Tour
Receive Interrupt

NIC receives packet for local MAC address
NIC issues interrupt
Interrupt is routed to one CPU
Kernel enters hardirq context and disables this irq on the local CPU
Driver's interrupt handler
allocates skb (struct sk_buff)
calls net/core/dev.c:netif_rx()
returns irqreturn_t
Kernel leaves hardirq context and re-enables this irq

2.6.x introduces NAPI for polling at high irq rates: netif_rx_schedule()

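The receive steps above can be sketched as a toy Python model (this is not kernel code). It only shows the idea that the handler does the minimum: wrap the frame, queue it on a per-CPU backlog, flag the RX softirq and leave. MAX_BACKLOG, the Cpu class and the "queued"/"dropped" return strings are assumptions made for this sketch.

```python
from collections import deque

MAX_BACKLOG = 4  # hypothetical limit; the kernel uses a tunable value


class Cpu:
    """Per-CPU state: a backlog queue and a softirq-pending flag."""

    def __init__(self):
        self.backlog = deque()
        self.rx_softirq_pending = False


def netif_rx(cpu, skb):
    """Queue one received buffer; drop it if the backlog is full."""
    if len(cpu.backlog) >= MAX_BACKLOG:
        return "dropped"
    cpu.backlog.append(skb)
    cpu.rx_softirq_pending = True  # the real work happens later, in softirq
    return "queued"


def irq_handler(cpu, frame):
    """Hardirq part: allocate a buffer, hand it over, leave quickly."""
    skb = {"data": frame}
    return netif_rx(cpu, skb)
```

Calling irq_handler() five times on one Cpu queues the first four frames and drops the fifth, which is the behaviour a full backlog is meant to model.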
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Linux 2.6 Network Tour
Softirq context

Softirq is the real workhorse of interrupts
Continues work where hardirq has finished
Can be interrupted by hardirq context
Can run in parallel on any number of CPUs

softirq handler registered via kernel/softirq.c:open_softirq()

softirqs need to be 'raised' by raise_softirq() from hardirq
softirqs are scheduled
after hardirq context exits
from ksoftirqd in case there's too much work

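The register / raise / run cycle described above can be modelled in a few lines of Python. This is a simplified sketch, not the kernel implementation: the function names are borrowed from the slide, but the signatures, the global bitmask and the log list are assumptions for illustration.

```python
NET_TX, NET_RX = 2, 3  # illustrative softirq numbers

handlers = {}   # softirq number -> handler function
pending = 0     # bitmask of raised softirqs
log = []


def open_softirq(nr, action):
    """Register the handler for one softirq number."""
    handlers[nr] = action


def raise_softirq(nr):
    """Mark a softirq as pending; it does not run yet."""
    global pending
    pending |= 1 << nr


def do_softirq():
    """Run every raised handler once, lowest number first."""
    global pending
    todo, pending = pending, 0
    nr = 0
    while todo:
        if todo & 1:
            handlers[nr]()
        todo >>= 1
        nr += 1


open_softirq(NET_RX, lambda: log.append("net_rx_action"))
open_softirq(NET_TX, lambda: log.append("net_tx_action"))
raise_softirq(NET_RX)
raise_softirq(NET_RX)   # raising twice still runs the handler once
do_softirq()
```

Because pending is a bitmask, raising the same softirq several times before it runs collapses into a single handler invocation.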
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Linux 2.6 Network Tour
Network RX Softirq


kernel/softirq.c:do_softirq()
generic softirq code
net/core/dev.c:net_rx_action()
function that is registered at open_softirq() time
net/core/dev.c:process_backlog()
dequeue skb from local CPU's backlog queue
uses a weighting scheme between different devices

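One way to picture the weighting scheme mentioned above is a round-robin poll with a per-device weight and an overall budget. The Python sketch below is a model under that assumption, not the code in net/core/dev.c; the tuple layout of poll_list and the parameter names are invented for illustration.

```python
from collections import deque


def net_rx_action(poll_list, budget):
    """Poll devices round-robin until the list or the budget is empty.

    poll_list: deque of (name, weight, queue) tuples
    Returns the packets in the order they were processed.
    """
    processed = []
    while poll_list and budget > 0:
        name, weight, queue = poll_list.popleft()
        quota = min(weight, budget)
        while quota > 0 and queue:
            processed.append((name, queue.popleft()))
            quota -= 1
            budget -= 1
        if queue:
            # device still has work: move it to the back of the list
            poll_list.append((name, weight, queue))
    return processed
```

With a weight of 2 for one device and 1 for another, the first device gets two packets per round and the second gets one, so a busy interface cannot starve the others.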
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Linux 2.6 Network Tour
netif_receive_skb()


net/core/dev.c:netif_receive_skb()
main network rx softirq workhorse
check if there are any netpoll users, if yes netpoll_rx()
if somebody requested skb rx timestamp, net_timestamp()
if interface is part of bonding group, skb_bond()
tc ingress filtering: ing_filter()
packet diverter: handle_diverter()
bridging handler: net/core/dev.c:handle_bridge()
deliver to l3 protocol handler: net/core/dev.c:deliver_skb()

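The final delivery step above amounts to a lookup from the frame's protocol number to the registered layer 3 handlers. A minimal Python sketch, assuming a plain dictionary instead of the kernel's hash table; dev_add_pack() is a kernel name, but the signature used here is simplified, and deliver() and the returned strings are made up for the example.

```python
ETH_P_IP = 0x0800
ETH_P_ARP = 0x0806

ptype_base = {}  # EtherType -> list of handler functions


def dev_add_pack(ethertype, func):
    """Register a handler for one protocol (simplified signature)."""
    ptype_base.setdefault(ethertype, []).append(func)


def deliver(skb):
    """Hand the skb to every handler registered for its protocol."""
    registered = ptype_base.get(skb["protocol"], [])
    if not registered:
        return "dropped"
    return [func(skb) for func in registered]


dev_add_pack(ETH_P_IP, lambda skb: "ip_rcv")
dev_add_pack(ETH_P_ARP, lambda skb: "arp_rcv")
```

A frame with protocol 0x0800 reaches the IPv4 handler, while a frame whose protocol has no registered handler is dropped.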
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Linux 2.6 Network Tour
IPv4 packet handler


net/ipv4/ip_input.c:ip_rcv()
checksum check
size check
NF_IP_PRE_ROUTING netfilter hook
net/ipv4/ip_input.c:ip_rcv_finish()
net/ipv4/route.c:ip_route_input()
route/dst cache lookup
if lookup fails, ip_route_input_slow()
fib lookup
allocation of new dst_entry / rtable
include/net/dst.h:dst_input()
iterate over destination stack
call destination function of the respective stack items

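The checksum and size checks listed above can be illustrated in Python. The checksum is the standard one's-complement sum over the header (RFC 1071); header_ok() is a hypothetical helper written for this sketch and only approximates what ip_rcv() verifies.

```python
import struct


def ip_checksum(header):
    """One's-complement checksum over 16-bit words (RFC 1071)."""
    if len(header) % 2:
        header += b"\x00"
    words = struct.unpack(f"!{len(header) // 2}H", header)
    total = sum(words)
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def header_ok(header):
    """Size and checksum checks, similar in spirit to ip_rcv()."""
    if len(header) < 20:
        return False
    version, ihl = header[0] >> 4, header[0] & 0x0F
    if version != 4 or ihl < 5 or len(header) < ihl * 4:
        return False
    # summing a header that includes a valid checksum yields zero
    return ip_checksum(header[: ihl * 4]) == 0
```

A valid header sums to zero when its own checksum field is included, which is why the check needs no separate comparison against the stored value.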
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Linux 2.6 Network Tour
IPv4 packet forwarding


net/ipv4/ip_forward.c:ip_forward()
xfrm4_policy_check()
router alert handling (ip_call_ra_chain)
ttl decrement
if route is redirect route, ip_rt_send_redirect()
call NF_IP_FORWARD netfilter hook
net/ipv4/ip_forward.c:ip_forward_finish()
increase statistics for snmp mib
include/net/dst.h:dst_output()
iterate over output functions of dst stack

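The TTL decrement above changes one header byte, so the checksum can be patched incrementally instead of being recomputed. The Python sketch below follows that idea; decrease_ttl() is a name chosen for this example, and raising ValueError for an expired TTL stands in for the drop path a router would take.

```python
def decrease_ttl(header):
    """Return a copy of the header with TTL - 1 and a patched checksum."""
    hdr = bytearray(header)
    if hdr[8] <= 1:
        raise ValueError("TTL expired, packet must not be forwarded")
    hdr[8] -= 1                        # TTL is byte 8 of the IPv4 header
    check = (hdr[10] << 8) | hdr[11]   # checksum field, bytes 10 and 11
    check += 0x0100                    # TTL is the high byte of its word
    check = (check + (check >= 0xFFFF)) & 0xFFFF   # end-around carry
    hdr[10], hdr[11] = check >> 8, check & 0xFF
    return bytes(hdr)
```

Lowering the TTL word by 0x0100 lowers the header sum by the same amount, so the stored complement has to go up by 0x0100, with the carry wrapped around.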
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Linux 2.6 Network Tour
IPv4 packet output


net/ipv4/ip_output.c:ip_output()
fragment packet via ip_fragment() if needed
net/ipv4/ip_output.c:ip_finish_output()
call netfilter NF_IP_POST_ROUTING hook
net/ipv4/ip_output.c:ip_finish_output2()
attach hardware header
call header cache output fn (if neighbour in cache)
net/core/dev.c:dev_queue_xmit()
or neighbour output function (if neighbour unknown)
net/core/neighbour.c:neigh_resolve_output()

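The arithmetic behind the fragmentation step above is small enough to show directly. The Python sketch below only computes fragment boundaries; it copies no data and is not how ip_fragment() is structured. The function name and the tuple layout are assumptions for illustration.

```python
def fragment(payload_len, mtu, ihl=20):
    """Split a payload into (offset_units, length, more_fragments).

    offset_units is the value of the IP fragment offset field,
    which counts 8-byte blocks.
    """
    if payload_len + ihl <= mtu:
        return [(0, payload_len, False)]
    chunk = (mtu - ihl) & ~7     # data per fragment, a multiple of 8
    if chunk <= 0:
        raise ValueError("MTU too small to carry any data")
    frags = []
    offset = 0
    while offset < payload_len:
        length = min(chunk, payload_len - offset)
        more = offset + length < payload_len
        frags.append((offset // 8, length, more))
        offset += length
    return frags
```

For a 4000 byte payload and an MTU of 1500, each fragment carries at most 1480 data bytes, and only the last fragment has the "more fragments" flag cleared.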
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Linux 2.6 Network Tour
dev_queue_xmit()


skb->dev->qdisc->enqueue()
enqueue into device's output queue
default: net/sched/sch_generic.c:pfifo_fast_enqueue()
net/sched/sch_generic.c:qdisc_restart():
dev->qdisc->dequeue()
dequeue skb from queue
dev->hard_start_xmit()
transmit skb via driver

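The enqueue / dequeue / transmit cycle above can be modelled with a small three-band FIFO in the style of pfifo_fast. This Python sketch is an approximation: the class, the per-band limit and the callback-style qdisc_restart() are assumptions, and the priority-to-band table should be read as illustrative.

```python
from collections import deque

PRIO2BAND = [1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]


class PfifoFast:
    """Three FIFO bands; band 0 is always served first."""

    def __init__(self, limit=1000):
        self.bands = [deque(), deque(), deque()]
        self.limit = limit  # per-band limit, hypothetical value

    def enqueue(self, skb, priority=0):
        band = self.bands[PRIO2BAND[priority & 0x0F]]
        if len(band) >= self.limit:
            return False          # queue full, packet dropped
        band.append(skb)
        return True

    def dequeue(self):
        for band in self.bands:
            if band:
                return band.popleft()
        return None


def qdisc_restart(qdisc, hard_start_xmit):
    """Dequeue one skb and pass it to the driver's transmit function."""
    skb = qdisc.dequeue()
    if skb is None:
        return False
    hard_start_xmit(skb)
    return True
```

Packets leave in band order rather than arrival order, so traffic mapped to band 0 always goes out before anything waiting in bands 1 and 2.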
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Linux 2.6 Network Tour
Driver TX Routine

drivers/net/e1000/e1000_main.c:e1000_xmit_frame()
tons of workarounds for chip bugs
set up TX DMA descriptor
queue TX DMA descriptor to device hardware
return NETDEV_TX_OK

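The descriptor handling above boils down to a ring that the driver fills and the hardware drains. The Python sketch below is a generic model, not the e1000 driver: the TxRing class, the in_flight counter and tx_complete() are assumptions, and no real DMA takes place.

```python
NETDEV_TX_OK = 0
NETDEV_TX_BUSY = 1


class TxRing:
    """Fixed-size descriptor ring shared by driver and device."""

    def __init__(self, size):
        self.desc = [None] * size
        self.next_to_use = 0      # where the driver writes next
        self.next_to_clean = 0    # oldest descriptor not yet completed
        self.in_flight = 0

    def xmit_frame(self, skb):
        if self.in_flight == len(self.desc):
            return NETDEV_TX_BUSY          # ring full, caller requeues
        self.desc[self.next_to_use] = skb  # stands in for descriptor setup
        self.next_to_use = (self.next_to_use + 1) % len(self.desc)
        self.in_flight += 1
        return NETDEV_TX_OK

    def tx_complete(self):
        """Called when the device reports one frame as sent."""
        if self.in_flight == 0:
            return None
        skb = self.desc[self.next_to_clean]
        self.desc[self.next_to_clean] = None
        self.next_to_clean = (self.next_to_clean + 1) % len(self.desc)
        self.in_flight -= 1
        return skb
```

When the ring is full the transmit routine reports busy instead of dropping the frame, which leaves the packet in the queueing layer until a completion frees a slot.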
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Linux 2.6 Network Tour
Thanks

Thanks to
Alan Cox, Alexey Kuznetsov, David Miller, Andi Kleen
for implementing (one of?) the world's best TCP/IP stacks
Paul 'Rusty' Russell
for starting the netfilter/iptables project
for trusting me to maintain it today
Astaro AG
for sponsoring parts of my netfilter work
Free Software Foundation
for the GNU Project
for the GNU General Public License
%size 3
The slides of this presentation are available at http://www.gnumonks.org/

Further Reading
%size 3
The netfilter homepage http://www.netfilter.org/
%size 3
The http://www.gpl-violations.org/ project