Archived
14
0
Fork 0
This repository has been archived on 2022-02-17. You can view files and clone it, but cannot push or open issues or pull requests.
linux-2.6/net
Neil Horman 0e3125c755 packet: Enhance AF_PACKET implementation to not require high order contiguous memory allocation (v4)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Version 4 of this patch.

Change notes:
1) Removed extra memset.  Didn't think kcalloc added a GFP_ZERO the way kzalloc did :)

Summary:
It was shown to me recently that systems under high load were driven very deep
into swap when tcpdump was run.  The reason this happened was because the
AF_PACKET protocol has a SET_RINGBUFFER socket option that allows the user space
application to specify how many entries an AF_PACKET socket will have and how
large each entry will be.  It seems the default setting for tcpdump is to set
the ring buffer to 32 entries of 64 Kb each, which implies 32 order 5
allocation.  Thats difficult under good circumstances, and horrid under memory
pressure.

I thought it would be good to make that a bit more usable.  I was going to do a
simple conversion of the ring buffer from contigous pages to iovecs, but
unfortunately, the metadata which AF_PACKET places in these buffers can easily
span a page boundary, and given that these buffers get mapped into user space,
and the data layout doesn't easily allow for a change to padding between frames
to avoid that, a simple iovec change is just going to break user space ABI
consistency.

So I've done this, I've added a three tiered mechanism to the af_packet set_ring
socket option.  It attempts to allocate memory in the following order:

1) Using __get_free_pages with GFP_NORETRY set, so as to fail quickly without
digging into swap

2) Using vmalloc

3) Using __get_free_pages with GFP_NORETRY clear, causing us to try as hard as
needed to get the memory

The effect is that we don't disturb the system as much when we're under load,
while still being able to conduct tcpdumps effectively.

Tested successfully by me.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Maciej Żenczykowski <zenczykowski@gmail.com>
Reported-by: Maciej Żenczykowski <zenczykowski@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-16 10:26:47 -08:00
..
9p net/9p: Return error on read with NULL buffer 2010-10-28 09:08:49 -05:00
802 net/802: add __rcu annotations 2010-10-25 13:09:44 -07:00
8021q vlan: Fix build warning in vlandev_seq_show() 2010-11-15 10:37:30 -08:00
appletalk
atm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 2010-10-23 11:47:02 -07:00
ax25 net: ax25: fix information leak to userland 2010-11-10 10:14:33 -08:00
bluetooth Bluetooth: fix not setting security level when creating a rfcomm session 2010-11-09 00:56:10 -02:00
bridge bridge: add RCU annotations to bridge port lookup 2010-11-15 11:13:18 -08:00
caif caif: Remove noisy printout when disconnecting caif socket 2010-11-03 18:50:04 -07:00
can can-bcm: fix minor heap overflow 2010-11-12 14:07:14 -08:00
ceph ceph: fix num_pages_free accounting in pagelist 2010-10-20 15:38:23 -07:00
core net: Export netif_get_vlan_features(). 2010-11-15 20:15:03 -08:00
dcb
dccp dccp ccid-2: Separate option parsing from CCID processing 2010-11-15 07:12:01 +01:00
decnet Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-11-14 11:57:05 -08:00
dns_resolver
dsa
econet
ethernet
ieee802154
ipv4 xfrm: use gre key as flow upper protocol info 2010-11-15 10:44:04 -08:00
ipv6 ipv6: fix missing in6_ifa_put in addrconf 2010-11-16 09:24:58 -08:00
ipx BKL: introduce CONFIG_BKL. 2010-10-21 15:44:13 +02:00
irda Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2010-10-24 13:41:39 -07:00
iucv [S390] cleanup lowcore access from external interrupts 2010-10-25 16:10:19 +02:00
key
l2tp l2tp: kzalloc with swapped params in l2tp_dfs_seq_open 2010-11-01 06:56:02 -07:00
lapb
llc
mac80211 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2010-11-16 09:17:12 -08:00
netfilter ipv4: Make rt->fl.iif tests lest obscure. 2010-11-11 17:07:48 -08:00
netlabel
netlink netlink: fix netlink_change_ngroups() 2010-10-24 16:25:39 -07:00
netrom
packet packet: Enhance AF_PACKET implementation to not require high order contiguous memory allocation (v4) 2010-11-16 10:26:47 -08:00
phonet phonet: remove the unused variable pn 2010-10-20 01:55:54 -07:00
rds rds: Fix rds message leak in rds_message_map_pages 2010-11-08 12:17:09 -08:00
rfkill rfkill: remove dead code 2010-11-15 13:24:06 -05:00
rose
rxrpc
sched classifier: report statistics for basic classifier 2010-11-08 12:17:05 -08:00
sctp net: avoid limits overflow 2010-11-10 12:12:00 -08:00
sunrpc convert get_sb_single() users 2010-10-29 04:16:28 -04:00
tipc net: tipc: fix information leak to userland 2010-11-09 09:25:46 -08:00
unix af_unix: optimize unix_dgram_poll() 2010-11-08 13:50:09 -08:00
wanrouter
wimax
wireless Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2010-11-16 09:17:12 -08:00
x25 x25: Prevent crashing when parsing bad X.25 facilities 2010-11-12 12:44:42 -08:00
xfrm xfrm: make xfrm_bundle_ok local 2010-10-21 03:09:46 -07:00
compat.c net: Limit socket I/O iovec total length to INT_MAX. 2010-10-28 11:47:52 -07:00
Kconfig ceph: factor out libceph from Ceph file system 2010-10-20 15:37:28 -07:00
Makefile ceph: factor out libceph from Ceph file system 2010-10-20 15:37:28 -07:00
nonet.c llseek: automatically add .llseek fop 2010-10-15 15:53:27 +02:00
socket.c net: net_families __rcu annotations 2010-11-12 13:27:25 -08:00
sysctl_net.c
TUNABLE