Archived
14
0
Fork 0
This repository has been archived on 2022-02-17. You can view files and clone it, but cannot push or open issues or pull requests.
linux-2.6/net
Catalin(ux) M. BOIE 6f7edb4881 IPVS: Allow boot time change of hash size
I was very frustrated about the fact that I have to recompile the kernel
to change the hash size. So, I created this patch.

If IPVS is built-in you can append ip_vs.conn_tab_bits=?? to kernel
command line, or, if you built IPVS as modules, you can add
options ip_vs conn_tab_bits=??.

To keep everything backward compatible, you still can select the size at
compile time, and that will be used as default.

It has been about a year since this patch was originally posted
and subsequently dropped on the basis of insufficient test data.

Mark Bergsma has provided the following test results which seem
to strongly support the need for larger hash table sizes:

We do however run into the same problem with the default setting (212 =
4096 entries), as most of our LVS balancers handle around a million
connections/SLAB entries at any point in time (around 100-150 kpps
load). With only 4096 hash table entries this implies that each entry
consists of a linked list of 256 connections *on average*.

To provide some statistics, I did an oprofile run on an 2.6.31 kernel,
with both the default 4096 table size, and the same kernel recompiled
with IP_VS_CONN_TAB_BITS set to 18 (218 = 262144 entries). I built a
quick test setup with a part of Wikimedia/Wikipedia's live traffic
mirrored by the switch to the test host.

With the default setting, at ~ 120 kpps packet load we saw a typical %si
CPU usage of around 30-35%, and oprofile reported a hot spot in
ip_vs_conn_in_get:

samples  %        image name               app name
symbol name
1719761  42.3741  ip_vs.ko                 ip_vs.ko      ip_vs_conn_in_get
302577    7.4554  bnx2                     bnx2          /bnx2
181984    4.4840  vmlinux                  vmlinux       __ticket_spin_lock
128636    3.1695  vmlinux                  vmlinux       ip_route_input
74345     1.8318  ip_vs.ko                 ip_vs.ko      ip_vs_conn_out_get
68482     1.6874  vmlinux                  vmlinux       mwait_idle

After loading the recompiled kernel with 218 entries, %si CPU usage
dropped in half to around 12-18%, and oprofile looks much healthier,
with only 7% spent in ip_vs_conn_in_get:

samples  %        image name               app name
symbol name
265641   14.4616  bnx2                     bnx2         /bnx2
143251    7.7986  vmlinux                  vmlinux      __ticket_spin_lock
140661    7.6576  ip_vs.ko                 ip_vs.ko     ip_vs_conn_in_get
94364     5.1372  vmlinux                  vmlinux      mwait_idle
86267     4.6964  vmlinux                  vmlinux      ip_route_input

[ horms@verge.net.au: trivial up-port and minor style fixes ]
Signed-off-by: Catalin(ux) M. BOIE <catab@embedromix.ro>
Cc: Mark Bergsma <mark@wikimedia.org>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-01-05 05:50:24 +01:00
..
9p 9p connect fixes 2009-12-16 12:16:41 -05:00
802 sysctl net: Remove unused binary sysctl code 2009-11-12 02:05:06 -08:00
8021q bonding: allow arp_ip_targets on separate vlans to use arp validation 2010-01-03 21:17:16 -08:00
appletalk Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 2009-12-08 07:55:01 -08:00
atm atm: [br2684] allow routed mode operation again 2009-12-08 20:22:31 -08:00
ax25 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 2009-12-08 07:55:01 -08:00
bluetooth Bluetooth: Fix L2CAP locking scheme regression 2009-12-17 12:07:25 -08:00
bridge Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 2009-12-08 07:55:01 -08:00
can can: Speed up CAN frame receiption by using ml_priv 2010-01-03 21:31:03 -08:00
core bonding: allow arp_ip_targets on separate vlans to use arp validation 2010-01-03 21:17:16 -08:00
dcb net: Move && and || to end of previous line 2009-11-29 16:55:45 -08:00
dccp kfifo: rename kfifo_put... into kfifo_in... and kfifo_get... into kfifo_out... 2009-12-22 14:17:56 -08:00
decnet Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 2009-12-08 07:55:01 -08:00
dsa
econet net: use net_eq to compare nets 2009-11-25 15:14:13 -08:00
ethernet llc: use dev_hard_header 2009-12-26 20:38:23 -08:00
ieee802154 net: use net_eq to compare nets 2009-11-25 15:14:13 -08:00
ipv4 netfilter: SNMP NAT: correct the size argument to kzalloc 2010-01-04 15:21:31 +01:00
ipv6 net: Add rtnetlink init_rcvwnd to set the TCP initial receive window 2009-12-23 14:13:30 -08:00
ipx Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 2009-12-08 07:55:01 -08:00
irda tree-wide: convert open calls to remove spaces to skip_spaces() lib function 2009-12-15 08:53:32 -08:00
iucv const: constify remaining dev_pm_ops 2009-12-15 08:53:25 -08:00
key xfrm: Fix truncation length of authentication algorithms installed via PF_KEY 2009-12-11 15:07:57 -08:00
lapb
llc llc: fix SAP reference counting w.r.t. socket handling 2009-12-26 20:47:23 -08:00
mac80211 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2009-12-30 13:51:29 -08:00
netfilter IPVS: Allow boot time change of hash size 2010-01-05 05:50:24 +01:00
netlabel Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2009-12-09 19:43:33 -08:00
netlink net: use net_eq to compare nets 2009-11-25 15:14:13 -08:00
netrom Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 2009-12-08 07:55:01 -08:00
packet packet: dont call sleeping functions while holding rcu_read_lock() 2009-12-15 21:12:21 -08:00
phonet Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 2009-12-08 07:55:01 -08:00
rds Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband 2009-12-16 10:32:31 -08:00
rfkill net/rfkill/core.c: work around gcc-4.0.2 silliness 2009-12-07 16:51:23 -05:00
rose Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 2009-12-08 07:55:01 -08:00
rxrpc net: use net_eq to compare nets 2009-11-25 15:14:13 -08:00
sched Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2009-12-09 19:43:33 -08:00
sctp Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2009-12-09 19:43:33 -08:00
sunrpc Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 2009-12-16 10:47:44 -08:00
tipc tipc: use kconfig to limit numeric ranges 2010-01-03 21:31:04 -08:00
unix Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 2009-12-08 07:55:01 -08:00
wanrouter
wimax Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2009-12-09 19:43:33 -08:00
wireless Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2009-12-30 13:51:29 -08:00
x25 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 2009-12-08 07:55:01 -08:00
xfrm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 2009-12-08 07:55:01 -08:00
compat.c net: use compat helper functions in compat_sys_recvmmsg 2009-12-11 15:07:57 -08:00
Kconfig
Makefile
nonet.c
socket.c fs: no games with DCACHE_UNHASHED 2009-12-17 10:51:40 -05:00
sysctl_net.c
TUNABLE