Archived
14
0
Fork 0
Commit graph

18109 commits

Author SHA1 Message Date
Simon Horman
258e958b85 IPVS: remove duplicate initialisation or rs_table
Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Tested-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 18:24:09 +01:00
Simon Horman
a870c8c5cb IPVS: use z modifier for sizeof() argument
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Tested-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 18:21:53 +01:00
a00f1f3686 netfilter: ctnetlink: fix ctnetlink_parse_tuple() warning
net/netfilter/nf_conntrack_netlink.c: In function 'ctnetlink_parse_tuple':
net/netfilter/nf_conntrack_netlink.c:832:11: warning: comparison between 'enum ctattr_tuple' and 'enum ctattr_type'

Use ctattr_type for the 'type' parameter since that's the type of all attributes
passed to this function.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 17:26:37 +01:00
582e1fc85c netfilter: ipset: remove unnecessary includes
None of the set types need uaccess.h since this is handled centrally
in ip_set_core. Most set types additionally don't need bitops.h and
spinlock.h since they use neither. tcp.h is only needed by those
using before(), udp.h is not needed at all.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 16:57:37 +01:00
8da560ced5 netfilter: ipset: use nla_parse_nested()
Replace calls of the form:

nla_parse(tb, ATTR_MAX, nla_data(attr), nla_len(attr), policy)

by:

nla_parse_nested(tb, ATTR_MAX, attr, policy)

Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 16:27:25 +01:00
Pablo Neira Ayuso
3db7e93d33 netfilter: ecache: always set events bits, filter them later
For the following rule:

iptables -I PREROUTING -t raw -j CT --ctevents assured

The event delivered looks like the following:

 [UPDATE] tcp      6 src=192.168.0.2 dst=192.168.1.2 sport=37041 dport=80 src=192.168.1.2 dst=192.168.1.100 sport=80 dport=37041 [ASSURED]

Note that the TCP protocol state is not included. For that reason
the CT event filtering is not very useful for conntrackd.

To resolve this issue, instead of conditionally setting the CT events
bits based on the ctmask, we always set them and perform the filtering
in the late stage, just before the delivery.

Thus, the event delivered looks like the following:

 [UPDATE] tcp      6 432000 ESTABLISHED src=192.168.0.2 dst=192.168.1.2 sport=37041 dport=80 src=192.168.1.2 dst=192.168.1.100 sport=80 dport=37041 [ASSURED]

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 16:06:30 +01:00
Pablo Neira Ayuso
9d0db8b6b1 netfilter: arpt_mangle: fix return values of checkentry
In 135367b "netfilter: xtables: change xt_target.checkentry return type",
the type returned by checkentry was changed from boolean to int, but the
return values where not adjusted.

arptables: Input/output error

This broke arptables with the mangle target since it returns true
under success, which is interpreted by xtables as >0, thus
returning EIO.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 16:03:46 +01:00
Jozsef Kadlecsik
d956798d82 netfilter: xtables: "set" match and "SET" target support
The patch adds the combined module of the "SET" target and "set" match
to netfilter. Both the previous and the current revisions are supported.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 15:56:00 +01:00
Jozsef Kadlecsik
f830837f0e netfilter: ipset: list:set set type support
The module implements the list:set type support in two flavours:
without and with timeout. The sets has two sides: for the userspace,
they store the names of other (non list:set type of) sets: one can add,
delete and test set names. For the kernel, it forms an ordered union of
the member sets: the members sets are tried in order when elements are
added, deleted and tested and the process stops at the first success.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 15:54:59 +01:00
Jozsef Kadlecsik
21f45020a3 netfilter: ipset: hash:net,port set type support
The module implements the hash:net,port type support in four flavours:
for IPv4 and IPv6, both without and with timeout support. The elements
are two dimensional: IPv4/IPv6 network address/prefix and protocol/port
pairs.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 15:53:55 +01:00
Jozsef Kadlecsik
b38370299e netfilter: ipset: hash:net set type support
The module implements the hash:net type support in four flavours:
for IPv4 and IPv6, both without and with timeout support. The elements
are one dimensional: IPv4/IPv6 network address/prefixes.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 15:52:54 +01:00
Jozsef Kadlecsik
41d22f7b2e netfilter: ipset: hash:ip,port,net set type support
The module implements the hash:ip,port,net type support in four flavours:
for IPv4 and IPv6, both without and with timeout support. The elements
are three dimensional: IPv4/IPv6 address, protocol/port and IPv4/IPv6
network address/prefix triples. The different prefixes are searched/matched
from the longest prefix to the shortes one (most specific to least).
In other words the processing time linearly grows with the number of
different prefixes in the set.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 15:51:00 +01:00
Jozsef Kadlecsik
5663bc30e6 netfilter: ipset: hash:ip,port,ip set type support
The module implements the hash:ip,port,ip type support in four flavours:
for IPv4 and IPv6, both without and with timeout support. The elements
are three dimensional: IPv4/IPv6 address, protocol/port and IPv4/IPv6
address triples.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 15:41:26 +01:00
Jozsef Kadlecsik
07896ed37b netfilter: ipset: hash:ip,port set type support
The module implements the hash:ip,port type support in four flavours:
for IPv4 and IPv6, both without and with timeout support. The elements
are two dimensional: IPv4/IPv6 address and protocol/port pairs. The port
is interpeted for TCP, UPD, ICMP and ICMPv6 (at the latters as type/code
of course).

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 15:39:52 +01:00
Jozsef Kadlecsik
6c02788969 netfilter: ipset: hash:ip set type support
The module implements the hash:ip type support in four flavours:
for IPv4 or IPv6, both without and with timeout support.

All the hash types are based on the "array hash" or ahash structure
and functions as a good compromise between minimal memory footprint
and speed. The hashing uses arrays to resolve clashes. The hash table
is resized (doubled) when searching becomes too long. Resizing can be
triggered by userspace add commands only and those are serialized by
the nfnl mutex. During resizing the set is read-locked, so the only
possible concurrent operations are the kernel side readers. Those are
protected by RCU locking.

Because of the four flavours and the other hash types, the functions
are implemented in general forms in the ip_set_ahash.h header file
and the real functions are generated before compiling by macro expansion.
Thus the dereferencing of low-level functions and void pointer arguments
could be avoided: the low-level functions are inlined, the function
arguments are pointers of type-specific structures.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 15:38:36 +01:00
Jozsef Kadlecsik
543261907d netfilter: ipset; bitmap:port set type support
The module implements the bitmap:port type in two flavours, without
and with timeout support to store TCP/UDP ports from a range.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 15:37:04 +01:00
Jozsef Kadlecsik
de76021a1b netfilter: ipset: bitmap:ip,mac type support
The module implements the bitmap:ip,mac set type in two flavours,
without and with timeout support. In this kind of set one can store
IPv4 address and (source) MAC address pairs. The type supports elements
added without the MAC part filled out: when the first matching from kernel
happens, the MAC part is automatically filled out. The timing out of the
elements stars when an element is complete in the IP,MAC pair.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 15:35:12 +01:00
Jozsef Kadlecsik
72205fc68b netfilter: ipset: bitmap:ip set type support
The module implements the bitmap:ip set type in two flavours, without
and with timeout support. In this kind of set one can store IPv4
addresses (or network addresses) from a given range.

In order not to waste memory, the timeout version does not rely on
the kernel timer for every element to be timed out but on garbage
collection. All set types use this mechanism.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 15:33:17 +01:00
Jozsef Kadlecsik
a7b4f989a6 netfilter: ipset: IP set core support
The patch adds the IP set core support to the kernel.

The IP set core implements a netlink (nfnetlink) based protocol by which
one can create, destroy, flush, rename, swap, list, save, restore sets,
and add, delete, test elements from userspace. For simplicity (and backward
compatibilty and for not to force ip(6)tables to be linked with a netlink
library) reasons a small getsockopt-based protocol is also kept in order
to communicate with the ip(6)tables match and target.

The netlink protocol passes all u16, etc values in network order with
NLA_F_NET_BYTEORDER flag. The protocol enforces the proper use of the
NLA_F_NESTED and NLA_F_NET_BYTEORDER flags.

For other kernel subsystems (netfilter match and target) the API contains
the functions to add, delete and test elements in sets and the required calls
to get/put refereces to the sets before those operations can be performed.

The set types (which are implemented in independent modules) are stored
in a simple RCU protected list. A set type may have variants: for example
without timeout or with timeout support, for IPv4 or for IPv6. The sets
(i.e. the pointers to the sets) are stored in an array. The sets are
identified by their index in the array, which makes possible easy and
fast swapping of sets. The array is protected indirectly by the nfnl
mutex from nfnetlink. The content of the sets are protected by the rwlock
of the set.

There are functional differences between the add/del/test functions
for the kernel and userspace:

- kernel add/del/test: works on the current packet (i.e. one element)
- kernel test: may trigger an "add" operation  in order to fill
  out unspecified parts of the element from the packet (like MAC address)
- userspace add/del: works on the netlink message and thus possibly
  on multiple elements from the IPSET_ATTR_ADT container attribute.
- userspace add: may trigger resizing of a set

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 15:28:35 +01:00
Eric W. Biederman
bf36076a67 net: Fix ipv6 neighbour unregister_sysctl_table warning
In my testing of 2.6.37 I was occassionally getting a warning about
sysctl table entries being unregistered in the wrong order.  Digging
in it turns out this dates back to the last great sysctl reorg done
where Al Viro introduced the requirement that sysctl directories
needed to be created before and destroyed after the files in them.

It turns out that in that great reorg /proc/sys/net/ipv6/neigh was
overlooked.  So this patch fixes that oversight and makes an annoying
warning message go away.

>------------[ cut here ]------------
>WARNING: at kernel/sysctl.c:1992 unregister_sysctl_table+0x134/0x164()
>Pid: 23951, comm: kworker/u:3 Not tainted 2.6.37-350888.2010AroraKernelBeta.fc14.x86_64 #1
>Call Trace:
> [<ffffffff8103e034>] warn_slowpath_common+0x80/0x98
> [<ffffffff8103e061>] warn_slowpath_null+0x15/0x17
> [<ffffffff810452f8>] unregister_sysctl_table+0x134/0x164
> [<ffffffff810e7834>] ? kfree+0xc4/0xd1
> [<ffffffff813439b2>] neigh_sysctl_unregister+0x22/0x3a
> [<ffffffffa02cd14e>] addrconf_ifdown+0x33f/0x37b [ipv6]
> [<ffffffff81331ec2>] ? skb_dequeue+0x5f/0x6b
> [<ffffffffa02ce4a5>] addrconf_notify+0x69b/0x75c [ipv6]
> [<ffffffffa02eb953>] ? ip6mr_device_event+0x98/0xa9 [ipv6]
> [<ffffffff813d2413>] notifier_call_chain+0x32/0x5e
> [<ffffffff8105bdea>] raw_notifier_call_chain+0xf/0x11
> [<ffffffff8133cdac>] call_netdevice_notifiers+0x45/0x4a
> [<ffffffff8133d2b0>] rollback_registered_many+0x118/0x201
> [<ffffffff8133d3af>] unregister_netdevice_many+0x16/0x6d
> [<ffffffff8133d571>] default_device_exit_batch+0xa4/0xb8
> [<ffffffff81337c42>] ? cleanup_net+0x0/0x194
> [<ffffffff81337a2a>] ops_exit_list+0x4e/0x56
> [<ffffffff81337d36>] cleanup_net+0xf4/0x194
> [<ffffffff81053318>] process_one_work+0x187/0x280
> [<ffffffff8105441b>] worker_thread+0xff/0x19f
> [<ffffffff8105431c>] ? worker_thread+0x0/0x19f
> [<ffffffff8105776d>] kthread+0x7d/0x85
> [<ffffffff81003824>] kernel_thread_helper+0x4/0x10
> [<ffffffff810576f0>] ? kthread+0x0/0x85
> [<ffffffff81003820>] ? kernel_thread_helper+0x0/0x10
>---[ end trace 8a7e9310b35e9486 ]---

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-31 20:54:17 -08:00
Tom Herbert
8587523640 net: Check rps_flow_table when RPS map length is 1
In get_rps_cpu, add check that the rps_flow_table for the device is
NULL when trying to take fast path when RPS map length is one.
Without this, RFS is effectively disabled if map length is one which
is not correct.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-31 16:23:42 -08:00
David S. Miller
0c838ff1ad ipv4: Consolidate all default route selection implementations.
Both fib_trie and fib_hash have a local implementation of
fib_table_select_default().  This is completely unnecessary
code duplication.

Since we now remember the fib_table and the head of the fib
alias list of the default route, we can implement one single
generic version of this routine.

Looking at the fib_hash implementation you may get the impression
that it's possible for there to be multiple top-level routes in
the table for the default route.  The truth is, it isn't, the
insert code will only allow one entry to exist in the zero
prefix hash table, because all keys evaluate to zero and all
keys in a hash table must be unique.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-31 16:16:50 -08:00
David S. Miller
5b4704419c ipv4: Remember FIB alias list head and table in lookup results.
This will be used later to implement fib_select_default() in a
completely generic manner, instead of the current situation where the
default route is re-looked up in the TRIE/HASH table and then the
available aliases are analyzed.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-31 16:10:03 -08:00
Linus Torvalds
0fd08c5545 Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: NFSv4 readdir loses entries
  NFS: Micro-optimize nfs4_decode_dirent()
  NFS: Fix an NFS client lockdep issue
  NFS construct consistent co_ownerid for v4.1
  NFS: nfs_wcc_update_inode() should set nfsi->attr_gencount
  NFS improve pnfs_put_deviceid_cache debug print
  NFS fix cb_sequence error processing
  NFS do not find client in NFSv4 pg_authenticate
  NLM: Fix "kernel BUG at fs/lockd/host.c:417!" or ".../host.c:283!"
  NFS: Prevent memory allocation failure in nfsacl_encode()
  NFS: nfsacl_{encode,decode} should return signed integer
  NFS: Fix "kernel BUG at fs/nfs/nfs3xdr.c:1338!"
  NFS: Fix "kernel BUG at fs/aio.c:554!"
  NFS4: Avoid potential NULL pointer dereference in decode_and_add_ds().
  NFS: fix handling of malloc failure during nfs_flush_multi()
2011-02-01 09:41:02 +10:00
David S. Miller
a5e3c2aae2 Merge branch 'batman-adv/next' of git://git.open-mesh.org/ecsv/linux-merge 2011-01-31 13:24:56 -08:00
Roland Dreier
ec831ea72e net: Add default_mtu() methods to blackhole dst_ops
When an IPSEC SA is still being set up, __xfrm_lookup() will return
-EREMOTE and so ip_route_output_flow() will return a blackhole route.
This can happen in a sndmsg call, and after d33e455337 ("net: Abstract
default MTU metric calculation behind an accessor.") this leads to a
crash in ip_append_data() because the blackhole dst_ops have no
default_mtu() method and so dst_mtu() calls a NULL pointer.

Fix this by adding default_mtu() methods (that simply return 0, matching
the old behavior) to the blackhole dst_ops.

The IPv4 part of this patch fixes a crash that I saw when using an IPSEC
VPN; the IPv6 part is untested because I don't have an IPv6 VPN, but it
looks to be needed as well.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-31 13:16:00 -08:00
David S. Miller
5403c8a295 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2011-01-31 13:13:24 -08:00
David S. Miller
c79b9e4936 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 2011-01-31 12:31:24 -08:00
Rajkumar Manoharan
8c7914dec2 mac80211: disable power save if an infra AP vif exists
PS should not be enabled if an infra AP vif exists in
the interface list. So while recalculating PS,
AP vif type should be taken into account.

Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Rajkumar Manoharan <rmanoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-31 15:06:26 -05:00
Sven Eckelmann
64afe35398 batman-adv: Update copyright years
Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-01-31 14:57:12 +01:00
Sven Eckelmann
1299bdaa1c batman-adv: Remove unused variables
Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-01-31 14:57:12 +01:00
Sven Eckelmann
fb86d7648f batman-adv: Remove declaration of batman_skb_recv
batman_skb_recv can be defined in hard-interface.c as static because it is
never used outside of that file.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-01-31 14:57:11 +01:00
Sven Eckelmann
335f94c981 batman-adv: Remove unused definitions
Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-01-31 14:57:10 +01:00
Sven Eckelmann
633979b43f batman-adv: Remove dangling declaration of hash_remove_element
Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-01-31 14:57:10 +01:00
Simon Wunderlich
74ef115359 batman-adv: remove unused parameters
Some function parameters are obsolete now and can be removed.

Reported-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-01-31 14:57:09 +01:00
Sven Eckelmann
ae361ce19f batman-adv: Calculate correct size for merged packets
The routing algorithm must be able to decide if a fragment can be merged with
the missing part and still be passed to a forwarding interface. The fragments
can only differ by one byte in case that the original payload had an uneven
length. In that situation the sender has to inform all possible receivers that
the tail is one byte longer using the flag UNI_FRAG_LARGETAIL.

The combination of UNI_FRAG_LARGETAIL and UNI_FRAG_HEAD flag makes it possible
to calculate the correct length for even and uneven sized payloads.

The original formula missed to add the unicast header at all and forgot to
remove the fragment header of the second fragment. This made the results highly
unreliable and only useful for machines with large differences between the
configured MTUs.

Reported-by: Russell Senior <russell@personaltelco.net>
Reported-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-01-31 14:57:08 +01:00
Sven Eckelmann
5c77d8bb8a batman-adv: Create roughly equal sized fragments
The routing algorithm must know how large two fragments are to be able to
decide that it is safe to merge them or if it should resubmit without waiting
for the second part. When these two fragments have a too different size, it is
not possible to guess right in every situation.

The user could easily configure the MTU of the attached cards so that one
fragment is forwarded and the other one is added to the fragments table to wait
for the missing part.

For even sized packets, it is possible to split it so that the resulting
packages are equal sized by ignoring the old non-fragment header at the
beginning of the original packet.

This still creates different sized fragments for uneven sized packets.

Reported-by: Russell Senior <russell@personaltelco.net>
Reported-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-01-31 14:57:08 +01:00
David S. Miller
81c2bdb688 Merge branch 'batman-adv/merge-oopsonly' of git://git.open-mesh.org/ecsv/linux-merge 2011-01-30 22:16:34 -08:00
Sven Eckelmann
1181e1daac batman-adv: Make vis info stack traversal threadsafe
The batman-adv vis server has to a stack which stores all information
about packets which should be send later. This stack is protected
with a spinlock that is used to prevent concurrent write access to it.

The send_vis_packets function has to take all elements from the stack
and send them to other hosts over the primary interface. The send will
be initiated without the lock which protects the stack.

The implementation using list_for_each_entry_safe has the problem that
it stores the next element as "safe ptr" to allow the deletion of the
current element in the list. The list may be modified during the
unlock/lock pair in the loop body which may make the safe pointer
not pointing to correct next element.

It is safer to remove and use the first element from the stack until no
elements are available. This does not need reduntant information which
would have to be validated each time the lock was removed.

Reported-by: Russell Senior <russell@personaltelco.net>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-01-30 10:32:08 +01:00
Sven Eckelmann
dda9fc6b2c batman-adv: Remove vis info element in free_info
The free_info function will be called when no reference to the info
object exists anymore. It must be ensured that the allocated memory
gets freed and not only the elements which are managed by the info
object.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-01-30 10:32:06 +01:00
Sven Eckelmann
2674c15870 batman-adv: Remove vis info on hashing errors
A newly created vis info object must be removed when it couldn't be
added to the hash. The old_info which has to be replaced was already
removed and isn't related to the hash anymore.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-01-30 10:32:02 +01:00
Eric W. Biederman
709b46e8d9 net: Add compat ioctl support for the ipv4 multicast ioctl SIOCGETSGCNT
SIOCGETSGCNT is not a unique ioctl value as it it maps tio SIOCPROTOPRIVATE +1,
which unfortunately means the existing infrastructure for compat networking
ioctls is insufficient.  A trivial compact ioctl implementation would conflict
with:

SIOCAX25ADDUID
SIOCAIPXPRISLT
SIOCGETSGCNT_IN6
SIOCGETSGCNT
SIOCRSSCAUSE
SIOCX25SSUBSCRIP
SIOCX25SDTEFACILITIES

To make this work I have updated the compat_ioctl decode path to mirror the
the normal ioctl decode path.  I have added an ipv4 inet_compat_ioctl function
so that I can have ipv4 specific compat ioctls.   I have added a compat_ioctl
function into struct proto so I can break out ioctls by which kind of ip socket
I am using.  I have added a compat_raw_ioctl function because SIOCGETSGCNT only
works on raw sockets.  I have added a ipmr_compat_ioctl that mirrors the normal
ipmr_ioctl.

This was necessary because unfortunately the struct layout for the SIOCGETSGCNT
has unsigned longs in it so changes between 32bit and 64bit kernels.

This change was sufficient to run a 32bit ip multicast routing daemon on a
64bit kernel.

Reported-by: Bill Fenner <fenner@aristanetworks.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-30 01:14:38 -08:00
Eric W. Biederman
13ad17745c net: Fix ip link add netns oops
Ed Swierk <eswierk@bigswitch.com> writes:
> On 2.6.35.7
>  ip link add link eth0 netns 9999 type macvlan
> where 9999 is a nonexistent PID triggers an oops and causes all network functions to hang:
> [10663.821898] BUG: unable to handle kernel NULL pointer dereference at 000000000000006d
>  [10663.821917] IP: [<ffffffff8149c2fa>] __dev_alloc_name+0x9a/0x170
>  [10663.821933] PGD 1d3927067 PUD 22f5c5067 PMD 0
>  [10663.821944] Oops: 0000 [#1] SMP
>  [10663.821953] last sysfs file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
>  [10663.821959] CPU 3
>  [10663.821963] Modules linked in: macvlan ip6table_filter ip6_tables rfcomm ipt_MASQUERADE binfmt_misc iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack sco ipt_REJECT bnep l2cap xt_tcpudp iptable_filter ip_tables x_tables bridge stp vboxnetadp vboxnetflt vboxdrv kvm_intel kvm parport_pc ppdev snd_hda_codec_intelhdmi snd_hda_codec_conexant arc4 iwlagn iwlcore mac80211 snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_rawmidi i915 snd_seq_midi_event snd_seq thinkpad_acpi drm_kms_helper btusb tpm_tis nvram uvcvideo snd_timer snd_seq_device bluetooth videodev v4l1_compat v4l2_compat_ioctl32 tpm drm tpm_bios snd cfg80211 psmouse serio_raw intel_ips soundcore snd_page_alloc intel_agp i2c_algo_bit video output netconsole configfs lp parport usbhid hid e1000e sdhci_pci ahci libahci sdhci led_class
>  [10663.822155]
>  [10663.822161] Pid: 6000, comm: ip Not tainted 2.6.35-23-generic #41-Ubuntu 2901CTO/2901CTO
>  [10663.822167] RIP: 0010:[<ffffffff8149c2fa>] [<ffffffff8149c2fa>] __dev_alloc_name+0x9a/0x170
>  [10663.822177] RSP: 0018:ffff88014aebf7b8 EFLAGS: 00010286
>  [10663.822182] RAX: 00000000fffffff4 RBX: ffff8801ad900800 RCX: 0000000000000000
>  [10663.822187] RDX: ffff880000000000 RSI: 0000000000000000 RDI: ffff88014ad63000
>  [10663.822191] RBP: ffff88014aebf808 R08: 0000000000000041 R09: 0000000000000041
>  [10663.822196] R10: 0000000000000000 R11: dead000000200200 R12: ffff88014aebf818
>  [10663.822201] R13: fffffffffffffffd R14: ffff88014aebf918 R15: ffff88014ad62000
>  [10663.822207] FS: 00007f00c487f700(0000) GS:ffff880001f80000(0000) knlGS:0000000000000000
>  [10663.822212] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  [10663.822216] CR2: 000000000000006d CR3: 0000000231f19000 CR4: 00000000000026e0
>  [10663.822221] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  [10663.822226] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>  [10663.822231] Process ip (pid: 6000, threadinfo ffff88014aebe000, task ffff88014afb16e0)
>  [10663.822236] Stack:
>  [10663.822240] ffff88014aebf808 ffffffff814a2bb5 ffff88014aebf7e8 00000000a00ee8d6
>  [10663.822251] <0> 0000000000000000 ffffffffa00ef940 ffff8801ad900800 ffff88014aebf818
>  [10663.822265] <0> ffff88014aebf918 ffff8801ad900800 ffff88014aebf858 ffffffff8149c413
>  [10663.822281] Call Trace:
>  [10663.822290] [<ffffffff814a2bb5>] ? dev_addr_init+0x75/0xb0
>  [10663.822298] [<ffffffff8149c413>] dev_alloc_name+0x43/0x90
>  [10663.822307] [<ffffffff814a85ee>] rtnl_create_link+0xbe/0x1b0
>  [10663.822314] [<ffffffff814ab2aa>] rtnl_newlink+0x48a/0x570
>  [10663.822321] [<ffffffff814aafcc>] ? rtnl_newlink+0x1ac/0x570
>  [10663.822332] [<ffffffff81030064>] ? native_x2apic_icr_read+0x4/0x20
>  [10663.822339] [<ffffffff814a8c17>] rtnetlink_rcv_msg+0x177/0x290
>  [10663.822346] [<ffffffff814a8aa0>] ? rtnetlink_rcv_msg+0x0/0x290
>  [10663.822354] [<ffffffff814c25d9>] netlink_rcv_skb+0xa9/0xd0
>  [10663.822360] [<ffffffff814a8a85>] rtnetlink_rcv+0x25/0x40
>  [10663.822367] [<ffffffff814c223e>] netlink_unicast+0x2de/0x2f0
>  [10663.822374] [<ffffffff814c303e>] netlink_sendmsg+0x1fe/0x2e0
>  [10663.822383] [<ffffffff81488533>] sock_sendmsg+0xf3/0x120
>  [10663.822391] [<ffffffff815899fe>] ? _raw_spin_lock+0xe/0x20
>  [10663.822400] [<ffffffff81168656>] ? __d_lookup+0x136/0x150
>  [10663.822406] [<ffffffff815899fe>] ? _raw_spin_lock+0xe/0x20
>  [10663.822414] [<ffffffff812b7a0d>] ? _atomic_dec_and_lock+0x4d/0x80
>  [10663.822422] [<ffffffff8116ea90>] ? mntput_no_expire+0x30/0x110
>  [10663.822429] [<ffffffff81486ff5>] ? move_addr_to_kernel+0x65/0x70
>  [10663.822435] [<ffffffff81493308>] ? verify_iovec+0x88/0xe0
>  [10663.822442] [<ffffffff81489020>] sys_sendmsg+0x240/0x3a0
> [10663.822450] [<ffffffff8111e2a9>] ? __do_fault+0x479/0x560
>  [10663.822457] [<ffffffff815899fe>] ? _raw_spin_lock+0xe/0x20
>  [10663.822465] [<ffffffff8116cf4a>] ? alloc_fd+0x10a/0x150
>  [10663.822473] [<ffffffff8158d76e>] ? do_page_fault+0x15e/0x350
>  [10663.822482] [<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b
>  [10663.822487] Code: 90 48 8d 78 02 be 25 00 00 00 e8 92 1d e2 ff 48 85 c0 75 cf bf 20 00 00 00 e8 c3 b1 c6 ff 49 89 c7 b8 f4 ff ff ff 4d 85 ff 74 bd <4d> 8b 75 70 49 8d 45 70 48 89 45 b8 49 83 ee 58 eb 28 48 8d 55
>  [10663.822618] RIP [<ffffffff8149c2fa>] __dev_alloc_name+0x9a/0x170
>  [10663.822627] RSP <ffff88014aebf7b8>
>  [10663.822631] CR2: 000000000000006d
>  [10663.822636] ---[ end trace 3dfd6c3ad5327ca7 ]---

This bug was introduced in:
commit 81adee47df
Author: Eric W. Biederman <ebiederm@aristanetworks.com>
Date:   Sun Nov 8 00:53:51 2009 -0800

    net: Support specifying the network namespace upon device creation.

    There is no good reason to not support userspace specifying the
    network namespace during device creation, and it makes it easier
    to create a network device and pass it to a child network namespace
    with a well known name.

    We have to be careful to ensure that the target network namespace
    for the new device exists through the life of the call.  To keep
    that logic clear I have factored out the network namespace grabbing
    logic into rtnl_link_get_net.

    In addtion we need to continue to pass the source network namespace
    to the rtnl_link_ops.newlink method so that we can find the base
    device source network namespace.

    Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
    Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

Where apparently I forgot to add error handling to the path where we create
a new network device in a new network namespace, and pass in an invalid pid.

Cc: stable@kernel.org
Reported-by: Ed Swierk <eswierk@bigswitch.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-30 01:14:15 -08:00
Herbert Xu
66c46d741e gro: Reset dev pointer on reuse
On older kernels the VLAN code may zero skb->dev before dropping
it and causing it to be reused by GRO.

Unfortunately we didn't reset skb->dev in that case which causes
the next GRO user to get a bogus skb->dev pointer.

This particular problem no longer happens with the current upstream
kernel due to changes in VLAN processing.

However, for correctness we should still reset the skb->dev pointer
in the GRO reuse function in case a future user does the same thing.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-29 22:36:24 -08:00
David S. Miller
b8dad61cc7 ipv4: If fib metrics are default, no need to grab ref to FIB info.
The fib metric memory in this case is static in the kernel image,
so we don't need to reference count it since it's never going
to go away on us.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-28 14:07:16 -08:00
David S. Miller
725d1e1b45 ipv4: Attach FIB info to dst_default_metrics when possible
If there are no explicit metrics attached to a route, hook
fi->fib_info up to dst_default_metrics.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-28 14:05:05 -08:00
David S. Miller
9c150e82ac ipv4: Allocate fib metrics dynamically.
This is the initial gateway towards super-sharing metrics
if they are all set to zero for a route.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-28 14:01:25 -08:00
John W. Linville
3e11210d46 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
Conflicts:
	drivers/net/wireless/ath/ath9k/init.c
2011-01-28 16:23:14 -05:00
Julia Lawall
efe1cf0c57 net/wireless/nl80211.c: Avoid call to genlmsg_cancel
genlmsg_cancel subtracts some constants from its second argument before
calling nlmsg_cancel.  nlmsg_cancel then calls nlmsg_trim on the same
arguments.  nlmsg_trim tests for NULL before doing any computation, but a
NULL second argument to genlmsg_cancel is no longer NULL due to the initial
subtraction.  Nothing else happens in this execution, so the call to
genlmsg_cancel is simply unnecessary in this case.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression data;
@@

if (data == NULL) { ...
* genlmsg_cancel(..., data);
  ...
  return ...;
}
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-28 15:46:23 -05:00
Ben Greear
4914b3bb7f mac80211: Add sdata state and flags to debugfs.
Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-28 15:46:23 -05:00
Johannes Berg
6d744bacee mac80211: add MCS information to radiotap
This adds the MCS information we currently get
from the drivers into radiotap.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-28 15:44:29 -05:00
Juuso Oikarinen
45cbad6a12 cfg80211: Allow non-zero indexes for device specific pair-wise ciphers
Some vendor specific cipher suites require non-zero key indexes for pairwise
keys, but as of currently, the cfg80211 does not allow it.

As validating they cipher parameters for vendor specific cipher suites is the
job of the driver or hardware/firmware, change the cfg80211 to allow also
non-zero pairwise key indexes for vendor specific ciphers.

Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-28 15:44:27 -05:00
Thomas Jacob
6a4ddef2a3 netfilter: xt_iprange: add IPv6 match debug print code
Signed-off-by: Thomas Jacob <jacob@internet24.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-28 19:33:13 +01:00
David S. Miller
a4daad6b09 net: Pre-COW metrics for TCP.
TCP is going to record metrics for the connection,
so pre-COW the route metrics at route cache entry
creation time.

This avoids several atomic operations that have to
occur if we COW the metrics after the entry reaches
global visibility.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-27 22:01:53 -08:00
David S. Miller
8571a19c4a Merge branch 'master' of ssh://master.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2011-01-27 16:00:37 -08:00
Eric Dumazet
ccf434380d net: fix dev_seq_next()
Commit c6d14c8456 (net: Introduce for_each_netdev_rcu() iterator)
added a race in dev_seq_next().

The rcu_dereference() call should be done _before_ testing the end of
list, or we might return a wrong net_device if a concurrent thread
changes net_device list under us.

Note : discovered thanks to a sparse warning :

net/core/dev.c:3919:9: error: incompatible types in comparison expression
(different address spaces)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-27 15:02:56 -08:00
David S. Miller
065825402c net: Store ipv4/ipv6 COW'd metrics in inetpeer cache.
Please note that the IPSEC dst entry metrics keep using
the generic metrics COW'ing mechanism using kmalloc/kfree.

This gives the IPSEC routes an opportunity to use metrics
which are unique to their encapsulated paths.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-27 14:59:31 -08:00
David S. Miller
1397e171f1 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2011-01-27 14:59:08 -08:00
David S. Miller
8f2771f2b8 ipv6: Remove route peer binding assertions.
They are bogus.  The basic idea is that I wanted to make sure
that prefixed routes never bind to peers.

The test I used was whether RTF_CACHE was set.

But first of all, the RTF_CACHE flag is set at different spots
depending upon which ip6_rt_copy() caller you're talking about.

I've validated all of the code paths, and even in the future
where we bind peers more aggressively (for route metric COW'ing)
we never bind to prefix'd routes, only fully specified ones.
This even applies when addrconf or icmp6 routes are allocated.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-27 14:55:22 -08:00
Eric Dumazet
c2aa3665cf net: add kmemcheck annotation in __alloc_skb()
pskb_expand_head() triggers a kmemcheck warning when copy of
skb_shared_info is done in pskb_expand_head()

This is because destructor_arg field is not necessarily initialized at
this point. Add kmemcheck_annotate_variable() call in __alloc_skb() to
instruct kmemcheck this is a normal situation.

Resolves bugzilla.kernel.org 27212

Reference: https://bugzilla.kernel.org/show_bug.cgi?id=27212
Reported-by: Christian Casteyde <casteyde.christian@free.fr>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-27 14:41:06 -08:00
Kurt Van Dijck
6d3a9a6854 net: fix validate_link_af in rtnetlink core
I'm testing an API that uses IFLA_AF_SPEC attribute.
In the rtnetlink core , the set_link_af() member
of the rtnl_af_ops struct receives the nested attribute
(as I expected), but the validate_link_af() member
receives the parent attribute.
IMO, this patch fixes this.

Signed-off-by: Kurt Van Dijck <kurt.van.dijck@eia.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-27 14:39:21 -08:00
Eric Dumazet
389f2a18c6 econet: remove compiler warnings
net/econet/af_econet.c: In function ‘econet_sendmsg’:
net/econet/af_econet.c:494: warning: label ‘error’ defined but not used
net/econet/af_econet.c:268: warning: unused variable ‘sk’

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Phil Blundell <philb@gnu.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-27 14:15:54 -08:00
David S. Miller
144001bddc inetpeer: Mark metrics as "new" in fresh inetpeer entries.
Set the RTAX_LOCKED metric to INETPEER_METRICS_NEW (basically,
all ones) on fresh inetpeer entries.

This way code can determine if default metrics have been loaded
in from a routing table entry already.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-27 13:52:16 -08:00
Thomas Jacob
705ca14717 netfilter: xt_iprange: typo in IPv4 match debug print code
Signed-off-by: Thomas Jacob <jacob@internet24.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-27 10:56:32 +01:00
David S. Miller
62fa8a846d net: Implement read-only protection and COW'ing of metrics.
Routing metrics are now copy-on-write.

Initially a route entry points it's metrics at a read-only location.
If a routing table entry exists, it will point there.  Else it will
point at the all zero metric place-holder called 'dst_default_metrics'.

The writeability state of the metrics is stored in the low bits of the
metrics pointer, we have two bits left to spare if we want to store
more states.

For the initial implementation, COW is implemented simply via kmalloc.
However future enhancements will change this to place the writable
metrics somewhere else, in order to increase sharing.  Very likely
this "somewhere else" will be the inetpeer cache.

Note also that this means that metrics updates may transiently fail
if we cannot COW the metrics successfully.

But even by itself, this patch should decrease memory usage and
increase cache locality especially for routing workloads.  In those
cases the read-only metric copies stay in place and never get written
to.

TCP workloads where metrics get updated, and those rare cases where
PMTU triggers occur, will take a very slight performance hit.  But
that hit will be alleviated when the long-term writable metrics
move to a more sharable location.

Since the metrics storage went from a u32 array of RTAX_MAX entries to
what is essentially a pointer, some retooling of the dst_entry layout
was necessary.

Most importantly, we need to preserve the alignment of the reference
count so that it doesn't share cache lines with the read-mostly state,
as per Eric Dumazet's alignment assertion checks.

The only non-trivial bit here is the move of the 'flags' member into
the writeable cacheline.  This is OK since we are always accessing the
flags around the same moment when we made a modification to the
reference count.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-26 20:51:05 -08:00
David S. Miller
b4e69ac670 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2011-01-26 13:49:30 -08:00
David S. Miller
7cc2edb834 xfrm6: Don't forget to propagate peer into ipsec route.
Like ipv4, we have to propagate the ipv6 route peer into
the ipsec top-level route during instantiation.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-26 13:41:03 -08:00
Johannes Berg
ba99d93b3d mac80211: use DECLARE_EVENT_CLASS
For events that include only the local struct as
their parameter, we can use DECLARE_EVENT_CLASS
and save quite some binary size across segments
as well lines of code.

   text	   data	    bss	    dec	    hex	filename
 375745	  19296	    916	 395957	  60ab5	mac80211.ko.before
 367473	  17888	    916	 386277	  5e4e5	mac80211.ko.after
  -8272   -1408       0   -9680   -25d0 delta

Some more tracepoints with identical arguments
could be combined like this but for now this is
the one that benefits most.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-26 16:15:45 -05:00
Eric Dumazet
144ce879b0 net_sched: sch_mqprio: dont leak kernel memory
mqprio_dump() should make sure all fields of struct tc_mqprio_qopt are
initialized.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-26 13:15:29 -08:00
David S. Miller
9b6941d8b1 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2011-01-26 11:49:49 -08:00
2e0348c449 Merge branch 'connlimit' of git://dev.medozas.de/linux 2011-01-26 16:28:45 +01:00
Jan Engelhardt
ad86e1f27a netfilter: xt_connlimit: pick right dstaddr in NAT scenario
xt_connlimit normally records the "original" tuples in a hashlist
(such as "1.2.3.4 -> 5.6.7.8"), and looks in this list for iph->daddr
when counting.

When the user however uses DNAT in PREROUTING, looking for
iph->daddr -- which is now 192.168.9.10 -- will not match. Thus in
daddr mode, we need to record the reverse direction tuple
("192.168.9.10 -> 1.2.3.4") instead. In the reverse tuple, the dst
addr is on the src side, which is convenient, as count_them still uses
&conn->tuple.src.u3.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2011-01-26 13:01:39 +01:00
Linus Lüssing
dd58ddc692 batman-adv: Fix kernel panic when fetching vis data on a vis server
The hash_iterate removal introduced a bug leading to a kernel panic when
fetching the vis data on a vis server. That commit forgot to rename one
variable name, which this commit fixes now.

Reported-by: Russell Senior <russell@personaltelco.net>
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-01-25 23:58:33 +01:00
Jerry Chu
44f5324b5d TCP: fix a bug that triggers large number of TCP RST by mistake
This patch fixes a bug that causes TCP RST packets to be generated
on otherwise correctly behaved applications, e.g., no unread data
on close,..., etc. To trigger the bug, at least two conditions must
be met:

1. The FIN flag is set on the last data packet, i.e., it's not on a
separate, FIN only packet.
2. The size of the last data chunk on the receive side matches
exactly with the size of buffer posted by the receiver, and the
receiver closes the socket without any further read attempt.

This bug was first noticed on our netperf based testbed for our IW10
proposal to IETF where a large number of RST packets were observed.
netperf's read side code meets the condition 2 above 100%.

Before the fix, tcp_data_queue() will queue the last skb that meets
condition 1 to sk_receive_queue even though it has fully copied out
(skb_copy_datagram_iovec()) the data. Then if condition 2 is also met,
tcp_recvmsg() often returns all the copied out data successfully
without actually consuming the skb, due to a check
"if ((chunk = len - tp->ucopy.len) != 0) {"
and
"len -= chunk;"
after tcp_prequeue_process() that causes "len" to become 0 and an
early exit from the big while loop.

I don't see any reason not to free the skb whose data have been fully
consumed in tcp_data_queue(), regardless of the FIN flag.  We won't
get there if MSG_PEEK is on. Am I missing some arcane cases related
to urgent data?

Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-25 13:46:30 -08:00
Felix Fietkau
eb3e554b4b mac80211: fix a crash in ieee80211_beacon_get_tim on change_interface
Some drivers (e.g. ath9k) do not always disable beacons when they're
supposed to. When an interface is changed using the change_interface op,
the mode specific sdata part is in an undefined state and trying to
get a beacon at this point can produce weird crashes.

To fix this, add a check for ieee80211_sdata_running before using
anything from the sdata.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Cc: stable@kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-25 16:28:56 -05:00
Eric Dumazet
26ad787962 pktgen: speedup fragmented skbs
We spend lot of time clearing pages in pktgen.
(Or not clearing them on ipv6 and leaking kernel memory)

Since we dont modify them, we can use one zeroed page, and get
references on it. This page can use NUMA affinity as well.

Define pktgen_finalize_skb() helper, used both in ipv4 and ipv6

Results using skbs with one frag :

Before patch :

Result: OK: 608980458(c608978520+d1938) nsec, 1000000000
(100byte,1frags)
  1642088pps 1313Mb/sec (1313670400bps) errors: 0

After patch :

Result: OK: 345285014(c345283891+d1123) nsec, 1000000000
(100byte,1frags)
  2896158pps 2316Mb/sec (2316926400bps) errors: 0

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-25 13:26:05 -08:00
David S. Miller
73a8bd74e2 ipv6: Revert 'administrative down' address handling changes.
This reverts the following set of commits:

d1ed113f16 ("ipv6: remove duplicate neigh_ifdown")
29ba5fed1b ("ipv6: don't flush routes when setting loopback down")
9d82ca98f7 ("ipv6: fix missing in6_ifa_put in addrconf")
2de7957072 ("ipv6: addrconf: don't remove address state on ifdown if the address is being kept")
8595805aaf ("IPv6: only notify protocols if address is compeletely gone")
27bdb2abcc ("IPv6: keep tentative addresses in hash table")
93fa159abe ("IPv6: keep route for tentative address")
8f37ada5b5 ("IPv6: fix race between cleanup and add/delete address")
84e8b803f1 ("IPv6: addrconf notify when address is unavailable")
dc2b99f71e ("IPv6: keep permanent addresses on admin down")

because the core semantic change to ipv6 address handling on ifdown
has broken some things, in particular "disable_ipv6" sysctl handling.

Stephen has made several attempts to get things back in working order,
but nothing has restored disable_ipv6 fully yet.

Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Tested-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-25 12:49:08 -08:00
Andy Adamson
778be232a2 NFS do not find client in NFSv4 pg_authenticate
The information required to find the nfs_client cooresponding to the incoming
back channel request is contained in the NFS layer. Perform minimal checking
in the RPC layer pg_authenticate method, and push more detailed checking into
the NFS layer where the nfs_client can be found.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-25 15:26:51 -05:00
Changli Gao
9f4e1ccd80 netfilter: ipvs: fix compiler warnings
Fix compiler warnings when IP_VS_DBG() isn't defined.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Acked-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-01-25 23:17:51 +10:00
Vlad Dogaru
a512b92b3a net: add sysfs entry for device group
The group of a network device can be queried or changed from userspace
using sysfs.

For example, considering sysfs mounted in /sys, one can change the group
that interface lo belongs to:
	echo 1 > /sys/class/net/lo/group

Signed-off-by: Vlad Dogaru <ddvlad@rosedu.org>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24 23:23:28 -08:00
Eugene Teo
b7c7d01aae net: clear heap allocation for ethtool_get_regs()
There is a conflict between commit b00916b1 and a77f5db3. This patch resolves
the conflict by clearing the heap allocation in ethtool_get_regs().

Cc: stable@kernel.org
Signed-off-by: Eugene Teo <eugeneteo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24 21:05:17 -08:00
Hans Schillstrom
07924709f6 IPVS netns BUG, register sysctl for root ns
The newly created table was not used when register sysctl for a new namespace.
I.e. sysctl doesn't work for other than root namespace (init_net)

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-01-25 12:13:08 +10:00
David S. Miller
d80bc0fd26 ipv6: Always clone offlink routes.
Do not handle PMTU vs. route lookup creation any differently
wrt. offlink routes, always clone them.

Reported-by: PK <runningdoglackey@yahoo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24 16:01:58 -08:00
Michał Mirosław
acd1130e87 net: reduce and unify printk level in netdev_fix_features()
Reduce printk() levels to KERN_INFO in netdev_fix_features() as this will
be used by ethtool and might spam dmesg unnecessarily.

This converts the function to use netdev_info() instead of plain printk().

As a side effect, bonding and bridge devices will now log dropped features
on every slave device change.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24 15:45:15 -08:00
Michał Mirosław
04ed3e741d net: change netdev->features to u32
Quoting Ben Hutchings: we presumably won't be defining features that
can only be enabled on 64-bit architectures.

Occurences found by `grep -r` on net/, drivers/net, include/

[ Move features and vlan_features next to each other in
  struct netdev, as per Eric Dumazet's suggestion -DaveM ]

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24 15:32:47 -08:00
Michał Mirosław
57422dc530 net: Move check of checksum features to netdev_fix_features()
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24 15:29:11 -08:00
John Fastabend
3dce38a02d dcbnl: make get_app handling symmetric for IEEE and CEE DCBx
The IEEE get/set app handlers use generic routines and do not
require the net_device to implement the dcbnl_ops routines. This
patch makes it symmetric so user space and drivers do not have
to handle the CEE version and IEEE DCBx versions differently.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24 15:19:55 -08:00
Ben Hutchings
c445477d74 net: RPS: Enable hardware acceleration of RFS
Allow drivers for multiqueue hardware with flow filter tables to
accelerate RFS.  The driver must:

1. Set net_device::rx_cpu_rmap to a cpu_rmap of the RX completion
IRQs (in queue order).  This will provide a mapping from CPUs to the
queues for which completions are handled nearest to them.

2. Implement net_device_ops::ndo_rx_flow_steer.  This operation adds
or replaces a filter steering the given flow to the given RX queue, if
possible.

3. Periodically remove filters for which rps_may_expire_flow() returns
true.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24 14:53:01 -08:00
Eric Dumazet
fd0273c503 tcp: fix bug in listening_get_next()
commit a8b690f98b (tcp: Fix slowness in read /proc/net/tcp)
introduced a bug in handling of SYN_RECV sockets.

st->offset represents number of sockets found since beginning of
listening_hash[st->bucket].

We should not reset st->offset when iterating through
syn_table[st->sbucket], or else if more than ~25 sockets (if
PAGE_SIZE=4096) are in SYN_RECV state, we exit from listening_get_next()
with a too small st->offset

Next time we enter tcp_seek_last_pos(), we are not able to seek past
already found sockets.

Reported-by: PK <runningdoglackey@yahoo.com>
CC: Tom Herbert <therbert@google.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24 14:41:20 -08:00
David S. Miller
3408404a4c inetpeer: Use correct AVL tree base pointer in inet_getpeer().
Family was hard-coded to AF_INET but should be daddr->family.

This fixes crashes when unlinking ipv6 peer entries, since the
unlink code was looking up the base pointer properly.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24 14:38:09 -08:00
Michal Schmidt
d1dc7abf2f GRO: fix merging a paged skb after non-paged skbs
Suppose that several linear skbs of the same flow were received by GRO. They
were thus merged into one skb with a frag_list. Then a new skb of the same flow
arrives, but it is a paged skb with data starting in its frags[].

Before adding the skb to the frag_list skb_gro_receive() will of course adjust
the skb to throw away the headers. It correctly modifies the page_offset and
size of the frag, but it leaves incorrect information in the skb:
 ->data_len is not decreased at all.
 ->len is decreased only by headlen, as if no change were done to the frag.
Later in a receiving process this causes skb_copy_datagram_iovec() to return
-EFAULT and this is seen in userspace as the result of the recv() syscall.

In practice the bug can be reproduced with the sfc driver. By default the
driver uses an adaptive scheme when it switches between using
napi_gro_receive() (with skbs) and napi_gro_frags() (with pages). The bug is
reproduced when under rx load with enough successful GRO merging the driver
decides to switch from the former to the latter.

Manual control is also possible, so reproducing this is easy with netcat:
 - on machine1 (with sfc): nc -l 12345 > /dev/null
 - on machine2: nc machine1 12345 < /dev/zero
 - on machine1:
   echo 1 > /sys/module/sfc/parameters/rx_alloc_method  # use skbs
   echo 2 > /sys/module/sfc/parameters/rx_alloc_method  # use pages
 - See that nc has quit suddenly.

[v2: Modified by Eric Dumazet to avoid advancing skb->data past the end
     and to use a temporary variable.]

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24 14:27:18 -08:00
David S. Miller
5bdc22a565 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	net/sched/sch_hfsc.c
	net/sched/sch_htb.c
	net/sched/sch_tbf.c
2011-01-24 14:09:35 -08:00
David S. Miller
e92427b289 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6 2011-01-24 13:17:06 -08:00
Eric Dumazet
c506653d35 net: arp_ioctl() must hold RTNL
Commit 941666c2e3 "net: RCU conversion of dev_getbyhwaddr() and
arp_ioctl()" introduced a regression, reported by Jamie Heilman.
"arp -Ds 192.168.2.41 eth0 pub" triggered the ASSERT_RTNL() assert
in pneigh_lookup()

Removing RTNL requirement from arp_ioctl() was a mistake, just revert
that part.

Reported-by: Jamie Heilman <jamie@audible.transient.net>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24 13:16:16 -08:00
Thomas Jacob
08b5194b5d netfilter: xt_iprange: Incorrect xt_iprange boundary check for IPv6
iprange_ipv6_sub was substracting 2 unsigned ints and then casting
the result to int to find out whether they are lt, eq or gt each
other, this doesn't work if the full 32 bits of each part
can be used in IPv6 addresses. Patch should remedy that without
significant performance penalties. Also number of ntohl
calls can be reduced this way (Jozsef Kadlecsik).

Signed-off-by: Thomas Jacob <jacob@internet24.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-24 21:35:36 +01:00
Pablo Neira Ayuso
c71caf4114 netfilter: ctnetlink: fix missing refcount increment during dumps
In 13ee6ac netfilter: fix race in conntrack between dump_table and
destroy, we recovered spinlocks to protect the dump of the conntrack
table according to reports from Stephen and acknowledgments on the
issue from Eric.

In that patch, the refcount bump that allows to keep a reference
to the current ct object was removed. However, we still decrement
the refcount for that object in the output path of
ctnetlink_dump_table():

        if (last)
                nf_ct_put(last)

Cc: Stephen Hemminger <stephen.hemminger@vyatta.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-24 19:01:07 +01:00
Rusty Russell
577d6a7c3a module: fix missing semicolons in MODULE macro usage
You always needed them when you were a module, but the builtin versions
of the macros used to be more lenient.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-01-24 14:32:54 +10:30
Simon Horman
4b3fd57138 IPVS: Change sock_create_kernel() to __sock_create()
The recent netns changes omitted to change
sock_create_kernel() to __sock_create() in ip_vs_sync.c

The effect of this is that the interface will be selected in the
root-namespace, from my point of view it's a major bug.

Reported-by: Hans Schillstrom <hans@schillstrom.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-01-22 13:48:01 +11:00
Changli Gao
091bb34c14 netfilter: ipvs: fix compiler warnings
Fix compiler warnings when no transport protocol load balancing support
is configured.

[horms@verge.net.au: removed suprious __ip_vs_cleanup() clean-up hunk]
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-01-22 13:19:36 +11:00
Eric Dumazet
23624935e0 net_sched: TCQ_F_CAN_BYPASS generalization
Now qdisc stab is handled before TCQ_F_CAN_BYPASS test in
__dev_xmit_skb(), we can generalize TCQ_F_CAN_BYPASS to other qdiscs
than pfifo_fast : pfifo, bfifo, pfifo_head_drop and sfq

SFQ is special because it can have external classifiers, and in these
cases, we cannot bypass queue discipline (packet could be dropped by
classifier) without admin asking it, or further changes.

Its worth doing this, especially for SFQ, avoiding dirtying memory in
case no packets are already waiting in queue.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-21 16:26:09 -08:00
Eric Dumazet
bb134d2298 net: netif_setup_tc() is static
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-21 13:08:27 -08:00
Bruno Randolf
59eb21a650 cfg80211: Extend channel to frequency mapping for 802.11j
Extend channel to frequency mapping for 802.11j Japan 4.9GHz band, according to
IEEE802.11 section 17.3.8.3.2 and Annex J. Because there are now overlapping
channel numbers in the 2GHz and 5GHz band we can't map from channel to
frequency without knowing the band. This is no problem as in most contexts we
know the band. In places where we don't know the band (and WEXT compatibility)
we assume the 2GHz band for channels below 14.

This patch does not implement all channel to frequency mappings defined in
802.11, it's just an extension for 802.11j 20MHz channels. 5MHz and 10MHz
channels as well as 802.11y channels have been omitted.

The following drivers have been updated to reflect the API changes:
iwl-3945, iwl-agn, iwmc3200wifi, libertas, mwl8k, rt2x00, wl1251, wl12xx.
The drivers have been compile-tested only.

Signed-off-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: Brian Prodoehl <bprodoehl@gmail.com>
Acked-by: Luciano Coelho <coelho@ti.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-21 15:34:17 -05:00
Ben Greear
b305dae488 mac80211: Fix skb-copy failure debug message.
This particular error isn't about multicast.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-21 15:32:21 -05:00
Eric Dumazet
9190b3b320 net_sched: accurate bytes/packets stats/rates
In commit 44b8288308 (net_sched: pfifo_head_drop problem), we fixed
a problem with pfifo_head drops that incorrectly decreased
sch->bstats.bytes and sch->bstats.packets

Several qdiscs (CHOKe, SFQ, pfifo_head, ...) are able to drop a
previously enqueued packet, and bstats cannot be changed, so
bstats/rates are not accurate (over estimated)

This patch changes the qdisc_bstats updates to be done at dequeue() time
instead of enqueue() time. bstats counters no longer account for dropped
frames, and rates are more correct, since enqueue() bursts dont have
effect on dequeue() rate.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-20 23:31:33 -08:00
ffa934f192 rtnetlink: fix link attribute validation with IFLA_GROUP
rtnl_group_changelink() is invoked by rtnl_newlink() before the link
attributes have been validated. Additionally the group changes are
performed even if NLM_F_CREATE is specified and a new link is
created, while more reasonable semantics would be to set the group
value on the newly created link.

Fix both problems by moving the rtnl_group_changelink() invocation
down to the handling of non-existant links without NLM_F_CREATE()
and add a dev_set_group() call to rtnl_create_link().

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Vlad Dogaru <ddvlad@rosedu.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-20 23:28:54 -08:00
David Rientjes
6a108a14fa kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT
The meaning of CONFIG_EMBEDDED has long since been obsoleted; the option
is used to configure any non-standard kernel with a much larger scope than
only small devices.

This patch renames the option to CONFIG_EXPERT in init/Kconfig and fixes
references to the option throughout the kernel.  A new CONFIG_EMBEDDED
option is added that automatically selects CONFIG_EXPERT when enabled and
can be used in the future to isolate options that should only be
considered for embedded systems (RISC architectures, SLOB, etc).

Calling the option "EXPERT" more accurately represents its intention: only
expert users who understand the impact of the configuration changes they
are making should enable it.

Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: David Woodhouse <david.woodhouse@intel.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Greg KH <gregkh@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Robin Holt <holt@sgi.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20 17:02:05 -08:00
Eric Dumazet
f2eda47df4 ipv6: raw: rcu annotations
Remove sparse warnings, using a function typedef to be able to use __rcu
annotation on mh_filter pointer.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-20 16:59:34 -08:00
Eric Dumazet
6193d2be29 neigh: __rcu annotations
fix some minor issues and sparse (__rcu) warnings

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-20 16:59:34 -08:00
Eric Dumazet
753ea8e962 net: ipv6: sit: fix rcu annotations
Fix minor __rcu annotations and remove sparse warnings

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-20 16:59:33 -08:00
Eric Dumazet
a2da570d62 net_sched: RCU conversion of stab
This patch converts stab qdisc management to RCU, so that we can perform
the qdisc_calculate_pkt_len() call before getting qdisc lock.

This shortens the lock's held time in __dev_xmit_skb().

This permits more qdiscs to get TCQ_F_CAN_BYPASS status, avoiding lot of
cache misses and so reducing latencies.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
CC: Jesper Dangaard Brouer <hawk@diku.dk>
CC: Jarek Poplawski <jarkao2@gmail.com>
CC: Jamal Hadi Salim <hadi@cyberus.ca>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-20 16:59:32 -08:00
Eric Dumazet
fd245a4adb net_sched: move TCQ_F_THROTTLED flag
In commit 3711210576 (net: QDISC_STATE_RUNNING dont need atomic bit
ops) I moved QDISC_STATE_RUNNING flag to __state container, located in
the cache line containing qdisc lock and often dirtied fields.

I now move TCQ_F_THROTTLED bit too, so that we let first cache line read
mostly, and shared by all cpus. This should speedup HTB/CBQ for example.

Not using test_bit()/__clear_bit()/__test_and_set_bit allows to use an
"unsigned int" for __state container, reducing by 8 bytes Qdisc size.

Introduce helpers to hide implementation details.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
CC: Jesper Dangaard Brouer <hawk@diku.dk>
CC: Jarek Poplawski <jarkao2@gmail.com>
CC: Jamal Hadi Salim <hadi@cyberus.ca>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-20 16:59:32 -08:00
Eric Dumazet
817fb15dfd net_sched: sfq: allow divisor to be a parameter
SFQ currently uses a 1024 slots hash table, and its internal structure
(sfq_sched_data) allocation needs order-1 page on x86_64

Allow tc command to specify a divisor value (hash table size), between 1
and 65536.
If no value is provided, assume the 1024 default size.

This allows admins to setup smaller (or bigger) SFQ for specific needs.

This also brings back sfq_sched_data allocations to order-0 ones, saving
3KB per SFQ qdisc.

Jesper uses ~55.000 SFQ in one machine, this patch should free 165 MB of
memory.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
CC: Jesper Dangaard Brouer <hawk@diku.dk>
CC: Jarek Poplawski <jarkao2@gmail.com>
CC: Jamal Hadi Salim <hadi@cyberus.ca>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-20 16:59:16 -08:00
Eric Dumazet
3fbd8758b0 net: dev_close_many() is static
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Octavian Purdila <opurdila@ixiacom.com>
Reviewed-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-20 16:55:30 -08:00
Eric Dumazet
bced94ed5e netfilter: add a missing include in nf_conntrack_reasm.c
After commit ae90bdeaea (netfilter: fix compilation when conntrack is
disabled but tproxy is enabled) we have following warnings :

net/ipv6/netfilter/nf_conntrack_reasm.c:520:16: warning: symbol
'nf_ct_frag6_gather' was not declared. Should it be static?
net/ipv6/netfilter/nf_conntrack_reasm.c:591:6: warning: symbol
'nf_ct_frag6_output' was not declared. Should it be static?
net/ipv6/netfilter/nf_conntrack_reasm.c:612:5: warning: symbol
'nf_ct_frag6_init' was not declared. Should it be static?
net/ipv6/netfilter/nf_conntrack_reasm.c:640:6: warning: symbol
'nf_ct_frag6_cleanup' was not declared. Should it be static?

Fix this including net/netfilter/ipv6/nf_defrag_ipv6.h

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-20 21:00:38 +01:00
Changli Gao
41a7cab6d3 netfilter: nf_nat: place conntrack in source hash after SNAT is done
If SNAT isn't done, the wrong info maybe got by the other cts.

As the filter table is after DNAT table, the packets dropped in filter
table also bother bysource hash table.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-20 15:49:52 +01:00
82d800d8e7 Merge branch 'connlimit' of git://dev.medozas.de/linux
Conflicts:
	Documentation/feature-removal-schedule.txt

Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-20 10:33:55 +01:00
Florian Westphal
28a51ba59a netfilter: do not omit re-route check on NF_QUEUE verdict
ret != NF_QUEUE only works in the "--queue-num 0" case; for
queues > 0 the test should be '(ret & NF_VERDICT_MASK) != NF_QUEUE'.

However, NF_QUEUE no longer DROPs the skb unconditionally if queueing
fails (due to NF_VERDICT_FLAG_QUEUE_BYPASS verdict flag), so the
re-route test should also be performed if this flag is set in the
verdict.

The full test would then look something like

&& ((ret & NF_VERDICT_MASK) == NF_QUEUE && (ret & NF_VERDICT_FLAG_QUEUE_BYPASS))

This is rather ugly, so just remove the NF_QUEUE test altogether.

The only effect is that we might perform an unnecessary route lookup
in the NF_QUEUE case.

ip6table_mangle did not have such a check.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-20 10:23:26 +01:00
David S. Miller
a07aa004c8 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 2011-01-20 00:06:15 -08:00
Eric Dumazet
cc7ec456f8 net_sched: cleanups
Cleanup net/sched code to current CodingStyle and practices.

Reduce inline abuse

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-19 23:31:12 -08:00
Alban Crequy
7180a03118 af_unix: coding style: remove one level of indentation in unix_shutdown()
Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
Reviewed-by: Ian Molton <ian.molton@collabora.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-19 23:31:11 -08:00
John Fastabend
b8970f0bfc net_sched: implement a root container qdisc sch_mqprio
This implements a mqprio queueing discipline that by default creates
a pfifo_fast qdisc per tx queue and provides the needed configuration
interface.

Using the mqprio qdisc the number of tcs currently in use along
with the range of queues alloted to each class can be configured. By
default skbs are mapped to traffic classes using the skb priority.
This mapping is configurable.

Configurable parameters,

struct tc_mqprio_qopt {
	__u8    num_tc;
	__u8    prio_tc_map[TC_BITMASK + 1];
	__u8    hw;
	__u16   count[TC_MAX_QUEUE];
	__u16   offset[TC_MAX_QUEUE];
};

Here the count/offset pairing give the queue alignment and the
prio_tc_map gives the mapping from skb->priority to tc.

The hw bit determines if the hardware should configure the count
and offset values. If the hardware bit is set then the operation
will fail if the hardware does not implement the ndo_setup_tc
operation. This is to avoid undetermined states where the hardware
may or may not control the queue mapping. Also minimal bounds
checking is done on the count/offset to verify a queue does not
exceed num_tx_queues and that queue ranges do not overlap. Otherwise
it is left to user policy or hardware configuration to create
useful mappings.

It is expected that hardware QOS schemes can be implemented by
creating appropriate mappings of queues in ndo_tc_setup().

One expected use case is drivers will use the ndo_setup_tc to map
queue ranges onto 802.1Q traffic classes. This provides a generic
mechanism to map network traffic onto these traffic classes and
removes the need for lower layer drivers to know specifics about
traffic types.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-19 23:31:11 -08:00
John Fastabend
4f57c087de net: implement mechanism for HW based QOS
This patch provides a mechanism for lower layer devices to
steer traffic using skb->priority to tx queues. This allows
for hardware based QOS schemes to use the default qdisc without
incurring the penalties related to global state and the qdisc
lock. While reliably receiving skbs on the correct tx ring
to avoid head of line blocking resulting from shuffling in
the LLD. Finally, all the goodness from txq caching and xps/rps
can still be leveraged.

Many drivers and hardware exist with the ability to implement
QOS schemes in the hardware but currently these drivers tend
to rely on firmware to reroute specific traffic, a driver
specific select_queue or the queue_mapping action in the
qdisc.

By using select_queue for this drivers need to be updated for
each and every traffic type and we lose the goodness of much
of the upstream work. Firmware solutions are inherently
inflexible. And finally if admins are expected to build a
qdisc and filter rules to steer traffic this requires knowledge
of how the hardware is currently configured. The number of tx
queues and the queue offsets may change depending on resources.
Also this approach incurs all the overhead of a qdisc with filters.

With the mechanism in this patch users can set skb priority using
expected methods ie setsockopt() or the stack can set the priority
directly. Then the skb will be steered to the correct tx queues
aligned with hardware QOS traffic classes. In the normal case with
single traffic class and all queues in this class everything
works as is until the LLD enables multiple tcs.

To steer the skb we mask out the lower 4 bits of the priority
and allow the hardware to configure upto 15 distinct classes
of traffic. This is expected to be sufficient for most applications
at any rate it is more then the 8021Q spec designates and is
equal to the number of prio bands currently implemented in
the default qdisc.

This in conjunction with a userspace application such as
lldpad can be used to implement 8021Q transmission selection
algorithms one of these algorithms being the extended transmission
selection algorithm currently being used for DCB.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-19 23:31:10 -08:00
Vlad Dogaru
e7ed828f10 netlink: support setting devgroup parameters
If a rtnetlink request specifies a negative or zero ifindex and has no
interface name attribute, but has a group attribute, then the chenges
are made to all the interfaces belonging to the specified group.

Signed-off-by: Vlad Dogaru <ddvlad@rosedu.org>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-19 23:31:10 -08:00
Vlad Dogaru
cbda10fa97 net_device: add support for network device groups
Net devices can now be grouped, enabling simpler manipulation from
userspace. This patch adds a group field to the net_device structure, as
well as rtnetlink support to query and modify it.

Signed-off-by: Vlad Dogaru <ddvlad@rosedu.org>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-19 23:31:09 -08:00
Shan Wei
441c793a56 net: cleanup unused macros in net directory
Clean up some unused macros in net/*.
1. be left for code change. e.g. PGV_FROM_VMALLOC, PGV_FROM_VMALLOC, KMEM_SAFETYZONE.
2. never be used since introduced to kernel.
   e.g. P9_RDMA_MAX_SGE, UTIL_CTRL_PKT_SIZE.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Acked-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-19 23:20:04 -08:00
Linus Torvalds
1268afe676 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (41 commits)
  sctp: user perfect name for Delayed SACK Timer option
  net: fix can_checksum_protocol() arguments swap
  Revert "netlink: test for all flags of the NLM_F_DUMP composite"
  gianfar: Fix misleading indentation in startup_gfar()
  net/irda/sh_irda: return to RX mode when TX error
  net offloading: Do not mask out NETIF_F_HW_VLAN_TX for vlan.
  USB CDC NCM: tx_fixup() race condition fix
  ns83820: Avoid bad pointer deref in ns83820_init_one().
  ipv6: Silence privacy extensions initialization
  bnx2x: Update bnx2x version to 1.62.00-4
  bnx2x: Fix AER setting for BCM57712
  bnx2x: Fix BCM84823 LED behavior
  bnx2x: Mark full duplex on some external PHYs
  bnx2x: Fix BCM8073/BCM8727 microcode loading
  bnx2x: LED fix for BCM8727 over BCM57712
  bnx2x: Common init will be executed only once after POR
  bnx2x: Swap BCM8073 PHY polarity if required
  iwlwifi: fix valid chain reading from EEPROM
  ath5k: fix locking in tx_complete_poll_work
  ath9k_hw: do PA offset calibration only on longcal interval
  ...
2011-01-19 20:25:45 -08:00
Shan Wei
4580ccc04d sctp: user perfect name for Delayed SACK Timer option
The option name of Delayed SACK Timer should be SCTP_DELAYED_SACK,
not SCTP_DELAYED_ACK.

Left SCTP_DELAYED_ACK be concomitant with SCTP_DELAYED_SACK,
for making compatibility with existing applications.

Reference:
8.1.19.  Get or Set Delayed SACK Timer (SCTP_DELAYED_SACK)
(http://tools.ietf.org/html/draft-ietf-tsvwg-sctpsocket-25)

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Acked-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-19 16:51:29 -08:00
14f0290ba4 Merge branch 'master' of /repos/git/net-next-2.6 2011-01-19 23:51:37 +01:00
Eric Dumazet
d402786ea4 net: fix can_checksum_protocol() arguments swap
commit 0363466866 (net offloading: Convert checksums to use
centrally computed features.) mistakenly swapped can_checksum_protocol()
arguments.

This broke IPv6 on bnx2 for instance, on NIC without TCPv6 checksum
offloads.

Reported-by: Hans de Bruin <jmdebruin@xmsnet.nl>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-19 14:15:21 -08:00
David S. Miller
b8f3ab4290 Revert "netlink: test for all flags of the NLM_F_DUMP composite"
This reverts commit 0ab03c2b14.

It breaks several things including the avahi daemon.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-19 13:34:20 -08:00
f5c88f56b3 netfilter: nf_conntrack: fix lifetime display for disabled connections
When no tstamp extension exists, ct_delta_time() returns -1, which is
then assigned to an u64 and tested for negative values to decide
whether to display the lifetime. This obviously doesn't work, use
a s64 and merge the two minor functions into one.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-19 19:10:49 +01:00
Jan Engelhardt
cc4fc02257 netfilter: xtables: connlimit revision 1
This adds destination address-based selection. The old "inverse"
member is overloaded (memory-wise) with a new "flags" variable,
similar to how J.Park did it with xt_string rev 1. Since revision 0
userspace only sets flag 0x1, no great changes are made to explicitly
test for different revisions.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2011-01-19 18:27:46 +01:00
Johan Hedberg
765c2a964b Bluetooth: Fix race condition with conn->sec_level
The conn->sec_level value is supposed to represent the current level of
security that the connection has. However, by assigning to it before
requesting authentication it will have the wrong value during the
authentication procedure. To fix this a pending_sec_level variable is
added which is used to track the desired security level while making
sure that sec_level always represents the current level of security.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-01-19 14:43:11 -02:00
Johan Hedberg
d00ef24fc2 Bluetooth: Fix authentication request for L2CAP raw sockets
When there is an existing connection l2cap_check_security needs to be
called to ensure that the security level of the new socket is fulfilled.
Normally l2cap_do_start takes care of this, but that function doesn't
get called for SOCK_RAW type sockets. This patch adds the necessary
l2cap_check_security call to the appropriate branch in l2cap_do_connect.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-01-19 14:40:43 -02:00
Johan Hedberg
8556edd32f Bluetooth: Create a unified auth_type evaluation function
The logic for determining the needed auth_type for an L2CAP socket is
rather complicated and has so far been duplicated in
l2cap_check_security as well as l2cap_do_connect. Additionally the
l2cap_check_security code was completely missing the handling of
SOCK_RAW type sockets. This patch creates a unified function for the
evaluation and makes l2cap_do_connect and l2cap_check_security use that
function.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-01-19 14:40:43 -02:00
Johan Hedberg
65cf686ee1 Bluetooth: Fix MITM protection requirement preservation
If an existing connection has a MITM protection requirement (the first
bit of the auth_type) then that requirement should not be cleared by new
sockets that reuse the ACL but don't have that requirement.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-01-19 14:40:43 -02:00
Johan Hedberg
88644bb9fe Revert "Bluetooth: Update sec_level/auth_type for already existing connections"
This reverts commit 045309820a. That
commit is wrong for two reasons:

- The conn->sec_level shouldn't be updated without performing
authentication first (as it's supposed to represent the level of
security that the existing connection has)

- A higher auth_type value doesn't mean "more secure" like the commit
seems to assume. E.g. dedicated bonding with MITM protection is 0x03
whereas general bonding without MITM protection is 0x04. hci_conn_auth
already takes care of updating conn->auth_type so hci_connect doesn't
need to do it.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-01-19 14:40:42 -02:00
Lukáš Turek
683d949a7f Bluetooth: Never deallocate a session when some DLC points to it
Fix a bug introduced in commit 9cf5b0ea3a:
function rfcomm_recv_ua calls rfcomm_session_put without checking that
the session is not referenced by some DLC. If the session is freed, that
DLC would refer to deallocated memory, causing an oops later, as shown
in this bug report: https://bugzilla.kernel.org/show_bug.cgi?id=15994

Signed-off-by: Lukas Turek <8an@praha12.net>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-01-19 14:40:42 -02:00
Johan Hedberg
e2e0cacbd4 Bluetooth: Fix leaking blacklist when unregistering a hci device
The blacklist should be freed before the hci device gets unregistered.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-01-19 14:40:42 -02:00
David Sterba
4571928fc7 Bluetooth: l2cap: fix misuse of logical operation in place of bitop
CC: Marcel Holtmann <marcel@holtmann.org>
CC: "Gustavo F. Padovan" <padovan@profusion.mobi>
CC: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-01-19 14:40:42 -02:00
Felix Fietkau
fbb327c594 mac80211: drop non-auth 3-addr data frames when running as a 4-addr station
When running as a 4-addr station against an AP that has the 4-addr VLAN
interface and the main 3-addr AP interface bridged together, sometimes
frames originating from the station were looping back from the 3-addr AP
interface, causing the bridge code to emit warnings about receiving frames
with its own source address.
I'm not sure why this is happening yet, but I think it's a good idea to
drop all frames (except 802.1x/EAP frames) that do not match the configured
addressing mode, including 4-address frames sent to a 3-address station.
User test reports indicate that the problem goes away with this patch.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-19 11:36:12 -05:00
Johannes Berg
5dd36bc933 mac80211: allow advertising correct maximum aggregate size
Currently, mac80211 always advertises that it may send
up to 64 subframes in an aggregate. This is fine, since
it's the max, but might as well be set to zero instead
since it doesn't have any information.

However, drivers might have that information, so allow
them to set a variable giving it, which will then be
used. The default of zero will be fine since to the
peer that means we don't know and it will just use its
own limit for the buffer size.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-19 11:36:12 -05:00
Johannes Berg
0b01f030d3 mac80211: track receiver's aggregation reorder buffer size
The aggregation code currently doesn't implement the
buffer size negotiation. It will always request a max
buffer size (which is fine, if a little pointless, as
the mac80211 code doesn't know and might just use 0
instead), but if the peer requests a smaller size it
isn't possible to honour this request.

In order to fix this, look at the buffer size in the
addBA response frame, keep track of it and pass it to
the driver in the ampdu_action callback when called
with the IEEE80211_AMPDU_TX_OPERATIONAL action. That
way the driver can limit the number of subframes in
aggregates appropriately.

Note that this doesn't fix any drivers apart from the
addition of the new argument -- they all need to be
updated separately to use this variable!

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-19 11:36:11 -05:00
Johannes Berg
ac1bd8464f mac80211: don't return beacons when mesh is disabled
When mesh is disabled, mac80211 was returning
beacons with an empty mesh ID. That isn't
desirable, even if drivers shouldn't be trying
to get beacons to start with.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-19 11:36:11 -05:00
Ben Greear
bfc31df33b mac80211: Show max retry-counts in kernel messages.
Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-19 11:36:09 -05:00
Wey-Yi Guy
0a65169b1f mac80211: mesh only parameter mppath maybe unused
mppath is mesh related parameter and maybe unused

Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-19 11:36:09 -05:00
Luciano Coelho
df6ba5d80d mac80211: add hw configuration for max ampdu buffer size
Some devices don't support the maximum AMDPU buffer size of 64, so we
need to add an option to configure this in the hardware configuration.
This value will be used in the ADDBA response instead of the value
suggested in the request, if the latter is greater than the max
supported.

Signed-off-by: Luciano Coelho <coelho@ti.com>
Tested-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-19 11:36:09 -05:00
Nick Ledovskikh
dcac908bab mac80211:mesh_mpp_table_grow call should depend on MESH_WORK_GROW_MPP_TABLE flag.
Replace MESH_WORK_GROW_MPATH_TABLE by MESH_WORK_GROW_MPP_TABLE in
mesh_mpp_table_grow call condition.

(Clearly the original was a typo... -- JWL)

Signed-off-by: Nickolay Ledovskikh <nledovskikh@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-19 11:36:08 -05:00
Joel A Fernandes
9d52501b42 mac80211: Rewrote code for checking if destinations are proxied.
Rewrote code for checking if the destination is proxied by a mesh portal, to facilitate better
understanding of the functionality.

Signed-off-by: Joel A Fernandes <agnel.joel@gmail.com>
Acked-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-19 11:36:07 -05:00
Pablo Neira Ayuso
a992ca2a04 netfilter: nf_conntrack_tstamp: add flow-based timestamp extension
This patch adds flow-based timestamping for conntracks. This
conntrack extension is disabled by default. Basically, we use
two 64-bits variables to store the creation timestamp once the
conntrack has been confirmed and the other to store the deletion
time. This extension is disabled by default, to enable it, you
have to:

echo 1 > /proc/sys/net/netfilter/nf_conntrack_timestamp

This patch allows to save memory for user-space flow-based
loogers such as ulogd2. In short, ulogd2 does not need to
keep a hashtable with the conntrack in user-space to know
when they were created and destroyed, instead we use the
kernel timestamp. If we want to have a sane IPFIX implementation
in user-space, this nanosecs resolution timestamps are also
useful. Other custom user-space applications can benefit from
this via libnetfilter_conntrack.

This patch modifies the /proc output to display the delta time
in seconds since the flow start. You can also obtain the
flow-start date by means of the conntrack-tools.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-19 16:00:07 +01:00
Eric Dumazet
80f8f1027b net: filter: dont block softirqs in sk_run_filter()
Packet filter (BPF) doesnt need to disable softirqs, being fully
re-entrant and lock-less.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-18 21:33:05 -08:00
Alban Crequy
d6ae3bae3d af_unix: implement socket filter
Linux Socket Filters can already be successfully attached and detached on unix
sockets with setsockopt(sockfd, SOL_SOCKET, SO_{ATTACH,DETACH}_FILTER, ...).
See: Documentation/networking/filter.txt

But the filter was never used in the unix socket code so it did not work. This
patch uses sk_filter() to filter buffers before delivery.

This short program demonstrates the problem on SOCK_DGRAM.

int main(void) {
  int i, j, ret;
  int sv[2];
  struct pollfd fds[2];
  char *message = "Hello world!";
  char buffer[64];
  struct sock_filter ins[32] = {{0,},};
  struct sock_fprog filter;

  socketpair(AF_UNIX, SOCK_DGRAM, 0, sv);

  for (i = 0 ; i < 2 ; i++) {
    fds[i].fd = sv[i];
    fds[i].events = POLLIN;
    fds[i].revents = 0;
  }

  for(j = 1 ; j < 13 ; j++) {

    /* Set a socket filter to truncate the message */
    memset(ins, 0, sizeof(ins));
    ins[0].code = BPF_RET|BPF_K;
    ins[0].k = j;
    filter.len = 1;
    filter.filter = ins;
    setsockopt(sv[1], SOL_SOCKET, SO_ATTACH_FILTER, &filter, sizeof(filter));

    /* send a message */
    send(sv[0], message, strlen(message) + 1, 0);

    /* The filter should let the message pass but truncated. */
    poll(fds, 2, 0);

    /* Receive the truncated message*/
    ret = recv(sv[1], buffer, 64, 0);
    printf("received %d bytes, expected %d\n", ret, j);
  }

    for (i = 0 ; i < 2 ; i++)
      close(sv[i]);

  return 0;
}

Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
Reviewed-by: Ian Molton <ian.molton@collabora.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-18 21:33:05 -08:00
David S. Miller
a5db219f4c Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2011-01-18 16:28:31 -08:00
Jesse Gross
6ee400aafb net offloading: Do not mask out NETIF_F_HW_VLAN_TX for vlan.
In netif_skb_features() we return only the features that are valid for vlans
if we have a vlan packet.  However, we should not mask out NETIF_F_HW_VLAN_TX
since it enables transmission of vlan tags and is obviously valid.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-18 16:13:50 -08:00
Romain Francoise
2fdc1c8093 ipv6: Silence privacy extensions initialization
When a network namespace is created (via CLONE_NEWNET), the loopback
interface is automatically added to the new namespace, triggering a
printk in ipv6_add_dev() if CONFIG_IPV6_PRIVACY is set.

This is problematic for applications which use CLONE_NEWNET as
part of a sandbox, like Chromium's suid sandbox or recent versions of
vsftpd. On a busy machine, it can lead to thousands of useless
"lo: Disabled Privacy Extensions" messages appearing in dmesg.

It's easy enough to check the status of privacy extensions via the
use_tempaddr sysctl, so just removing the printk seems like the most
sensible solution.

Signed-off-by: Romain Francoise <romain@orebokech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-18 16:13:49 -08:00
David S. Miller
f966a13f92 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2011-01-18 12:50:19 -08:00
Jiri Olsa
93557f53e1 netfilter: nf_conntrack: nf_conntrack snmp helper
Adding support for SNMP broadcast connection tracking. The SNMP
broadcast requests are now paired with the SNMP responses.
Thus allowing using SNMP broadcasts with firewall enabled.

Please refer to the following conversation:
http://marc.info/?l=netfilter-devel&m=125992205006600&w=2

Patrick McHardy wrote:
> > The best solution would be to add generic broadcast tracking, the
> > use of expectations for this is a bit of abuse.
> > The second best choice I guess would be to move the help() function
> > to a shared module and generalize it so it can be used for both.
This patch implements the "second best choice".

Since the netbios-ns conntrack module uses the same helper
functionality as the snmp, only one helper function is added
for both snmp and netbios-ns modules into the new object -
nf_conntrack_broadcast.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-18 18:12:24 +01:00
Eric Dumazet
94d117a1c7 netfilter: ipt_CLUSTERIP: remove "no conntrack!"
When a packet is meant to be handled by another node of the cluster,
silently drop it instead of flooding kernel log.

Note : INVALID packets are also dropped without notice.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-18 16:27:56 +01:00
a8fc0d9b34 Merge branch 'master' of git://dev.medozas.de/linux 2011-01-18 16:20:53 +01:00
Florian Westphal
94b27cc361 netfilter: allow NFQUEUE bypass if no listener is available
If an skb is to be NF_QUEUE'd, but no program has opened the queue, the
packet is dropped.

This adds a v2 target revision of xt_NFQUEUE that allows packets to
continue through the ruleset instead.

Because the actual queueing happens outside of the target context, the
'bypass' flag has to be communicated back to the netfilter core.

Unfortunately the only choice to do this without adding a new function
argument is to use the target function return value (i.e. the verdict).

In the NF_QUEUE case, the upper 16bit already contain the queue number
to use.  The previous patch reduced NF_VERDICT_MASK to 0xff, i.e.
we now have extra room for a new flag.

If a hook issued a NF_QUEUE verdict, then the netfilter core will
continue packet processing if the queueing hook
returns -ESRCH (== "this queue does not exist") and the new
NF_VERDICT_FLAG_QUEUE_BYPASS flag is set in the verdict value.

Note: If the queue exists, but userspace does not consume packets fast
enough, the skb will still be dropped.

Signed-off-by: Florian Westphal <fwestphal@astaro.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-18 16:08:30 +01:00
Florian Westphal
f615df76ed netfilter: reduce NF_VERDICT_MASK to 0xff
NF_VERDICT_MASK is currently 0xffff. This is because the upper
16 bits are used to store errno (for NF_DROP) or the queue number
(NF_QUEUE verdict).

As there are up to 0xffff different queues available, there is no more
room to store additional flags.

At the moment there are only 6 different verdicts, i.e. we can reduce
NF_VERDICT_MASK to 0xff to allow storing additional flags in the 0xff00 space.

NF_VERDICT_BITS would then be reduced to 8, but because the value is
exported to userspace, this might cause breakage; e.g.:

e.g. 'queuenr = (1 << NF_VERDICT_BITS) | NF_QUEUE'  would now break.

Thus, remove NF_VERDICT_BITS usage in the kernel and move the old value
to the 'userspace compat' section.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-18 15:52:14 +01:00
Florian Westphal
06cdb6349c netfilter: nfnetlink_queue: do not free skb on error
Move free responsibility from nf_queue to caller.
This enables more flexible error handling; we can now accept the skb
instead of freeing it.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-18 15:28:38 +01:00
Florian Westphal
f158508618 netfilter: nfnetlink_queue: return error number to caller
instead of returning -1 on error, return an error number to allow the
caller to handle some errors differently.

ECANCELED is used to indicate that the hook is going away and should be
ignored.

A followup patch will introduce more 'ignore this hook' conditions,
(depending on queue settings) and will move kfree_skb responsibility
to the caller.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-18 15:27:28 +01:00
Florian Westphal
5f2cafe736 netfilter: Kconfig: NFQUEUE is useless without NETFILTER_NETLINK_QUEUE
NFLOG already does the same thing for NETFILTER_NETLINK_LOG.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-18 15:18:08 +01:00
Changli Gao
45eec34195 netfilter: nf_conntrack: remove an atomic bit operation
As this ct won't be seen by the others, we don't need to set the
IPS_CONFIRMED_BIT in atomic way.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Cc: Tim Gardner <tim.gardner@canonical.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-18 15:08:13 +01:00
Changli Gao
a7c2f4d7da netfilter: nf_nat: fix conversion to non-atomic bit ops
My previous patch (netfilter: nf_nat: don't use atomic bit operation)
made a mistake when converting atomic_set to a normal bit 'or'.
IPS_*_BIT should be replaced with IPS_*.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Cc: Tim Gardner <tim.gardner@canonical.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-18 15:02:48 +01:00
Richard Weinberger
1cc34c30be netfilter: xt_connlimit: use hotdrop jump mark
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2011-01-18 06:50:41 +01:00
Jan Engelhardt
f1e231a356 netfilter: xtables: add missing aliases for autoloading via iptables
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2011-01-18 06:33:54 +01:00
Thomas Graf
fbabf31e4d netfilter: create audit records for x_tables replaces
The setsockopt() syscall to replace tables is already recorded
in the audit logs. This patch stores additional information
such as table name and netfilter protocol.

Cc: Patrick McHardy <kaber@trash.net>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Thomas Graf <tgraf@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-16 18:12:59 +01:00
Thomas Graf
43f393caec netfilter: audit target to record accepted/dropped packets
This patch adds a new netfilter target which creates audit records
for packets traversing a certain chain.

It can be used to record packets which are rejected administraively
as follows:

  -N AUDIT_DROP
  -A AUDIT_DROP -j AUDIT --type DROP
  -A AUDIT_DROP -j DROP

a rule which would typically drop or reject a packet would then
invoke the new chain to record packets before dropping them.

  -j AUDIT_DROP

The module is protocol independant and works for iptables, ip6tables
and ebtables.

The following information is logged:
 - netfilter hook
 - packet length
 - incomming/outgoing interface
 - MAC src/dst/proto for ethernet packets
 - src/dst/protocol address for IPv4/IPv6
 - src/dst port for TCP/UDP/UDPLITE
 - icmp type/code

Cc: Patrick McHardy <kaber@trash.net>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Thomas Graf <tgraf@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-16 18:10:28 +01:00
Dan Carpenter
01a859014b caif: checking the wrong variable
In the original code we check if (servl == NULL) twice.  The first time
should print the message that cfmuxl_remove_uplayer() failed and set
"ret" correctly, but instead it just returns success.  The second check
should be checking the value of "ret" instead of "servl".

Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-15 20:58:11 -08:00
Kurt Van Dijck
5e50732803 can: test size of struct sockaddr in sendmsg
This patch makes the CAN socket code conform to the manpage of sendmsg.

Signed-off-by: Kurt Van Dijck <kurt.van.dijck@eia.be>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-15 20:56:42 -08:00
David S. Miller
d78c68efa8 Merge branch 'for-david' of git://git.open-mesh.org/ecsv/linux-merge 2011-01-15 20:48:28 -08:00
Sven Eckelmann
aa0adb1a85 batman-adv: Use "__attribute__" shortcut macros
Linux 2.6.21 defines different macros for __attribute__ which are also
used inside batman-adv. The next version of checkpatch.pl warns about
the usage of __attribute__((packed))).

Linux 2.6.33 defines an extra macro __always_unused which is used to
assist source code analyzers and can be used to removed the last
existing __attribute__ inside the source code.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-01-16 03:25:19 +01:00
Linus Torvalds
d018b6f4f1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (47 commits)
  GRETH: resolve SMP issues and other problems
  GRETH: handle frame error interrupts
  GRETH: avoid writing bad speed/duplex when setting transfer mode
  GRETH: fixed skb buffer memory leak on frame errors
  GRETH: GBit transmit descriptor handling optimization
  GRETH: fix opening/closing
  GRETH: added raw AMBA vendor/device number to match against.
  cassini: Fix build bustage on x86.
  e1000e: consistent use of Rx/Tx vs. RX/TX/rx/tx in comments/logs
  e1000e: update Copyright for 2011
  e1000: Avoid unhandled IRQ
  r8169: keep firmware in memory.
  netdev: tilepro: Use is_unicast_ether_addr helper
  etherdevice.h: Add is_unicast_ether_addr function
  ks8695net: Use default implementation of ethtool_ops::get_link
  ks8695net: Disable non-working ethtool operations
  USB CDC NCM: Don't deref NULL in cdc_ncm_rx_fixup() and don't use uninitialized variable.
  vxge: Remember to release firmware after upgrading firmware
  netdev: bfin_mac: Remove is_multicast_ether_addr use in netdev_for_each_mc_addr
  ipsec: update MAX_AH_AUTH_LEN to support sha512
  ...
2011-01-14 13:25:30 -08:00
Linus Torvalds
18bce371ae Merge branch 'for-2.6.38' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.38' of git://linux-nfs.org/~bfields/linux: (62 commits)
  nfsd4: fix callback restarting
  nfsd: break lease on unlink, link, and rename
  nfsd4: break lease on nfsd setattr
  nfsd: don't support msnfs export option
  nfsd4: initialize cb_per_client
  nfsd4: allow restarting callbacks
  nfsd4: simplify nfsd4_cb_prepare
  nfsd4: give out delegations more quickly in 4.1 case
  nfsd4: add helper function to run callbacks
  nfsd4: make sure sequence flags are set after destroy_session
  nfsd4: re-probe callback on connection loss
  nfsd4: set sequence flag when backchannel is down
  nfsd4: keep finer-grained callback status
  rpc: allow xprt_class->setup to return a preexisting xprt
  rpc: keep backchannel xprt as long as server connection
  rpc: move sk_bc_xprt to svc_xprt
  nfsd4: allow backchannel recovery
  nfsd4: support BIND_CONN_TO_SESSION
  nfsd4: modify session list under cl_lock
  Documentation: fl_mylease no longer exists
  ...

Fix up conflicts in fs/nfsd/vfs.c with the vfs-scale work.  The
vfs-scale work touched some msnfs cases, and this merge removes support
for that entirely, so the conflict was trivial to resolve.
2011-01-14 13:17:26 -08:00
Tejun Heo
e1fcc7e2a7 rxrpc: rxrpc_workqueue isn't used during memory reclaim
rxrpc_workqueue isn't depended upon while reclaiming memory.  Convert
to alloc_workqueue() without WQ_MEM_RECLAIM.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: linux-afs@lists.infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-14 09:25:11 -08:00
d862a6622e netfilter: nf_conntrack: use is_vmalloc_addr()
Use is_vmalloc_addr() in nf_ct_free_hashtable() and get rid of
the vmalloc flags to indicate that a hash table has been allocated
using vmalloc().

Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-14 15:45:56 +01:00
0134e89c7b Merge branch 'master' of git://1984.lsi.us.es/net-next-2.6
Conflicts:
	net/ipv4/route.c

Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-14 14:12:37 +01:00
c7066f70d9 netfilter: fix Kconfig dependencies
Fix dependencies of netfilter realm match: it depends on NET_CLS_ROUTE,
which itself depends on NET_SCHED; this dependency is missing from netfilter.

Since matching on realms is also useful without having NET_SCHED enabled and
the option really only controls whether the tclassid member is included in
route and dst entries, rename the config option to IP_ROUTE_CLASSID and move
it outside of traffic scheduling context to get rid of the NET_SCHED dependeny.

Reported-by: Vladis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-14 13:36:42 +01:00
Eric Dumazet
1ac9ad1394 net: remove dev_txq_stats_fold()
After recent changes, (percpu stats on vlan/tunnels...), we dont need
anymore per struct netdev_queue tx_bytes/tx_packets/tx_dropped counters.

Only remaining users are ixgbe, sch_teql, gianfar & macvlan :

1) ixgbe can be converted to use existing tx_ring counters.

2) macvlan incremented txq->tx_dropped, it can use the
dev->stats.tx_dropped counter.

3) sch_teql : almost revert ab35cd4b8f (Use net_device internal stats)
    Now we have ndo_get_stats64(), use it, even for "unsigned long"
fields (No need to bring back a struct net_device_stats)

4) gianfar adds a stats structure per tx queue to hold
tx_bytes/tx_packets

This removes a lockdep warning (and possible lockup) in rndis gadget,
calling dev_get_stats() from hard IRQ context.

Ref: http://www.spinics.net/lists/netdev/msg149202.html

Reported-by: Neil Jones <neiljay@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jarek Poplawski <jarkao2@gmail.com>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: Sandeep Gopalpet <sandeep.kumar@freescale.com>
CC: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-13 21:44:34 -08:00
Jesper Juhl
ed7809d9c4 batman-adv: Even Batman should not dereference NULL pointers
There's a problem in net/batman-adv/unicast.c::frag_send_skb().
dev_alloc_skb() allocates memory and may fail, thus returning NULL. If
this happens we'll pass a NULL pointer on to skb_split() which in turn
hands it to skb_split_inside_header() from where it gets passed to
skb_put() that lets skb_tail_pointer() play with it and that function
dereferences it. And thus the bat dies.

While I was at it I also moved the call to dev_alloc_skb() above the
assignment to 'unicast_packet' since there's no reason to do that
assignment if the memory allocation fails.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-01-13 22:11:12 +01:00
Luciano Coelho
82694f764d mac80211: use maximum number of AMPDU frames as default in BA RX
When the buffer size is set to zero in the block ack parameter set
field, we should use the maximum supported number of subframes.  The
existing code was bogus and was doing some unnecessary calculations
that lead to wrong values.

Thanks Johannes for helping me figure this one out.

Cc: stable@kernel.org
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Luciano Coelho <coelho@ti.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-13 15:46:45 -05:00
Johannes Berg
681c4d07dd mac80211: fix lockdep warning
Since the introduction of the fixes for the
reorder timer, mac80211 will cause lockdep
warnings because lockdep confuses
local->skb_queue and local->rx_skb_queue
and treats their lock as the same.

However, their locks are different, and are
valid in different contexts (the former is
used in IRQ context, the latter in BH only)
and the only thing to be done is mark the
former as a different lock class so that
lockdep can tell the difference.

Reported-by: Larry Finger <Larry.Finger@lwfinger.net>
Reported-by: Sujith <m.sujith@gmail.com>
Reported-by: Miles Lane <miles.lane@gmail.com>
Tested-by: Sujith <m.sujith@gmail.com>
Tested-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-13 15:46:45 -05:00
David S. Miller
1949e084bf Merge branch 'master' of git://1984.lsi.us.es/net-2.6 2011-01-13 12:34:21 -08:00
Linus Torvalds
b2034d474b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (41 commits)
  fs: add documentation on fallocate hole punching
  Gfs2: fail if we try to use hole punch
  Btrfs: fail if we try to use hole punch
  Ext4: fail if we try to use hole punch
  Ocfs2: handle hole punching via fallocate properly
  XFS: handle hole punching via fallocate properly
  fs: add hole punching to fallocate
  vfs: pass struct file to do_truncate on O_TRUNC opens (try #2)
  fix signedness mess in rw_verify_area() on 64bit architectures
  fs: fix kernel-doc for dcache::prepend_path
  fs: fix kernel-doc for dcache::d_validate
  sanitize ecryptfs ->mount()
  switch afs
  move internal-only parts of ncpfs headers to fs/ncpfs
  switch ncpfs
  switch 9p
  pass default dentry_operations to mount_pseudo()
  switch hostfs
  switch affs
  switch configfs
  ...
2011-01-13 10:27:28 -08:00
Linus Torvalds
27d189c02b Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (46 commits)
  hwrng: via_rng - Fix memory scribbling on some CPUs
  crypto: padlock - Move padlock.h into include/crypto
  hwrng: via_rng - Fix asm constraints
  crypto: n2 - use __devexit not __exit in n2_unregister_algs
  crypto: mark crypto workqueues CPU_INTENSIVE
  crypto: mv_cesa - dont return PTR_ERR() of wrong pointer
  crypto: ripemd - Set module author and update email address
  crypto: omap-sham - backlog handling fix
  crypto: gf128mul - Remove experimental tag
  crypto: af_alg - fix af_alg memory_allocated data type
  crypto: aesni-intel - Fixed build with binutils 2.16
  crypto: af_alg - Make sure sk_security is initialized on accept()ed sockets
  net: Add missing lockdep class names for af_alg
  include: Install linux/if_alg.h for user-space crypto API
  crypto: omap-aes - checkpatch --file warning fixes
  crypto: omap-aes - initialize aes module once per request
  crypto: omap-aes - unnecessary code removed
  crypto: omap-aes - error handling implementation improved
  crypto: omap-aes - redundant locking is removed
  crypto: omap-aes - DMA initialization fixes for OMAP off mode
  ...
2011-01-13 10:25:58 -08:00
Linus Torvalds
a170315420 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  rbd: fix cleanup when trying to mount inexistent image
  net/ceph: make ceph_msgr_wq non-reentrant
  ceph: fsc->*_wq's aren't used in memory reclaim path
  ceph: Always free allocated memory in osdmap_decode()
  ceph: Makefile: Remove unnessary code
  ceph: associate requests with opening sessions
  ceph: drop redundant r_mds field
  ceph: implement DIRLAYOUTHASH feature to get dir layout from MDS
  ceph: add dir_layout to inode
2011-01-13 10:25:24 -08:00
Linus Torvalds
008d23e485 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
  Documentation/trace/events.txt: Remove obsolete sched_signal_send.
  writeback: fix global_dirty_limits comment runtime -> real-time
  ppc: fix comment typo singal -> signal
  drivers: fix comment typo diable -> disable.
  m68k: fix comment typo diable -> disable.
  wireless: comment typo fix diable -> disable.
  media: comment typo fix diable -> disable.
  remove doc for obsolete dynamic-printk kernel-parameter
  remove extraneous 'is' from Documentation/iostats.txt
  Fix spelling milisec -> ms in snd_ps3 module parameter description
  Fix spelling mistakes in comments
  Revert conflicting V4L changes
  i7core_edac: fix typos in comments
  mm/rmap.c: fix comment
  sound, ca0106: Fix assignment to 'channel'.
  hrtimer: fix a typo in comment
  init/Kconfig: fix typo
  anon_inodes: fix wrong function name in comment
  fix comment typos concerning "consistent"
  poll: fix a typo in comment
  ...

Fix up trivial conflicts in:
 - drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c)
 - fs/ext4/ext4.h

Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.
2011-01-13 10:05:56 -08:00
Pablo Neira Ayuso
f31e8d4982 netfilter: ctnetlink: fix loop in ctnetlink_get_conntrack()
This patch fixes a loop in ctnetlink_get_conntrack() that can be
triggered if you use the same socket to receive events and to
perform a GET operation. Under heavy load, netlink_unicast()
may return -EAGAIN, this error code is reserved in nfnetlink for
the module load-on-demand. Instead, we return -ENOBUFS which is
the appropriate error code that has to be propagated to
user-space.

Reported-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-01-13 17:03:39 +01:00
Florian Westphal
6faee60a4e netfilter: ebt_ip6: allow matching on ipv6-icmp types/codes
To avoid adding a new match revision icmp type/code are stored
in the sport/dport area.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Holger Eitzenberger <holger@eitzenberger.org>
Reviewed-by: Bart De Schuymer<bdschuym@pandora.be>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-01-13 12:05:12 +01:00
Eric Dumazet
255d0dc340 netfilter: x_table: speedup compat operations
One iptables invocation with 135000 rules takes 35 seconds of cpu time
on a recent server, using a 32bit distro and a 64bit kernel.

We eventually trigger NMI/RCU watchdog.

INFO: rcu_sched_state detected stall on CPU 3 (t=6000 jiffies)

COMPAT mode has quadratic behavior and consume 16 bytes of memory per
rule.

Switch the xt_compat algos to use an array instead of list, and use a
binary search to locate an offset in the sorted array.

This halves memory need (8 bytes per rule), and removes quadratic
behavior [ O(N*N) -> O(N*log2(N)) ]

Time of iptables goes from 35 s to 150 ms.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-01-13 12:05:12 +01:00
b017900aac netfilter: xt_conntrack: support matching on port ranges
Add a new revision 3 that contains port ranges for all of origsrc,
origdst, replsrc and repldst. The high ports are appended to the
original v2 data structure to allow sharing most of the code with
v1 and v2. Use of the revision specific port matching function is
made dependant on par->match->revision.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-01-13 12:05:12 +01:00
Randy Dunlap
3806b4f3b6 eth: fix new kernel-doc warning
Fix new kernel-doc warning (copy-paste typo):

Warning(net/ethernet/eth.c:366): No description found for parameter 'rxqs'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-12 19:00:40 -08:00
David S. Miller
464143c911 Merge branch 'master' of git://1984.lsi.us.es/net-2.6 2011-01-12 18:58:40 -08:00
Alexey Kuznetsov
72b43d0898 inet6: prevent network storms caused by linux IPv6 routers
Linux IPv6 forwards unicast packets, which are link layer multicasts...
The hole was present since day one. I was 100% this check is there, but it is not.

The problem shows itself, f.e. when Microsoft Network Load Balancer runs on a network.
This software resolves IPv6 unicast addresses to multicast MAC addresses.

Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-12 18:51:55 -08:00
Hans Schillstrom
c6d2d445d8 IPVS: netns, final patch enabling network name space.
all init_net removed, (except for some alloc related
that needs to be there)

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-01-13 10:30:29 +09:00
Hans Schillstrom
4a98480bcc IPVS: netns, misc init_net removal in core.
init_net removed in __ip_vs_addr_is_local_v6, and got net as param.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-01-13 10:30:29 +09:00
Hans Schillstrom
763f8d0ed4 IPVS: netns, svc counters moved in ip_vs_ctl,c
Last two global vars to be moved,
ip_vs_ftpsvc_counter and ip_vs_nullsvc_counter.

[horms@verge.net.au: removed whitespace-change-only hunk]
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-01-13 10:30:28 +09:00
Hans Schillstrom
f2431e6e92 IPVS: netns, trash handling
trash list per namspace,
and reordering of some params in dst struct.

[ horms@verge.net.au: Use cancel_delayed_work_sync() instead of
	              cancel_rearming_delayed_work(). Found during
		      merge conflict resoliution ]
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-01-13 10:30:28 +09:00