Archived
14
0
Fork 0
Commit graph

469 commits

Author SHA1 Message Date
abc3bc5804 [NET]: Kill skb->tc_classid
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:31:18 -07:00
David S. Miller
8728b834b2 [NET]: Kill skb->list
Remove the "list" member of struct sk_buff, as it is entirely
redundant.  All SKB list removal callers know which list the
SKB is on, so storing this in sk_buff does nothing other than
taking up some space.

Two tricky bits were SCTP, which I took care of, and two ATM
drivers which Francois Romieu <romieu@fr.zoreil.com> fixed
up.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
2005-08-29 15:31:14 -07:00
Harald Welte
6869c4d8e0 [NETFILTER]: reduce netfilter sk_buff enlargement
As discussed at netconf'05, we're trying to save every bit in sk_buff.
The patch below makes sk_buff 8 bytes smaller.  I did some basic
testing on my notebook and it seems to work.

The only real in-tree user of nfcache was IPVS, who only needs a
single bit.  Unfortunately I couldn't find some other free bit in
sk_buff to stuff that bit into, so I introduced a separate field for
them.  Maybe the IPVS guys can resolve that to further save space.

Initially I wanted to shrink pkt_type to three bits (PACKET_HOST and
alike are only 6 values defined), but unfortunately the bluetooth code
overloads pkt_type :(

The conntrack-event-api (out-of-tree) uses nfcache, but Rusty just
came up with a way how to do it without any skb fields, so it's safe
to remove it.

- remove all never-implemented 'nfcache' code
- don't have ipvs code abuse 'nfcache' field. currently get's their own
  compile-conditional skb->ipvs_property field.  IPVS maintainers can
  decide to move this bit elswhere, but nfcache needs to die.
- remove skb->nfcache field to save 4 bytes
- move skb->nfctinfo into three unused bits to save further 4 bytes

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:31:04 -07:00
Harald Welte
bf3a46aa9b [NETFILTER]: convert nfmark and conntrack mark to 32bit
As discussed at netconf'05, we convert nfmark and conntrack-mark to be
32bits even on 64bit architectures.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:29:31 -07:00
06c7427021 [FIB_TRIE]: Don't ignore negative results from fib_semantic_match
When a semantic match occurs either success, not found or an error
(for matching unreachable routes/blackholes) is returned. fib_trie
ignores the errors and looks for a different matching route. Treat
results other than "no match" as success and end lookup.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23 22:06:09 -07:00
David S. Miller
c1cc168442 [ROSE]: Fix typo in rose_route_frame() locking fix.
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23 14:55:32 -07:00
David S. Miller
dc16aaf29d [ROSE]: Fix missing unlocks in rose_route_frame()
Noticed by Coverity checker.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23 10:50:09 -07:00
David S. Miller
d5d283751e [TCP]: Document non-trivial locking path in tcp_v{4,6}_get_port().
This trips up a lot of folks reading this code.
Put an unlikely() around the port-exhaustion test
for good measure.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23 10:49:54 -07:00
David S. Miller
89ebd197eb [TCP]: Unconditionally clear TCP_NAGLE_PUSH in skb_entail().
Intention of this bit is to force pushing of the existing
send queue when TCP_CORK or TCP_NODELAY state changes via
setsockopt().

But it's easy to create a situation where the bit never
clears.  For example, if the send queue starts empty:

1) set TCP_NODELAY
2) clear TCP_NODELAY
3) set TCP_CORK
4) do small write()

The current code will leave TCP_NAGLE_PUSH set after that
sequence.  Unconditionally clearing the bit when new data
is added via skb_entail() solves the problem.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23 10:13:06 -07:00
Thomas Graf
0fbbeb1ba4 [PKT_SCHED]: Fix missing qdisc_destroy() in qdisc_create_dflt()
qdisc_create_dflt() is missing to destroy the newly allocated
default qdisc if the initialization fails resulting in leaks
of all kinds. The only caller in mainline which may trigger
this bug is sch_tbf.c in tbf_create_dflt_qdisc().

Note: qdisc_create_dflt() doesn't fulfill the official locking
      requirements of qdisc_destroy() but since the qdisc could
      never be seen by the outside world this doesn't matter
      and it can stay as-is until the locking of pkt_sched
      is cleaned up.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23 10:12:44 -07:00
Vlad Yasevich
d2287f8441 [SCTP]: Add SENTINEL to SCTP MIB stats
Add SNMP_MIB_SENTINEL to the definition of the sctp_snmp_list so that
the output routine in proc correctly terminates.  This was causing some
problems running on ia64 systems.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23 10:12:04 -07:00
Ralf Baechle
01d7dd0e9f [AX25]: UID fixes
o Brown paperbag bug - ax25_findbyuid() was always returning a NULL pointer
   as the result.  Breaks ROSE completly and AX.25 if UID policy set to deny.

 o While the list structure of AX.25's UID to callsign mapping table was
   properly protected by a spinlock, it's elements were not refcounted
   resulting in a race between removal and usage of an element.

Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23 10:11:45 -07:00
Ralf Baechle
53b924b31f [NET]: Fix socket bitop damage
The socket flag cleanups that went into 2.6.12-rc1 are basically oring
the flags of an old socket into the socket just being created.
Unfortunately that one was just initialized by sock_init_data(), so already
has SOCK_ZAPPED set.  As the result zapped sockets are created and all
incoming connection will fail due to this bug which again was carefully
replicated to at least AX.25, NET/ROM or ROSE.

In order to keep the abstraction alive I've introduced sock_copy_flags()
to copy the socket flags from one sockets to another and used that
instead of the bitwise copy thing.  Anyway, the idea here has probably
been to copy all flags, so sock_copy_flags() should be the right thing.
With this the ham radio protocols are usable again, so I hope this will
make it into 2.6.13.

Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23 10:11:30 -07:00
66a79a19a7 [NETFILTER]: Fix HW checksum handling in ip_queue/ip6_queue
The checksum needs to be filled in on output, after mangling a packet
ip_summed needs to be reset.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23 10:10:35 -07:00
Dave Johnson
1344a41637 [IPV4]: Fix negative timer loop with lots of ipv4 peers.
From: Dave Johnson <djohnson+linux-kernel@sw.starentnetworks.com>

Found this bug while doing some scaling testing that created 500K inet
peers.

peer_check_expire() in net/ipv4/inetpeer.c isn't using inet_peer_gc_mintime
correctly and will end up creating an expire timer with less than the
minimum duration, and even zero/negative if enough active peers are
present.

If >65K peers, the timer will be less than inet_peer_gc_mintime, and with
>70K peers, the timer duration will reach zero and go negative.

The timer handler will continue to schedule another zero/negative timer in
a loop until peers can be aged.  This can continue for at least a few
minutes or even longer if the peers remain active due to arriving packets
while the loop is occurring.

Bug is present in both 2.4 and 2.6.  Same patch will apply to both just
fine.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23 10:10:15 -07:00
Herbert Xu
c3a20692ca [RPC]: Kill bogus kmap in krb5
While I was going through the crypto users recently, I noticed this
bogus kmap in sunrpc.  It's totally unnecessary since the crypto
layer will do its own kmap before touching the data.  Besides, the
kmap is throwing the return value away.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23 10:09:53 -07:00
Dmitry Yusupov
14869c3886 [TCP]: Do TSO deferral even if tail SKB can go out now.
If the tail SKB fits into the window, it is still
benefitical to defer until the goal percentage of
the window is available.  This give the application
time to feed more data into the send queue and thus
results in larger TSO frames going out.

Patch from Dmitry Yusupov <dima@neterion.com>.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23 10:09:27 -07:00
7e71af49d4 [NETFILTER]: Fix HW checksum handling in TCPMSS target
Most importantly, remove bogus BUG() in receive path.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-20 17:40:41 -07:00
f93592ff4f [NETFILTER]: Fix HW checksum handling in ECN target
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-20 17:39:15 -07:00
fd841326d7 [NETFILTER]: Fix ECN target TCP marking
An incorrect check made it bail out before doing anything.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-20 17:38:40 -07:00
Herbert Xu
6fc8b9e7c6 [IPCOMP]: Fix false smp_processor_id warning
This patch fixes a false-positive from debug_smp_processor_id().

The processor ID is only used to look up crypto_tfm objects.
Any processor ID is acceptable here as long as it is one that is
iterated on by for_each_cpu().

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-18 14:36:59 -07:00
cb94c62c25 [IPV4]: Fix DST leak in icmp_push_reply()
Based upon a bug report and initial patch by
Ollie Wild.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-18 14:05:44 -07:00
Jay Vosburgh
001dd250c1 [TOKENRING]: Use interrupt-safe locking with rif_lock.
Change operations on rif_lock from spin_{un}lock_bh to
spin_{un}lock_irq{save,restore} equivalents.  Some of the
rif_lock critical sections are called from interrupt context via
tr_type_trans->tr_add_rif_info.  The TR NIC drivers call tr_type_trans
from their packet receive handlers.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-18 14:04:51 -07:00
Paul E. McKenney
1f07247de5 [DECNET]: Fix RCU race condition in dn_neigh_construct().
Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-17 12:05:27 -07:00
bfd272b1ca [IPV6]: Fix SKB leak in ip6_input_finish()
Changing it to how ip_input handles should fix it.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-17 12:04:22 -07:00
Herbert Xu
35d59efd10 [TCP]: Fix bug #5070: kernel BUG at net/ipv4/tcp_output.c:864
1) We send out a normal sized packet with TSO on to start off.
2) ICMP is received indicating a smaller MTU.
3) We send the current sk_send_head which needs to be fragmented
since it was created before the ICMP event.  The first fragment
is then sent out.

At this point the remaining fragment is allocated by tcp_fragment.
However, its size is padded to fit the L1 cache-line size therefore
creating tail-room up to 124 bytes long.

This fragment will also be sitting at sk_send_head.

4) tcp_sendmsg is called again and it stores data in the tail-room of
of the fragment.
5) tcp_push_one is called by tcp_sendmsg which then calls tso_fragment
since the packet as a whole exceeds the MTU.

At this point we have a packet that has data in the head area being
fed to tso_fragment which bombs out.

My take on this is that we shouldn't ever call tcp_fragment on a TSO
socket for a packet that is yet to be transmitted since this creates
a packet on sk_send_head that cannot be extended.

So here is a patch to change it so that tso_fragment is always used
in this case.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-17 12:03:59 -07:00
97077c4a98 [IPV6]: Fix raw socket hardware checksum failures
When packets hit raw sockets the csum update isn't done yet, do it manually.
Packets can also reach rawv6_rcv on the output path through
ip6_call_ra_chain, in this case skb->ip_summed is CHECKSUM_NONE and this
codepath isn't executed.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-17 12:03:32 -07:00
Trond Myklebust
58fcb8df0b [PATCH] NFS: Ensure ACL xdr code doesn't overflow.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-16 08:52:11 -07:00
Matt Mackall
d7b9dfc8ea [NETPOLL]: remove unused variable
Remove unused variable

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-11 19:28:05 -07:00
Matt Mackall
53fb95d3c1 [NETPOLL]: fix initialization/NAPI race
This fixes a race during initialization with the NAPI softirq
processing by using an RCU approach.

This race was discovered when refill_skbs() was added to
the setup code.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-11 19:27:43 -07:00
Ingo Molnar
2652076507 [NETPOLL]: pre-fill skb pool
we could do one thing (see the patch below): i think it would be useful 
to fill up the netlogging skb queue straight at initialization time.  
Especially if netpoll is used for dumping alone, the system might not be 
in a situation to fill up the queue at the point of crash, so better be 
a bit more prepared and keep the pipeline filled.

[ I've modified this to be called earlier - mpm ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-11 19:26:42 -07:00
Matt Mackall
0db1d6fc1e [NETPOLL]: add retry timeout
Add limited retry logic to netpoll_send_skb

Each time we attempt to send, decrement our per-device retry counter.
On every successful send, we reset the counter. 

We delay 50us between attempts with up to 20000 retries for a total of
1 second. After we've exhausted our retries, subsequent failed
attempts will try only once until reset by success.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-11 19:25:54 -07:00
Matt Mackall
f0d3459d07 [NETPOLL]: netpoll_send_skb simplify
Minor netpoll_send_skb restructuring

Restructure to avoid confusing goto and move some bits out of the
retry loop.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-11 19:25:11 -07:00
Jeff Moyer
a636e13579 [NETPOLL]: deadlock bugfix
This fixes an obvious deadlock in the netpoll code.  netpoll_rx takes the
npinfo->rx_lock.  netpoll_rx is also the only caller of arp_reply (through
__netpoll_rx).  As such, it is not necessary to take this lock.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-11 19:23:50 -07:00
Jeff Moyer
11513128bb [NETPOLL]: rx_flags bugfix
Initialize npinfo->rx_flags.  The way it stands now, this will have random
garbage, and so will incur a locking penalty even when an rx_hook isn't
registered and we are not active in the netpoll polling code.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-11 19:23:04 -07:00
Herbert Xu
b5da623ae9 [TCP]: Adjust {p,f}ackets_out correctly in tcp_retransmit_skb()
Well I've only found one potential cause for the assertion
failure in tcp_mark_head_lost.  First of all, this can only
occur if cnt > 1 since tp->packets_out is never zero here.
If it did hit zero we'd have much bigger problems.

So cnt is equal to fackets_out - reordering.  Normally
fackets_out is less than packets_out.  The only reason
I've found that might cause fackets_out to exceed packets_out
is if tcp_fragment is called from tcp_retransmit_skb with a
TSO skb and the current MSS is greater than the MSS stored
in the TSO skb.  This might occur as the result of an expiring
dst entry.

In that case, packets_out may decrease (line 1380-1381 in
tcp_output.c).  However, fackets_out is unchanged which means
that it may in fact exceed packets_out.

Previously tcp_retrans_try_collapse was the only place where
packets_out can go down and it takes care of this by decrementing
fackets_out.

So we should make sure that fackets_out is reduced by an appropriate
amount here as well.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-10 18:32:36 -07:00
Steven Whitehouse
001ab02a8c [DECNET]: Use sk_stream_error function rather than DECnet's own
Signed-off-by: Steven Whitehouse <steve@chygwyn.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-10 11:32:57 -07:00
Andrew Morton
d64d387372 [NET]: Fix memory leak in sys_{send,recv}msg() w/compat
From: Dave Johnson <djohnson+linux-kernel@sw.starentnetworks.com>

sendmsg()/recvmsg() syscalls from o32/n32 apps to a 64bit kernel will
cause a kernel memory leak if iov_len > UIO_FASTIOV for each syscall!

This is because both sys_sendmsg() and verify_compat_iovec() kmalloc a
new iovec structure.  Only the one from sys_sendmsg() is free'ed.

I wrote a simple test program to confirm this after identifying the
problem:

http://davej.org/programs/testsendmsg.c

Note that the below fix will break solaris_sendmsg()/solaris_recvmsg() as
it also calls verify_compat_iovec() but expects it to malloc internally.

[ I fixed that. -DaveM ]

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-09 15:29:19 -07:00
David S. Miller
3501466941 [SUNRPC]: Fix nsec --> usec conversion.
We need to divide, not multiply.  While we're here,
use NSEC_PER_USEC instead of a magic constant.

Based upon a report from Josip Loncaric and a patch
by Andrew Morton.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-09 14:57:12 -07:00
Linus Torvalds
92e52b2e82 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2005-08-08 16:06:01 -07:00
Heikki Orsila
ca9334523c [IPV4]: Debug cleanup
Here's a small patch to cleanup NETDEBUG() use in net/ipv4/ for Linux 
kernel 2.6.13-rc5. Also weird use of indentation is changed in some
places.

Signed-off-by: Heikki Orsila <heikki.orsila@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-08 14:26:52 -07:00
Harald Welte
8b83bc77bf [PATCH] don't try to do any NAT on untracked connections
With the introduction of 'rustynat' in 2.6.11, the old tricks of preventing
NAT of 'untracked' connections (e.g. NOTRACK target in 'raw' table) are no
longer sufficient.

The ip_conntrack_untracked.status |= IPS_NAT_DONE_MASK effectively
prevents iteration of the 'nat' table, but doesn't prevent nat_packet()
to be executed.  Since nr_manips is gone in 'rustynat', nat_packet() now
implicitly thinks that it has to do NAT on the packet.

This patch fixes that problem by explicitly checking for
ip_conntrack_untracked in ip_nat_fn().

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-08 11:48:28 -07:00
Herbert Xu
6fc0b4a7a7 [IPSEC]: Restrict socket policy loading to CAP_NET_ADMIN.
The interface needs much redesigning if we wish to allow
normal users to do this in some way.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-06 06:33:15 -07:00
Marcel Holtmann
576c7d858f [Bluetooth] Add direction and timestamp to stack internal events
This patch changes the direction to incoming and adds the timestamp
to all stack internal events.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2005-08-06 12:36:54 +02:00
Marcel Holtmann
66e8b6c31b [Bluetooth] Remove unused functions and cleanup symbol exports
This patch removes the unused bt_dump() function and it also removes
its BT_DMP macro. It also unexports the hci_dev_get(), hci_send_cmd()
and hci_si_event() functions.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2005-08-06 12:36:51 +02:00
Marcel Holtmann
dcc365d8f2 [Bluetooth] Revert session reference counting fix
The fix for the reference counting problem of the signal DLC introduced
a race condition which leads to an oops. The reason for it is not fully
understood by now and so revert this fix, because the reference counting
problem is not crashing the RFCOMM layer and its appearance it rare.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2005-08-06 12:36:42 +02:00
David S. Miller
b7656e7f29 [IPV4]: Fix memory leak during fib_info hash expansion.
When we grow the tables, we forget to free the olds ones
up.

Noticed by Yan Zheng.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-05 04:12:48 -07:00
Herbert Xu
b68e9f8572 [PATCH] tcp: fix TSO cwnd caching bug
tcp_write_xmit caches the cwnd value indirectly in cwnd_quota.  When
tcp_transmit_skb reduces the cwnd because of tcp_enter_cwr, the cached
value becomes invalid.

This patch ensures that the cwnd value is always reread after each
tcp_transmit_skb call.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-04 21:43:14 -07:00
David S. Miller
846998ae87 [PATCH] tcp: fix TSO sizing bugs
MSS changes can be lost since we preemptively initialize the tso_segs count
for an SKB before we %100 commit to sending it out.

So, by the time we send it out, the tso_size information can be stale due
to PMTU events.  This mucks up all of the logic in our send engine, and can
even result in the BUG() triggering in tcp_tso_should_defer().

Another problem we have is that we're storing the tp->mss_cache, not the
SACK block normalized MSS, as the tso_size.  That's wrong too.

Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-04 21:43:14 -07:00
Denis Lunev
f0098f7863 [NET] Fix too aggressive backoff in dst garbage collection
The bug is evident when it is seen once. dst gc timer was backed off,
when gc queue is not empty. But this means that timer quickly backs off,
if at least one destination remains in use. Normally, the bug is invisible,
because adding new dst entry to queue cancels the backoff. But it shots
deadly with destination cache overflow when new destinations are not released
for long time f.e. after an interface goes down.

The fix is to cancel backoff when something was released.

Signed-off-by: Denis Lunev <den@sw.ru>
Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-30 17:47:25 -07:00
Alexey Kuznetsov
db44575f6f [NET]: fix oops after tunnel module unload
Tunnel modules used to obtain module refcount each time when
some tunnel was created, which meaned that tunnel could be unloaded
only after all the tunnels are deleted.

Since killing old MOD_*_USE_COUNT macros this protection has gone.
It is possible to return it back as module_get/put, but it looks
more natural and practically useful to force destruction of all
the child tunnels on module unload.

Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-30 17:46:44 -07:00
Harald Welte
1f494c0e04 [NETFILTER] Inherit masq_index to slave connections
masq_index is used for cleanup in case the interface address changes
(such as a dialup ppp link with dynamic addreses).  Without this patch,
slave connections are not evicted in such a case, since they don't inherit
masq_index.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-30 17:44:07 -07:00
Baruch Even
d1b04c081e [NET]: Spelling mistakes threshoulds -> thresholds
Just simple spelling mistake fixes.

Signed-Off-By: Baruch Even <baruch@ev-en.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-30 17:41:59 -07:00
David S. Miller
6192b54b84 [NET]: Fix busy waiting in dev_close().
If the current task has signal_pending(), the loop we have
to wait for the __LINK_STATE_RX_SCHED bit to clear becomes
a pure busy-loop.

Fixed by using msleep() instead of the hand-crafted version.

Noticed by Andrew Morton.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-28 12:12:58 -07:00
Linus Torvalds
839c5d2511 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2005-07-27 16:37:59 -07:00
Jesper Juhl
77933d7276 [PATCH] clean up inline static vs static inline
`gcc -W' likes to complain if the static keyword is not at the beginning of
the declaration.  This patch fixes all remaining occurrences of "inline
static" up with "static inline" in the entire kernel tree (140 occurrences in
47 files).

While making this change I came across a few lines with trailing whitespace
that I also fixed up, I have also added or removed a blank line or two here
and there, but there are no functional changes in the patch.

Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-27 16:26:20 -07:00
Olaf Hering
44456d37b5 [PATCH] turn many #if $undefined_string into #ifdef $undefined_string
turn many #if $undefined_string into #ifdef $undefined_string to fix some
warnings after -Wno-def was added to global CFLAGS

Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-27 16:26:08 -07:00
Matt Mackall
5e43db7730 [NET]: Move in_aton from net/ipv4/utils.c to net/core/utils.c
Move in_aton to allow netpoll and pktgen to work without the rest of
the IPv4 stack. Fix whitespace and add comment for the odd placement.

Delete now-empty net/ipv4/utils.c

Re-enable netpoll/netconsole without CONFIG_INET

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-27 15:24:42 -07:00
Nick Sillik
7cee432a22 [NETFILTER]: Fix -Wunder error in ip_conntrack_core.c
Signed-off-by: Nick Sillik <n.sillik@temple.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-27 14:46:03 -07:00
Kyle Moffett
a77be819f9 [NET]: Fix setsockopt locking bug
On Sparc, SO_DONTLINGER support resulted in sock_reset_flag being 
called without lock_sock().

Signed-off-by: Kyle Moffett <mrmacman_g4@mac.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-27 14:22:30 -07:00
Hans-Juergen Tappe (SYSGO AG)
eaa1c5d059 [IPV4]: Fix Kconfig syntax error
From: "Hans-Juergen Tappe (SYSGO AG)" <hjt@sysgo.com>

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-27 13:00:04 -07:00
Herbert Xu
a4f1bac625 [XFRM]: Fix possible overflow of sock->sk_policy
Spotted by, and original patch by, Balazs Scheidler.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-26 15:43:17 -07:00
7686ee1ad9 [EMATCH]: Remove feature ifdefs in meta ematch.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-24 19:44:23 -07:00
Cal Peake
227510c7f1 [IPV6]: fix implicit declaration of function `xfrm6_tunnel_unregister'
Signed-off-by: Cal Peake <cp@absolutedigital.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-24 19:30:06 -07:00
David S. Miller
261688d01e [PKT_SCHED]: em_meta: Kill TCF_META_ID_{INDEV,SECURITY,TCVERDICT}
More unusable TCF_META_* match types that need to get eliminated
before 2.6.13 goes out the door.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Thomas Graf <tgraf@suug.ch>
2005-07-22 14:43:52 -07:00
d3984a6b6a [NETFILTER]: Fix ip6t_LOG MAC format
I broke this in the patch that consolidated MAC logging.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-22 12:52:47 -07:00
74bb421da7 [NETFILTER]: Use correct byteorder in ICMP NAT
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-22 12:51:38 -07:00
21f930e4ab [NETFILTER]: Wait until all references to ip_conntrack_untracked are dropped on unload
Fixes a crash when unloading ip_conntrack.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-22 12:51:03 -07:00
d04b4f8c1c [NETFILTER]: Fix potential memory corruption in NAT code (aka memory NAT)
The portptr pointing to the port in the conntrack tuple is declared static,
which could result in memory corruption when two packets of the same
protocol are NATed at the same time and one conntrack goes away.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-22 12:50:29 -07:00
4c1217deeb [NETFILTER]: Fix deadlock in ip6_queue
Already fixed in ip_queue, ip6_queue was missed.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-22 12:49:30 -07:00
David S. Miller
28e212fb36 [PKT_SCHED]: Kill TCF_META_ID_REALDEV from meta ematch.
It won't exist any longer when we shrink the SKB in 2.6.14,
and we should kill this off before anyone in userspace starts
using it.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Thomas Graf <tgraf@suug.ch>
2005-07-22 11:47:25 -07:00
Rusty Russell
4acdbdbe50 [NETFILTER]: ip_conntrack_expect_related must not free expectation
If a connection tracking helper tells us to expect a connection, and
we're already expecting that connection, we simply free the one they
gave us and return success.

The problem is that NAT helpers (eg. FTP) have to allocate the
expectation first (to see what port is available) then rewrite the
packet.  If that rewrite fails, they try to remove the expectation,
but it was freed in ip_conntrack_expect_related.

This is one example of a larger problem: having registered the
expectation, the pointer is no longer ours to use.  Reference counting
is needed for ctnetlink anyway, so introduce it now.

To have a single "put" path, we need to grab the reference to the
connection on creation, rather than open-coding it in the caller.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-21 13:14:46 -07:00
David S. Miller
b72f6eccb0 [NET]: Fix tc_verd thinko in skb_clone()
It was overwriting the computer n->tc_verd value over
and over with skb->tc_verd, by mistake.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-19 14:13:54 -07:00
0303770deb [NET]: Make ipip/ip6_tunnel independant of XFRM
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-19 14:03:34 -07:00
Stephen Hemminger
c877efb207 [IPV4]: Fix up lots of little whitespace indentation stuff in fib_trie.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-19 14:01:51 -07:00
Adrian Bunk
eb3f8f5e22 [NET]: BRIDGE_EBT_ARPREPLY must depend on INET
BRIDGE_EBT_ARPREPLY=y and INET=n results in the following compile error:

net/built-in.o: In function `ebt_target_reply':
ebt_arpreply.c:(.text+0x68fb9): undefined reference to `arp_send'
make: *** [.tmp_vmlinux1] Error 1

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-19 14:00:13 -07:00
abaacad9bc [IPV4]: Don't select XFRM for ip_gre
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-19 13:59:17 -07:00
6aef4fdfea [NET]: Only build flow.o if CONFIG_XFRM=y
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-19 13:58:40 -07:00
Jesper Juhl
88e9fa8a54 [ATM]: Trivial spelling fix patch for net/Kconfig
Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-19 13:56:53 -07:00
Chas Williams
322361b371 [ATM]: allow bind() on point-to-multpoint svcs (from Martin Whitaker <martin_whitaker@ntlworld.com>)
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-19 13:54:44 -07:00
David S. Miller
3f1c81ff10 [EMATCH]: Kill TCF_META_ID_TCCLASSID reference from meta ematch as well.
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-18 17:10:55 -07:00
Adrian Bunk
6876f95f20 [IPV4]: fix IP_FIB_HASH kconfig warning
This patch fixes the following kconfig warning:
  net/ipv4/Kconfig:92:warning: defaults for choice values not supported

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-18 13:55:19 -07:00
Randy Dunlap
54208991e1 [NET]: Kconfig: NETCONSOLE and NETPOLL together
Put NETCONSOLE and NETPOLL options together since they are related.
This cuts down on the hassle of flipping back and forth between
the Networking menu and the Network drivers menu to change their
config settings.

Tested with menuconfig, gconfig, and xconfig.
gconfig has a small problem with this.  I think that it's
a bug in gconfig and I will take it up with Romain Lievin.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-18 13:45:12 -07:00
Sridhar Samudrala
d1ad1ff299 [SCTP]: Fix potential null pointer dereference while handling an icmp error
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-18 13:44:10 -07:00
Christophe Lucas
ee71a29eb5 [SCTP]: Audit return code of create_proc_*
From: Christophe Lucas <clucas@rotomalug.org>

Audit return of create_proc_* functions.

Signed-off-by: Christophe Lucas <clucas@rotomalug.org>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-18 13:38:07 -07:00
Victor Fusco
37da647d99 [NETLINK]: Fix "nocast type" warnings
From: Victor Fusco <victor@cetuc.puc-rio.br>

Fix the sparse warning "implicit cast to nocast type"

Signed-off-by: Victor Fusco <victor@cetuc.puc-rio.br>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-18 13:35:43 -07:00
Thomas Graf
452f299da3 [PKT_SCHED]: Reduce branch mispredictions in pfifo_fast_dequeue
The current call to __qdisc_dequeue_head leads to a branch
misprediction for every loop iteration, the fact that the
most common priority is 2 makes this even worse.  This issue
has been brought up by Eric Dumazet <dada1@cosmosbay.com>
but unlike his solution which was to manually unroll the loop,
this approach preserves the possibility to increase the number
of bands at compile time. 

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-18 13:30:53 -07:00
Thomas Graf
d7c7ed4dbc [PKT_SCHED]: Remove debugging leftover from textsearch ematch
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-18 13:29:49 -07:00
Tommy Christensen
f4637b55ba [VLAN]: Fix early vlan adding leads to not functional device
OK, I can see what's happening here. eth0 doesn't detect link-up until
after a few seconds, so when the vlan interface is opened immediately
after eth0 has been opened, it inherits the link-down state. Subsequently
the vlan interface is never properly activated and are thus unable to
transmit any packets.

dev->state bits are not supposed to be manipulated directly. Something
similar is probably needed for the netif_device_present() bit, although
I don't know how this is meant to work for a virtual device.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-12 12:13:49 -07:00
Alexey Dobriyan
ab611487d8 [NET]: __be'ify *_type_trans()
tr_type_trans(), hippi_type_trans() left as-is.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-12 12:08:43 -07:00
Phil Oester
84531c24f2 [NETFILTER]: Revert nf_reset change
Revert the nf_reset change that caused so much trouble, drop conntrack
references manually before packets are queued to packet sockets.

Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-12 11:57:52 -07:00
Sam Ravnborg
6a2e9b738c [NET]: move config options out to individual protocols
Move the protocol specific config options out to the specific protocols.
With this change net/Kconfig now starts to become readable and serve as a
good basis for further re-structuring.

The menu structure is left almost intact, except that indention is
fixed in most cases. Most visible are the INET changes where several
"depends on INET" are replaced with a single ifdef INET / endif pair.

Several new files were created to accomplish this change - they are
small but serve the purpose that config options are now distributed
out where they belongs.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-11 21:13:56 -07:00
Sam Ravnborg
d5950b4355 [NET]: add a top-level Networking menu to *config
Create a new top-level menu named "Networking" thus moving
net related options and protocol selection way from the drivers
menu and up on the top-level where they belong.

To implement this all architectures has to source "net/Kconfig" before
drivers/*/Kconfig in their Kconfig file. This change has been
implemented for all architectures.

Device drivers for ordinary NIC's are still to be found
in the Device Drivers section, but Bluetooth, IrDA and ax25
are located with their corresponding menu entries under the new
networking menu item.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-11 21:03:49 -07:00
Olaf Kirch
0b7f22aab4 [IPV4]: Prevent oops when printing martian source
In some cases, we may be generating packets with a source address that
qualifies as martian. This can happen when we're in the middle of setting
up the network, and netfilter decides to reject a packet with an RST.
The IPv4 routing code would try to print a warning and oops, because
locally generated packets do not have a valid skb->mac.raw pointer
at this point.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-11 21:01:42 -07:00
Julian Anastasov
af9debd461 [IPVS]: Add and reorder bh locks after moving to keventd.
An addition to the last ipvs changes that move
update_defense_level/si_meminfo to keventd:

- ip_vs_random_dropentry now runs in process context and should use _bh
  locks to protect from softirqs

- update_defense_level still needs _bh locks after si_meminfo is called,
  for the same purpose

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-11 20:59:57 -07:00
Jesper Juhl
f5b8adb4f5 [NET]: Trivial spelling fix patch for net/Kconfig
Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-11 20:59:03 -07:00
Alexey Dobriyan
3182cd84f0 [SCTP]: __nocast annotations
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-11 20:57:47 -07:00
David S. Miller
79af02c253 [SCTP]: Use struct list_head for chunk lists, not sk_buff_head.
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08 21:47:49 -07:00
David S. Miller
9c05989bb2 [IPV6]: Fix warning in ip6_mc_msfilter.
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08 21:44:39 -07:00
David L Stevens
84b42baef7 [IPV4]: fix IPv4 leave-group group matching
This patch fixes the multicast group matching for 
IP_DROP_MEMBERSHIP, similar to the IP_ADD_MEMBERSHIP fix in a prior
patch. Groups are identifiedby <group address,interface> and including
the interface address in the match will fail if a leave-group is done
by address when the join was done by index, or if different addresses
on the same interface are used in the join and leave.

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08 17:48:38 -07:00