Archived
14
0
Fork 0
Commit graph

2616 commits

Author SHA1 Message Date
Bart Van Assche
e968467822 IB/srp: stop sharing the host lock with SCSI
We don't need protection against the SCSI stack, so use our own lock to
allow parallel progress on separate CPUs.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
[ broken out and small cleanups by David Dillow ]
Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-01-10 15:44:50 -05:00
Bart Van Assche
94a9174c63 IB/srp: reduce lock coverage of command completion
We only need the lock to cover list and credit manipulations, so push
those into srp_remove_req() and update the call chains.

We reorder the request removal and command completion in
srp_process_rsp() to avoid the SCSI mid-layer sending another command
before we've released our request and added any credits returned by the
target. This prevents us from returning HOST_BUSY unneccesarily.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
[ broken out, small cleanups, and modified to avoid potential extraneous
  HOST_BUSY returns by David Dillow ]
Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-01-10 15:44:50 -05:00
Bart Van Assche
76c75b258f IB/srp: reduce local coverage for command submission and EH
We only need locks to protect our lists and number of credits available.
By pre-consuming the credit for the request, we can reduce our lock
coverage to just those areas. If we don't actually send the request,
we'll need to put the credit back into the pool.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
[ broken out and small cleanups by David Dillow ]
Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-01-10 15:44:49 -05:00
Bart Van Assche
536ae14e75 IB/srp: don't move active requests to their own list
We use req->scmnd != NULL to indicate an active request, so there's no
need to keep a separate list for them. We can afford the array iteration
during error handling, and dropping it gives us one less item that needs
lock protection.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
[ broken out and small cleanups by David Dillow ]
Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-01-10 15:44:42 -05:00
Linus Torvalds
b4a45f5fe8 Merge branch 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin
* 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin: (57 commits)
  fs: scale mntget/mntput
  fs: rename vfsmount counter helpers
  fs: implement faster dentry memcmp
  fs: prefetch inode data in dcache lookup
  fs: improve scalability of pseudo filesystems
  fs: dcache per-inode inode alias locking
  fs: dcache per-bucket dcache hash locking
  bit_spinlock: add required includes
  kernel: add bl_list
  xfs: provide simple rcu-walk ACL implementation
  btrfs: provide simple rcu-walk ACL implementation
  ext2,3,4: provide simple rcu-walk ACL implementation
  fs: provide simple rcu-walk generic_check_acl implementation
  fs: provide rcu-walk aware permission i_ops
  fs: rcu-walk aware d_revalidate method
  fs: cache optimise dentry and inode for rcu-walk
  fs: dcache reduce branches in lookup path
  fs: dcache remove d_mounted
  fs: fs_struct use seqlock
  fs: rcu-walk for path lookup
  ...
2011-01-07 08:56:33 -08:00
Nick Piggin
dc0474be3e fs: dcache rationalise dget variants
dget_locked was a shortcut to avoid the lazy lru manipulation when we already
held dcache_lock (lru manipulation was relatively cheap at that point).
However, how that the lru lock is an innermost one, we never hold it at any
caller, so the lock cost can now be avoided. We already have well working lazy
dcache LRU, so it should be fine to defer LRU manipulations to scan time.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:24 +11:00
Nick Piggin
b5c84bf6f6 fs: dcache remove dcache_lock
dcache_lock no longer protects anything. remove it.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:23 +11:00
Nick Piggin
b7ab39f631 fs: dcache scale dentry refcount
Make d_count non-atomic and protect it with d_lock. This allows us to ensure a
0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when
we start protecting many other dentry members with d_lock.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:21 +11:00
Bart Van Assche
dcb4cb85f4 IB/srp: allow lockless work posting
Only one CPU at a time will own an RX IU, so using the address of the IU
as the work request cookie allows us to avoid taking a lock. We can
similarly prepare the TX path for lockless posting by moving the free TX
IUs to a list. This also removes the requirement that the queue sizes be
a power of 2.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
[ broken out, small cleanups, and modified to avoid needing an extra field
  in the IU by David Dillow]
Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-01-05 15:24:25 -05:00
Bart Van Assche
9709f0e05b IB/srp: consolidate state change code
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
[ broken out and small cleanups by David Dillow ]
Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-01-05 15:24:25 -05:00
David Dillow
f8b6e31e4e IB/srp: allow task management without a previous request
We can only have one task management comment outstanding, so move the
completion and status to the target port. This allows us to handle
resets of a LUN without a corresponding request having been sent.
Meanwhile, we don't need to play games with host_scribble, just use it
as the pointer it is.

This fixes a crash when we issue a bus reset using sg_reset.

Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=13893
Reported-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-01-05 15:24:25 -05:00
David S. Miller
17f7f4d9fc Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	net/ipv4/fib_frontend.c
2010-12-26 22:37:05 -08:00
Jiri Kosina
4b7bd36470 Merge branch 'master' into for-next
Conflicts:
	MAINTAINERS
	arch/arm/mach-omap2/pm24xx.c
	drivers/scsi/bfa/bfa_fcpim.c

Needed to update to apply fixes for which the old branch was too
outdated.
2010-12-22 18:57:02 +01:00
Dan Carpenter
7182afea8d IB/uverbs: Handle large number of entries in poll CQ
In ib_uverbs_poll_cq() code there is a potential integer overflow if
userspace passes in a large cmd.ne.  The calls to kmalloc() would
allocate smaller buffers than intended, leading to memory corruption.
There iss also an information leak if resp wasn't all used.
Unprivileged userspace may call this function, although only if an
RDMA device that uses this function is present.

Fix this by copying CQ entries one at a time, which avoids the
allocation entirely, and also by moving this copying into a function
that makes sure to initialize all memory copied to userspace.

Special thanks to Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
for his help and advice.

Cc: <stable@kernel.org>
Signed-off-by: Dan Carpenter <error27@gmail.com>

[ Monkey around with things a bit to avoid bad code generation by gcc
  when designated initializers are used.  - Roland ]

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-12-08 15:23:49 -08:00
Linus Torvalds
75318ec327 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB: Fix information leak in marshalling code
  IB/pack: Remove some unused code added by the IBoE patches
  IB/mlx4: Fix IBoE link state
  IB/mlx4: Fix IBoE reported link rate
  mlx4_core: Workaround firmware bug in query dev cap
  IB/mlx4: Fix memory ordering of VLAN insertion control bits
  MAINTAINERS: Update NetEffect entry
2010-12-02 12:10:56 -08:00
Roland Dreier
7adce751ce Merge branches 'misc', 'mlx4' and 'nes' into for-next 2010-12-01 16:33:47 -08:00
Vasiliy Kulikov
91a4d157d0 IB: Fix information leak in marshalling code
ib_ucm_init_qp_attr() and ucma_init_qp_attr() pass struct ib_uverbs_qp_attr
with reserved, qp_state, {ah_attr,alt_ah_attr}{reserved,->grh.reserved}
fields uninitialized to copy_to_user().  This leads to leaking of
contents of kernel stack memory to userspace.

Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-12-01 16:33:18 -08:00
Or Gerlitz
f55864a4f4 IB/pack: Remove some unused code added by the IBoE patches
Remove unused functions added by commit ff7f5aab35 ("IB/pack: IBoE UD
packet packing support").

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
2010-12-01 16:30:18 -08:00
Eli Cohen
21d606090e IB/mlx4: Fix IBoE link state
Use netif_running() and netif_carrier_ok() to report link state,
exactly as is done to report Ethernet link state in sysfs.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-12-01 16:11:29 -08:00
Eli Cohen
328266c561 IB/mlx4: Fix IBoE reported link rate
The link rate is the product of the link speed in the link width. For
Etherent ports the rate is 10G, so we use 1 for the width and 4 for
speed to get the correct rate.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-12-01 16:10:35 -08:00
Eli Cohen
e27535b9c6 IB/mlx4: Fix memory ordering of VLAN insertion control bits
We must fully update the control segment before marking it as valid,
so that hardware doesn't start executing it before we're ready.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>

[ Move VLAN control bit setting to before wmb().  - Roland ]

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-12-01 11:08:54 -08:00
Eric Dumazet
22f4fbd9bd infiniband: remove dev_base_lock use
dev_base_lock is the legacy way to lock the device list, and is planned
to disappear. (writers hold RTNL, readers hold RCU lock)

Convert rdma_translate_ip() and update_ipv6_gids() to RCU locking.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-24 11:41:56 -08:00
Arnd Bergmann
451a3c24b0 BKL: remove extraneous #include <smp_lock.h>
The big kernel lock has been removed from all these files at some point,
leaving only the #include.

Remove this too as a cleanup.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-11-17 08:59:32 -08:00
Jeff Garzik
f281233d3e SCSI host lock push-down
Move the mid-layer's ->queuecommand() invocation from being locked
with the host lock to being unlocked to facilitate speeding up the
critical path for drivers who don't need this lock taken anyway.

The patch below presents a simple SCSI host lock push-down as an
equivalent transformation.  No locking or other behavior should change
with this patch.  All existing bugs and locking orders are preserved.

Additionally, add one parameter to queuecommand,
	struct Scsi_Host *
and remove one parameter from queuecommand,
	void (*done)(struct scsi_cmnd *)

Scsi_Host* is a convenient pointer that most host drivers need anyway,
and 'done' is redundant to struct scsi_cmnd->scsi_done.

Minimal code disturbance was attempted with this change.  Most drivers
needed only two one-line modifications for their host lock push-down.

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Acked-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-11-16 13:33:23 -08:00
Jesper Juhl
e987fa357a infiniband: Only include mutex.h once in drivers/infiniband/hw/cxgb4/iw_cxgb4.h
Only include the header linux/mutex.h once inside
drivers/infiniband/hw/cxgb4/iw_cxgb4.h

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-11-15 14:35:19 +01:00
Eric Dumazet
72cdd1d971 net: get rid of rtable->idev
It seems idev field in struct rtable has no special purpose, but adding
extra atomic ops.

We hold refcounts on the device itself (using percpu data, so pretty
cheap in current kernel).

infiniband case is solved using dst.dev instead of idev->dev

Removal of this field means routing without route cache is now using
shared data, percpu data, and only potential contention is a pair of
atomic ops on struct neighbour per forwarded packet.

About 5% speedup on routing test.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-11 10:29:40 -08:00
Uwe Kleine-König
b595076a18 tree-wide: fix comment/printk typos
"gadget", "through", "command", "maintain", "maintain", "controller", "address",
"between", "initiali[zs]e", "instead", "function", "select", "already",
"equal", "access", "management", "hierarchy", "registration", "interest",
"relative", "memory", "offset", "already",

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-11-01 15:38:34 -04:00
Al Viro
fc14f2fef6 convert get_sb_single() users
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-29 04:16:28 -04:00
Linus Torvalds
426e1f5cec Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits)
  split invalidate_inodes()
  fs: skip I_FREEING inodes in writeback_sb_inodes
  fs: fold invalidate_list into invalidate_inodes
  fs: do not drop inode_lock in dispose_list
  fs: inode split IO and LRU lists
  fs: switch bdev inode bdi's correctly
  fs: fix buffer invalidation in invalidate_list
  fsnotify: use dget_parent
  smbfs: use dget_parent
  exportfs: use dget_parent
  fs: use RCU read side protection in d_validate
  fs: clean up dentry lru modification
  fs: split __shrink_dcache_sb
  fs: improve DCACHE_REFERENCED usage
  fs: use percpu counter for nr_dentry and nr_dentry_unused
  fs: simplify __d_free
  fs: take dcache_lock inside __d_path
  fs: do not assign default i_ino in new_inode
  fs: introduce a per-cpu last_ino allocator
  new helper: ihold()
  ...
2010-10-26 17:58:44 -07:00
Linus Torvalds
9e5fca251f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (63 commits)
  IB/qib: clean up properly if pci_set_consistent_dma_mask() fails
  IB/qib: Allow driver to load if PCIe AER fails
  IB/qib: Fix uninitialized pointer if CONFIG_PCI_MSI not set
  IB/qib: Fix extra log level in qib_early_err()
  RDMA/cxgb4: Remove unnecessary KERN_<level> use
  RDMA/cxgb3: Remove unnecessary KERN_<level> use
  IB/core: Add link layer type information to sysfs
  IB/mlx4: Add VLAN support for IBoE
  IB/core: Add VLAN support for IBoE
  IB/mlx4: Add support for IBoE
  mlx4_en: Change multicast promiscuous mode to support IBoE
  mlx4_core: Update data structures and constants for IBoE
  mlx4_core: Allow protocol drivers to find corresponding interfaces
  IB/uverbs: Return link layer type to userspace for query port operation
  IB/srp: Sync buffer before posting send
  IB/srp: Use list_first_entry()
  IB/srp: Reduce number of BUSY conditions
  IB/srp: Eliminate two forward declarations
  IB/mlx4: Signal node desc changes to SM by using FW to generate trap 144
  IB: Replace EXTRA_CFLAGS with ccflags-y
  ...
2010-10-26 17:54:22 -07:00
Hagen Paul Pfeifer
732eacc054 replace nested max/min macros with {max,min}3 macro
Use the new {max,min}3 macros to save some cycles and bytes on the stack.
This patch substitutes trivial nested macros with their counterpart.

Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: Joe Perches <joe@perches.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 16:52:12 -07:00
Roland Dreier
116e9535fe Merge branches 'amso1100', 'cma', 'cxgb3', 'cxgb4', 'ehca', 'iboe', 'ipoib', 'misc', 'mlx4', 'nes', 'qib' and 'srp' into for-next 2010-10-26 16:09:11 -07:00
Jason Gunthorpe
2ca78d23a7 IB/qib: clean up properly if pci_set_consistent_dma_mask() fails
Clean up properly if pci_set_consistent_dma_mask() fails.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-26 16:09:02 -07:00
Ralph Campbell
5d26a1df23 IB/qib: Allow driver to load if PCIe AER fails
Some PCIe root complex chip sets don't support advanced error reporting.
Allow the driver to load OK if pci_enable_pcie_error_reporting() fails.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-26 16:09:02 -07:00
Ralph Campbell
9e43e0106d IB/qib: Fix uninitialized pointer if CONFIG_PCI_MSI not set
If CONFIG_PCI_MSI is not set, and a QLE7140 is present, the pointer
"dd" is uninitialized.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-26 16:09:02 -07:00
Jason Gunthorpe
82fdb0ab54 IB/qib: Fix extra log level in qib_early_err()
Noticed this odd looking thing in dmesg:

    ib_qib 0000:02:00.0: <3>ib_qib: Unable to enable pcie error reporting: -5

which is due to a bad use of dev_info.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Acked-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-26 16:09:02 -07:00
Joe Perches
aa1ad26089 RDMA/cxgb4: Remove unnecessary KERN_<level> use
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-26 13:45:59 -07:00
Joe Perches
ca7cf94f8b RDMA/cxgb3: Remove unnecessary KERN_<level> use
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-26 13:45:49 -07:00
Christoph Hellwig
85fe4025c6 fs: do not assign default i_ino in new_inode
Instead of always assigning an increasing inode number in new_inode
move the call to assign it into those callers that actually need it.
For now callers that need it is estimated conservatively, that is
the call is added to all filesystems that do not assign an i_ino
by themselves.  For a few more filesystems we can avoid assigning
any inode number given that they aren't user visible, and for others
it could be done lazily when an inode number is actually needed,
but that's left for later patches.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25 21:26:11 -04:00
Eli Cohen
8ad330a002 IB/core: Add link layer type information to sysfs
Since an IB transport port may use either IB or Ethernet as its link layer,
add the file /sys/class/infiniband/<device>/ports/<port_num>/link_layer to
show the link layer for the port.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-25 10:20:39 -07:00
Eli Cohen
4c3eb3ca13 IB/mlx4: Add VLAN support for IBoE
This patch allows IBoE traffic to be encapsulated in 802.1Q tagged
VLAN frames.  The VLAN tag is encoded in the GID and derived from it
by a simple computation.

The netdev notifier callback is modified to catch VLAN device
addition/removal and the port's GID table is updated to reflect the
change, so that for each netdevice there is an entry in the GID table.
When the port's GID table is exhausted, GID entries will not be added.
Only children of the main interfaces can add to the GID table; if a
VLAN interface is added on another VLAN interface (e.g. "vconfig add
eth2.6 8"), then that interfaces will not add an entry to the GID
table.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-25 10:20:39 -07:00
Eli Cohen
af7bd46376 IB/core: Add VLAN support for IBoE
Add 802.1q VLAN support to IBoE. The VLAN tag is encoded within the
GID derived from a link local address in the following way:

    GID[11] GID[12] contain the VLAN ID when the GID contains a VLAN.

The 3 bits user priority field of the packets are identical to the 3
bits of the SL.

In case of rdma_cm apps, the TOS field is used to generate the SL
field by doing a shift right of 5 bits effectively taking to 3 MS bits
of the TOS field.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-25 10:20:39 -07:00
Eli Cohen
fa417f7b52 IB/mlx4: Add support for IBoE
Add support for IBoE to mlx4_ib.  The bulk of the code is handling the
new address vector fields; mlx4 needs the MAC address of a remote node
to include it in a WQE (for datagrams) or in the QP context (for
connected QPs).  Address resolution is done by assuming all unicast
GIDs are either link-local IPv6 addresses.

Multicast group attach/detach needs to update the NIC's multicast
filters; but since attaching a QP to a multicast group can be done
before the QP is bound to a port, for IBoE we need to keep track of
all multicast groups that a QP is attached too before it transitions
from INIT to RTR (since it does not have a port in the INIT state).

Signed-off-by: Eli Cohen <eli@mellanox.co.il>

[ Many things cleaned up and otherwise monkeyed with; hope I didn't
  introduce too many bugs.  - Roland ]

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-25 10:20:39 -07:00
Eli Cohen
2420b60b1d IB/uverbs: Return link layer type to userspace for query port operation
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-25 10:20:39 -07:00
David Dillow
19081f31ce IB/srp: Sync buffer before posting send
srp_send_tsk_mgmt() was missing the proper DMA sync calls before posting
the buffer to the device.

Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-24 22:14:23 -07:00
Bart Van Assche
21c1a90769 IB/srp: Use list_first_entry()
Use the list_first_entry() macro in ib_srp instead of open-coding the equivalent,
which makes the source code slightly more descriptive.  The list_first_entry()
macro itself was introduced in kernel 2.6.22.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-24 22:14:19 -07:00
Bart Van Assche
7ade400aba IB/srp: Reduce number of BUSY conditions
As proposed by the SRP (draft) standard, ib_srp reserves one ring
element for SRP_TSK_MGMT requests. This patch makes sure that the SCSI
mid-layer never tries to queue more than (SRP request limit) - 1 SCSI
commands to ib_srp. This improves performance for targets whose request
limit is less than or equal to SRP_NORMAL_REQ_SQ_SIZE by reducing the
number of BUSY responses reported by ib_srp to the SCSI mid-layer.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-24 22:14:14 -07:00
David Dillow
05a1d7504f IB/srp: Eliminate two forward declarations
Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-24 22:14:08 -07:00
Linus Torvalds
229aebb873 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
  Update broken web addresses in arch directory.
  Update broken web addresses in the kernel.
  Revert "drivers/usb: Remove unnecessary return's from void functions" for musb gadget
  Revert "Fix typo: configuation => configuration" partially
  ida: document IDA_BITMAP_LONGS calculation
  ext2: fix a typo on comment in ext2/inode.c
  drivers/scsi: Remove unnecessary casts of private_data
  drivers/s390: Remove unnecessary casts of private_data
  net/sunrpc/rpc_pipe.c: Remove unnecessary casts of private_data
  drivers/infiniband: Remove unnecessary casts of private_data
  drivers/gpu/drm: Remove unnecessary casts of private_data
  kernel/pm_qos_params.c: Remove unnecessary casts of private_data
  fs/ecryptfs: Remove unnecessary casts of private_data
  fs/seq_file.c: Remove unnecessary casts of private_data
  arm: uengine.c: remove C99 comments
  arm: scoop.c: remove C99 comments
  Fix typo configue => configure in comments
  Fix typo: configuation => configuration
  Fix typo interrest[ing|ed] => interest[ing|ed]
  Fix various typos of valid in comments
  ...

Fix up trivial conflicts in:
	drivers/char/ipmi/ipmi_si_intf.c
	drivers/usb/gadget/rndis.c
	net/irda/irnet/irnet_ppp.c
2010-10-24 13:41:39 -07:00
Jack Morgenstein
d0d68b8693 IB/mlx4: Signal node desc changes to SM by using FW to generate trap 144
The Node Description cannot be changed via MADs (it is read-only).
Until now, it was changed in the driver via sysfs, and the new Node
Description was simply inserted by the driver into MAD responses
(replacing the description returned by FW).

System startup scripts use the sysfs interface to change the node
description at driver startup to show the hostname, etc. However, this
has a race condition: the SM could discover the original FW node
description rather than the system-specific description if it queried the
port before the startup scripts finish running.

For mlx4, we fix this with a new FW command (SET_NODE) that allows
passing the new node description to FW.  When this command is invoked,
FW sends a trap 144 to the SM.  When it gets this trap, the SM can
query the node to obtain the new node description -- thus eliminating
the effects of the race.

This patch simply calls SET_NODE command when a new node description
is entered via sysfs (thus causing trap 144 to be issued by the FW).
We ignore all failures of the SET_NODE command (including those caused
by using a device FW that predates the SET_NODE command), since in
that case things work just as before.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-23 13:53:09 -07:00
matt mooney
7454159d3c IB: Replace EXTRA_CFLAGS with ccflags-y
Signed-off-by: matt mooney <mfm@muteddisk.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-23 13:45:03 -07:00
Steve Wise
97cb7e40c6 RDMA/ucma: Allow tuning the max listen backlog
For iWARP connections, the connect request is carried in a TCP payload
on an already established TCP connection.  So if the ucma's backlog is
full, the connection request is transmitted and acked at the TCP level
by the time the connect request gets dropped in the ucma.  The end
result is the connection gets rejected by the iWARP provider.
Further, a 32 node 256NP OpenMPI job will generate > 128 connect
requests on some ranks.

This patch increases the default max backlog to 1024, and adds a
sysctl variable so the backlog can be adjusted at run time.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-23 13:41:40 -07:00
Eli Cohen
c3aa9b186b IPoIB: Set dev_id field of net_device
Use the net device's dev_id field to encode the port number of the pci
device.  This can be used to to associate a net device with the pci
device's port. The encoding is: dev_id = port - 1.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-23 13:35:48 -07:00
Linus Torvalds
5f05647dd8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1699 commits)
  bnx2/bnx2x: Unsupported Ethtool operations should return -EINVAL.
  vlan: Calling vlan_hwaccel_do_receive() is always valid.
  tproxy: use the interface primary IP address as a default value for --on-ip
  tproxy: added IPv6 support to the socket match
  cxgb3: function namespace cleanup
  tproxy: added IPv6 support to the TPROXY target
  tproxy: added IPv6 socket lookup function to nf_tproxy_core
  be2net: Changes to use only priority codes allowed by f/w
  tproxy: allow non-local binds of IPv6 sockets if IP_TRANSPARENT is enabled
  tproxy: added tproxy sockopt interface in the IPV6 layer
  tproxy: added udp6_lib_lookup function
  tproxy: added const specifiers to udp lookup functions
  tproxy: split off ipv6 defragmentation to a separate module
  l2tp: small cleanup
  nf_nat: restrict ICMP translation for embedded header
  can: mcp251x: fix generation of error frames
  can: mcp251x: fix endless loop in interrupt handler if CANINTF_MERRF is set
  can-raw: add msg_flags to distinguish local traffic
  9p: client code cleanup
  rds: make local functions/variables static
  ...

Fix up conflicts in net/core/dev.c, drivers/net/pcmcia/smc91c92_cs.c and
drivers/net/wireless/ath/ath9k/debug.c as per David
2010-10-23 11:47:02 -07:00
David Dillow
bb12588a38 IB/srp: Implement SRP_CRED_REQ and SRP_AER_REQ
This patch adds support for SRP_CRED_REQ to avoid a lockup by targets
that use that mechanism to return credits to the initiator. This
prevents a lockup observed in the field where we would never add the
credits from the SRP_CRED_REQ to our current count, and would therefore
never send another command to the target.

Minimal support for SRP_AER_REQ is also added, as these messages can
also be used to convey additional credits to the initiator.

Based upon extensive debugging and code by Bart Van Assche and a bug
report by Chris Worley.

Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-22 22:19:10 -07:00
Bart Van Assche
dd5e6e38b2 IB/srp: Preparation for transmit ring response allocation
The transmit ring in ib_srp (srp_target.tx_ring) is currently only used
for allocating requests sent by the initiator to the target. This patch
prepares using that ring for allocation of both requests and responses.
Also, this patch differentiates the uses of SRP_SQ_SIZE, increases the
size of the IB send completion queue by one element and reserves one
transmit ring slot for SRP_TSK_MGMT requests.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-22 22:19:10 -07:00
Jason Gunthorpe
5715f5d44b IB/qib: Process RDMA WRITE ONLY with IMMEDIATE properly
See table 35 in IBA - the header order for RDMA_WRITE_ONLY_WITH_IMMEDIATE
and SEND_LAST_WITH_IMMEDIATE is different: the RDMA_WRITE_ONLY has
a RETH header before the immediate data, so we need a different code path
to extract the immediate data.

I tested this with a userspace app that does RDMA_WRITE with immediate
on a QLE7140.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-22 22:12:15 -07:00
Steve Wise
b955150ea7 RDMA/cxgb3: When a user QP is marked in error, also mark the CQs in error
The flushing of work requests for user QPs is implemented entirely in
the user mode library.  The only kernel interaction is to mark the
user QP object indicating it is in error when the QP exits RTS.  When
the user QP operations are called by the application (eg: post_send,
post_recv), the QP in error bit is checked and if set, the library
flushes the QP.  If, however, the application is not doing IO, but
rather just polling the CQ, it will never get flushed work requests.
This breaks some classes of applications.

This patch adds logic to mark user CQs in error when a QP that is bound
to the CQ is marked in error.  The library poll code can then notice
the CQ is in error and flush all the in error QPs bound to that CQ.

Design:

 - add 1 extra CQE entry to the CQ memory that will be used to indicate
   in error status.
 - return the desired CQ memory size that should be mapped by the library
 - bump the ABI since the create_cq uverbs response changes.
 - detect older libraries and reduce the mmap size accordingly.
   (The ABI bump doesn't break old libraries, since they didn't check
   the ABI field anyway)

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-22 22:00:53 -07:00
Steve Wise
da411ba1da RDMA/cxgb4: Use cxgb4 service for packet gl to skb
Remove the local service t4_pktgl_to_skb() and use cxgb4_pktgl_to_skb()
exported by cxgb4.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-22 21:58:50 -07:00
Steve Wise
de5dd81b49 RDMA/cxgb4: Export T4 TCP MIB
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-22 21:57:26 -07:00
Linus Torvalds
092e0e7e52 Merge branch 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl
* 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl:
  vfs: make no_llseek the default
  vfs: don't use BKL in default_llseek
  llseek: automatically add .llseek fop
  libfs: use generic_file_llseek for simple_attr
  mac80211: disallow seeks in minstrel debug code
  lirc: make chardev nonseekable
  viotape: use noop_llseek
  raw: use explicit llseek file operations
  ibmasmfs: use generic_file_llseek
  spufs: use llseek in all file operations
  arm/omap: use generic_file_llseek in iommu_debug
  lkdtm: use generic_file_llseek in debugfs
  net/wireless: use generic_file_llseek in debugfs
  drm: use noop_llseek
2010-10-22 10:52:56 -07:00
Justin P. Mattock
631dd1a885 Update broken web addresses in the kernel.
The patch below updates broken web addresses in the kernel

Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Finn Thain <fthain@telegraphics.com.au>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Dimitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Mike Frysinger <vapier.adi@gmail.com>
Acked-by: Ben Pfaff <blp@cs.stanford.edu>
Acked-by: Hans J. Koch <hjk@linutronix.de>
Reviewed-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-10-18 11:03:14 +02:00
Randy Dunlap
e00ce92e0b infiniband: fix mlx4 kconfig dependency warning
Fix kconfig dependency warning to satisfy dependencies:

warning: (MLX4_EN && NETDEVICES && NETDEV_10000 && PCI && INET || MLX4_INFINIBAND && INFINIBAND) selects MLX4_CORE which has unmet direct dependencies (NETDEVICES && NETDEV_10000 && PCI)

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-16 11:13:25 -07:00
Arnd Bergmann
6038f373a3 llseek: automatically add .llseek fop
All file_operations should get a .llseek operation so we can make
nonseekable_open the default for future file operations without a
.llseek pointer.

The three cases that we can automatically detect are no_llseek, seq_lseek
and default_llseek. For cases where we can we can automatically prove that
the file offset is always ignored, we use noop_llseek, which maintains
the current behavior of not returning an error from a seek.

New drivers should normally not use noop_llseek but instead use no_llseek
and call nonseekable_open at open time.  Existing drivers can be converted
to do the same when the maintainer knows for certain that no user code
relies on calling seek on the device file.

The generated code is often incorrectly indented and right now contains
comments that clarify for each added line why a specific variant was
chosen. In the version that gets submitted upstream, the comments will
be gone and I will manually fix the indentation, because there does not
seem to be a way to do that using coccinelle.

Some amount of new code is currently sitting in linux-next that should get
the same modifications, which I will do at the end of the merge window.

Many thanks to Julia Lawall for helping me learn to write a semantic
patch that does all this.

===== begin semantic patch =====
// This adds an llseek= method to all file operations,
// as a preparation for making no_llseek the default.
//
// The rules are
// - use no_llseek explicitly if we do nonseekable_open
// - use seq_lseek for sequential files
// - use default_llseek if we know we access f_pos
// - use noop_llseek if we know we don't access f_pos,
//   but we still want to allow users to call lseek
//
@ open1 exists @
identifier nested_open;
@@
nested_open(...)
{
<+...
nonseekable_open(...)
...+>
}

@ open exists@
identifier open_f;
identifier i, f;
identifier open1.nested_open;
@@
int open_f(struct inode *i, struct file *f)
{
<+...
(
nonseekable_open(...)
|
nested_open(...)
)
...+>
}

@ read disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
<+...
(
   *off = E
|
   *off += E
|
   func(..., off, ...)
|
   E = *off
)
...+>
}

@ read_no_fpos disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
... when != off
}

@ write @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
<+...
(
  *off = E
|
  *off += E
|
  func(..., off, ...)
|
  E = *off
)
...+>
}

@ write_no_fpos @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
... when != off
}

@ fops0 @
identifier fops;
@@
struct file_operations fops = {
 ...
};

@ has_llseek depends on fops0 @
identifier fops0.fops;
identifier llseek_f;
@@
struct file_operations fops = {
...
 .llseek = llseek_f,
...
};

@ has_read depends on fops0 @
identifier fops0.fops;
identifier read_f;
@@
struct file_operations fops = {
...
 .read = read_f,
...
};

@ has_write depends on fops0 @
identifier fops0.fops;
identifier write_f;
@@
struct file_operations fops = {
...
 .write = write_f,
...
};

@ has_open depends on fops0 @
identifier fops0.fops;
identifier open_f;
@@
struct file_operations fops = {
...
 .open = open_f,
...
};

// use no_llseek if we call nonseekable_open
////////////////////////////////////////////
@ nonseekable1 depends on !has_llseek && has_open @
identifier fops0.fops;
identifier nso ~= "nonseekable_open";
@@
struct file_operations fops = {
...  .open = nso, ...
+.llseek = no_llseek, /* nonseekable */
};

@ nonseekable2 depends on !has_llseek @
identifier fops0.fops;
identifier open.open_f;
@@
struct file_operations fops = {
...  .open = open_f, ...
+.llseek = no_llseek, /* open uses nonseekable */
};

// use seq_lseek for sequential files
/////////////////////////////////////
@ seq depends on !has_llseek @
identifier fops0.fops;
identifier sr ~= "seq_read";
@@
struct file_operations fops = {
...  .read = sr, ...
+.llseek = seq_lseek, /* we have seq_read */
};

// use default_llseek if there is a readdir
///////////////////////////////////////////
@ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier readdir_e;
@@
// any other fop is used that changes pos
struct file_operations fops = {
... .readdir = readdir_e, ...
+.llseek = default_llseek, /* readdir is present */
};

// use default_llseek if at least one of read/write touches f_pos
/////////////////////////////////////////////////////////////////
@ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read.read_f;
@@
// read fops use offset
struct file_operations fops = {
... .read = read_f, ...
+.llseek = default_llseek, /* read accesses f_pos */
};

@ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write.write_f;
@@
// write fops use offset
struct file_operations fops = {
... .write = write_f, ...
+	.llseek = default_llseek, /* write accesses f_pos */
};

// Use noop_llseek if neither read nor write accesses f_pos
///////////////////////////////////////////////////////////

@ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
identifier write_no_fpos.write_f;
@@
// write fops use offset
struct file_operations fops = {
...
 .write = write_f,
 .read = read_f,
...
+.llseek = noop_llseek, /* read and write both use no f_pos */
};

@ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write_no_fpos.write_f;
@@
struct file_operations fops = {
... .write = write_f, ...
+.llseek = noop_llseek, /* write uses no f_pos */
};

@ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
@@
struct file_operations fops = {
... .read = read_f, ...
+.llseek = noop_llseek, /* read uses no f_pos */
};

@ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
@@
struct file_operations fops = {
...
+.llseek = noop_llseek, /* no read or write fn */
};
===== End semantic patch =====

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Julia Lawall <julia@diku.dk>
Cc: Christoph Hellwig <hch@infradead.org>
2010-10-15 15:53:27 +02:00
Eli Cohen
ff7f5aab35 IB/pack: IBoE UD packet packing support
Add support for packing IBoE packet headers.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>

[ Clean up and fix ib_ud_header_init() a bit.  - Roland ]

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-14 12:41:29 -07:00
Eli Cohen
3c86aa70bf RDMA/cm: Add RDMA CM support for IBoE devices
Add support for IBoE device binding and IP --> GID resolution.  Path
resolving and multicast joining are implemented within cma.c by
filling in the responses and running callbacks in the CMA work queue.

IP --> GID resolution always yields IPv6 link local addresses; remote
GIDs are derived from the destination MAC address of the remote port.
Multicast GIDs are always mapped to multicast MACs as is done in IPv6.
(IPv4 multicast is enabled by translating IPv4 multicast addresses to
IPv6 multicast as described in
<http://www.mail-archive.com/ipng@sunroof.eng.sun.com/msg02134.html>.)

Some helper functions are added to ib_addr.h.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-13 15:46:43 -07:00
Eli Cohen
fac70d5191 IB/mad: IBoE supports only QP1 (no QP0)
Since IBoE is using Ethernet as its link layer, there is no central
management entity so there is need for QP0.  QP1 is still needed since
it handles communications between CM agents.  This patch will skip QP0
and create only QP1 for IBoE ports.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-13 09:38:11 -07:00
Eli Cohen
7b4c876961 IPoIB: Skip IBoE ports
IPoIB is IP-over-Infiniband link layer. In the case of IBoE, the link
layer is Ethernet and IP can work directly over Ethernet, so disable
IPoIB for non-IB_LINK_LAYER_INFINIBAND ports.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-13 09:38:11 -07:00
Eric Dumazet
29b4433d99 net: percpu net_device refcount
We tried very hard to remove all possible dev_hold()/dev_put() pairs in
network stack, using RCU conversions.

There is still an unavoidable device refcount change for every dst we
create/destroy, and this can slow down some workloads (routers or some
app servers, mmap af_packet)

We can switch to a percpu refcount implementation, now dynamic per_cpu
infrastructure is mature. On a 64 cpus machine, this consumes 256 bytes
per device.

On x86, dev_hold(dev) code :

before
        lock    incl 0x280(%ebx)
after:
        movl    0x260(%ebx),%eax
        incl    fs:(%eax)

Stress bench :

(Sending 160.000.000 UDP frames,
IP route cache disabled, dual E5540 @2.53GHz,
32bit kernel, FIB_TRIE)

Before:

real    1m1.662s
user    0m14.373s
sys     12m55.960s

After:

real    0m51.179s
user    0m15.329s
sys     10m15.942s

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-12 12:35:25 -07:00
Animesh K Trivedi
26012f0750 RDMA/iwcm: Fix hang in uninterruptible wait on cm_id destroy
A process can get stuck in an uninterruptible wait in the
kernel while destroying a cm_id when iw_cm_connect() fails:

For example, When creation of a PD fails but the user continues with
an attempt to connect to the server without checking the return value,
in iw_cm_connect() a NULL qp is found so the call fails.  However the
IWCM_F_CONNECT_WAIT bit is not cleared.  destroy_cm_id() then waits
forever for IWCM_F_CONNECT_WAIT to be cleared.

The same problem exists on the passive side with the accept call.

Fix this by clearing the bit and waking up any waiters in the
appropriate spots.

Signed-off-by: Animesh Trivedi <atr@zurich.ibm.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-11 20:24:04 -07:00
Steve Wise
3160977a6e RDMA/cxgb4: Use simple_read_from_buffer() for debugfs handlers
We can replace our equivalent open-coded version.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-11 20:15:14 -07:00
Steve Wise
8bbac892fb RDMA/cxgb4: Add default_llseek to debugfs files
Incorporate BKL removal changes.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-11 20:14:00 -07:00
Eli Cohen
5a0fd09428 IB/mlx4: Limit size of fast registration WRs
Fix the limit on the size of max fast registration WRs that can be
posted to match hardware capabilities.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-11 14:33:17 -07:00
Joe Perches
0f2f930a67 IB/qib: Remove unnecessary casts of private_data
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-06 14:43:51 -07:00
Maciej Sosnowski
52106bd24c RDMA/nes: Turn carrier off on ifdown
This lets the bonding driver to detect when an interface goes down.

Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-06 14:42:32 -07:00
Chien Tung
2932772156 RDMA/nes: Report correct port state if interface is down
With commit cd6860eb ("RDMA/nes: Fix hangs on ifdown") we no longer
remove nes interfaces on ifdown.  On nes_query_port(), add an
additional check of the netdev queue and report IB_PORT_DOWN if the
queue is not running.

Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-06 12:59:56 -07:00
Sonny Rao
625fbd3a36 IB/ehca: Fix driver on relocatable kernel
the eHCA driver registers a MR for all of kernel memory, but makes the
assumption that valid memory exists at KERNELBASE.  This assumption
may not be true in the case of a relocatable kernel, so use KERNELBASE
+ PHYSICAL_START to get the true beginning of usable kernel memory.

cc: Joachim Fenkes <fenkes@de.ibm.com>
cc: Christoph Raisch <raisch@de.ibm.com>
cc: Hoan-Ham Hguyen <hnguyen@de.ibm.com>
Signed-off-by: Sonny Rao <sonnyrao@us.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-06 12:57:07 -07:00
Thomas Gleixner
557d0540b9 IB/umad: Make user_mad semaphore a real one
Get rid of init_MUTEX[_LOCKED]() and use sema_init() instead.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-28 20:52:21 -07:00
Joe Perches
fc4ec9bd82 RDMA/amso1100: Remove KERN_<level> from pr_<level> use
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-28 20:51:20 -07:00
Dan Carpenter
4d8d6389b2 RDMA/nes: Remove unneeded variable
Just a small cleanup.  The "passive_state" variable isn't used any
more after commit dae58728dc ("RDMA/nes: Fix double CLOSE event
indication crash")

Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-28 20:50:07 -07:00
Christoph Lameter
fed1db33fe IPoIB: Set pkt_type correctly for multicast packets (fix IGMP breakage)
IGMP processing is broken because the IPOIB does not set the
skb->pkt_type the right way for multicast traffic.  All incoming
packets are set to PACKET_HOST which means that igmp_recv() will
ignore the IGMP broadcasts/multicasts.

This in turn means that the IGMP timers are firing and are sending
information about multicast subscriptions unnecessarily.  In a large
private network this can cause traffic spikes.

Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-28 11:09:23 -07:00
Steve Wise
40dbf6ee38 RDMA/cxgb4: Fastreg NSMR fixes
- Remove dsgl support - doesn't work in T4.
- Wrap the immediate PBL as needed when building it in the wr.
- Adjust max pbl depth allowed based on ulptx alignment requirements.
- Bump the slots per SQ to 5 to allow up to 128MB fast registers.
- Advertise fastreg support by default.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-28 10:53:50 -07:00
Steve Wise
410ade4c26 RDMA/cxgb4: Don't set completion flag for read requests
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-28 10:53:49 -07:00
Steve Wise
98ae68b7ee RDMA/cxgb4: Set the default TCP send window to 128KB
This helps with large IO throughput.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-28 10:53:48 -07:00
Steve Wise
2f5b48c3ad RDMA/cxgb4: Use a mutex for QP and EP state transitions
Move the connection setup/teardown paths to the workq thread removing
spin lock/irq disable requirements for these paths.  This allows calls
down to the LLD for EP and QP state transition actions to be atomic
with respect to processing CPL messages coming up from the HW.
Namely, calls to rdma_init() and rdma_fini() can now be called with
the mutex held avoiding many race conditions with the abort path.

The QP spinlock is still used but only to manipulate the qp state.  This
allows the fastpaths, poll, post_send, and pos_recv, to run in the
irq context.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-28 10:53:48 -07:00
Steve Wise
c6d7b26791 RDMA/cxgb4: Support on-chip SQs
T4 support on-chip SQs to reduce latency.  This patch adds support for
this in iw_cxgb4:

 - Manage ocqp memory like other adapter mem resources.
 - Allocate user mode SQs from ocqp mem if available.
 - Map ocqp mem to user process using write combining.
 - Map PCIE_MA_SYNC reg to user process.

Bump uverbs ABI.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-28 10:46:35 -07:00
Steve Wise
aadc4df308 RDMA/cxgb4: Centralize the wait logic
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-28 10:46:34 -07:00
Steve Wise
9e8d1fa342 RDMA/cxgb4: debugfs files for dumping active stags
Add "stags" debugfs file.  This is useful for examining the TPTE and
PBL entries in adapter memory.  It allows scripts to dump just the
active entries.

Also clean up the "qps" file handlers and shared common code.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-28 10:46:33 -07:00
Steve Wise
05fb962947 RDMA/cxgb4: Log HW lack-of-resource errors
This helps debug cases where HW resources are depleted.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-28 10:46:33 -07:00
Steve Wise
0e42c1f430 RDMA/cxgb4: Handle CPL_RDMA_TERMINATE messages
T4 FW sends up CPL_RDMA_TERMINATE to indicate a peer TERM.  This
triggers the QP moving to TERMINATE state.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-28 10:46:32 -07:00
Steve Wise
6ff0e343b3 RDMA/cxgb4: Ignore TERMINATE CQEs
T4 incorrectly inserts TERM CQEs into the CQ.  Silently ignore them.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-28 10:46:31 -07:00
Steve Wise
7459486133 RDMA/cxgb4: Ignore positive return values from cxgb4_*_send() functions
The cxgb4_*_send() functions return NET_XMIT_ values, which are
positive integers or negative errno values.  So don't treat positive
return values as an error.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-28 10:46:31 -07:00
Steve Wise
13fecb83b4 RDMA/cxgb4: Zero out ISGL padding
The HW design requires zeroing any pad in SGLs.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-28 10:46:30 -07:00
Steve Wise
af93fb5dcc RDMA/cxgb4: Don't use null ep ptr
In c4iw_modify_qp() error path, only use qhp->ep if ep is not already set.
Otherwise qhp->ep can be NULL and we crash.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-28 10:46:29 -07:00
Roland Dreier
183ae74bda RDMA/nes: Fix cast-to-pointer warnings on 32-bit
Fix:

  drivers/infiniband/hw/nes/nes_verbs.c: In function 'nes_alloc_fast_reg_page_list':
  drivers/infiniband/hw/nes/nes_verbs.c:477: warning: cast to pointer from integer of different size
  drivers/infiniband/hw/nes/nes_verbs.c: In function 'nes_post_send':
  drivers/infiniband/hw/nes/nes_verbs.c:3486: warning: cast to pointer from integer of different size
  drivers/infiniband/hw/nes/nes_verbs.c:3486: warning: cast to pointer from integer of different size

by printing u64 quantities by casting to unsigned long and long and
using %llx, rather than casting to void* and using %p.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-27 17:51:33 -07:00
Eli Cohen
a3f5adaf49 IB/core: Add link layer property to ports
This patch allows ports to have different link layers:
IB_LINK_LAYER_INFINIBAND or IB_LINK_LAYER_ETHERNET.  This is required
for adding IBoE (InfiniBand-over-Ethernet, aka RoCE) support.  For
devices that do not provide an implementation for querying the link
layer property of a port, we return a default value based on the
transport: RMA_TRANSPORT_IB nodes will return IB_LINK_LAYER_INFINIBAND
and RDMA_TRANSPORT_IWARP nodes will return IB_LINK_LAYER_ETHERNET.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-27 17:51:10 -07:00
Roland Dreier
c8e081a1bf RDMA/cxgb4: Fix warnings about casts to/from pointers of different sizes
Fix:

  drivers/infiniband/hw/cxgb4/qp.c: In function ‘create_qp’:
  drivers/infiniband/hw/cxgb4/qp.c:147: warning: cast from pointer to integer of different size
  drivers/infiniband/hw/cxgb4/qp.c: In function ‘rdma_fini’:
  drivers/infiniband/hw/cxgb4/qp.c:988: warning: cast from pointer to integer of different size
  drivers/infiniband/hw/cxgb4/qp.c: In function ‘rdma_init’:
  drivers/infiniband/hw/cxgb4/qp.c:1063: warning: cast from pointer to integer of different size
  drivers/infiniband/hw/cxgb4/mem.c: In function ‘write_adapter_mem’:
  drivers/infiniband/hw/cxgb4/mem.c:74: warning: cast from pointer to integer of different size
  drivers/infiniband/hw/cxgb4/cq.c: In function ‘destroy_cq’:
  drivers/infiniband/hw/cxgb4/cq.c:58: warning: cast from pointer to integer of different size
  drivers/infiniband/hw/cxgb4/cq.c: In function ‘create_cq’:
  drivers/infiniband/hw/cxgb4/cq.c:135: warning: cast from pointer to integer of different size
  drivers/infiniband/hw/cxgb4/cm.c: In function ‘fw6_msg’:
  drivers/infiniband/hw/cxgb4/cm.c:2326: warning: cast to pointer from integer of different size

by casting pointers to unsigned long instead of u64.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-27 17:51:04 -07:00
Steve Wise
bec658ff31 RDMA/cxgb3: Turn off RX coalescing for iWARP connections
The HW by default has RX coalescing on.  For iWARP connections, this
causes a 100ms delay in connection establishement due to the ingress
MPA Start message being stalled in HW.  So explicitly turn RX
coalescing off when setting up iWARP connections.

This was causing very bad performance for NP64 gather operations using
Open MPI, due to the way it sets up connections on larger jobs.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-27 09:28:55 -07:00
Joe Perches
ea3f0e6bc5 drivers/infiniband: Remove unnecessary casts of private_data
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-09-23 13:38:28 +02:00
Roland Dreier
17859d07c8 Merge branches 'cxgb3' and 'nes' into for-linus 2010-09-08 14:43:28 -07:00