dect
/
linux-2.6
Archived
13
0
Fork 0
Commit Graph

283 Commits

Author SHA1 Message Date
FUJITA Tomonori 8d8bb39b9e dma-mapping: add the device argument to dma_mapping_error()
Add per-device dma_mapping_ops support for CONFIG_X86_64 as POWER
architecture does:

This enables us to cleanly fix the Calgary IOMMU issue that some devices
are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423).

I think that per-device dma_mapping_ops support would be also helpful for
KVM people to support PCI passthrough but Andi thinks that this makes it
difficult to support the PCI passthrough (see the above thread).  So I
CC'ed this to KVM camp.  Comments are appreciated.

A pointer to dma_mapping_ops to struct dev_archdata is added.  If the
pointer is non NULL, DMA operations in asm/dma-mapping.h use it.  If it's
NULL, the system-wide dma_ops pointer is used as before.

If it's useful for KVM people, I plan to implement a mechanism to register
a hook called when a new pci (or dma capable) device is created (it works
with hot plugging).  It enables IOMMUs to set up an appropriate
dma_mapping_ops per device.

The major obstacle is that dma_mapping_error doesn't take a pointer to the
device unlike other DMA operations.  So x86 can't have dma_mapping_ops per
device.  Note all the POWER IOMMUs use the same dma_mapping_error function
so this is not a problem for POWER but x86 IOMMUs use different
dma_mapping_error functions.

The first patch adds the device argument to dma_mapping_error.  The patch
is trivial but large since it touches lots of drivers and dma-mapping.h in
all the architecture.

This patch:

dma_mapping_error() doesn't take a pointer to the device unlike other DMA
operations.  So we can't have dma_mapping_ops per device.

Note that POWER already has dma_mapping_ops per device but all the POWER
IOMMUs use the same dma_mapping_error function.  x86 IOMMUs use device
argument.

[akpm@linux-foundation.org: fix sge]
[akpm@linux-foundation.org: fix svc_rdma]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix bnx2x]
[akpm@linux-foundation.org: fix s2io]
[akpm@linux-foundation.org: fix pasemi_mac]
[akpm@linux-foundation.org: fix sdhci]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix sparc]
[akpm@linux-foundation.org: fix ibmvscsi]
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:03 -07:00
Jack Morgenstein 51a379d0c8 mlx4: Update/add Mellanox Technologies copyright lines to mlx4 driver files
Update existing Mellanox copyright lines to 2008, and add such lines
to files where they are missing.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-25 10:32:52 -07:00
Linus Torvalds 5c402355ad Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  MAINTAINERS: Remove Glenn Streiff from NetEffect entry
  mlx4_core: Improve error message when not enough UAR pages are available
  IB/mlx4: Add support for memory management extensions and local DMA L_Key
  IB/mthca: Keep free count for MTT buddy allocator
  mlx4_core: Keep free count for MTT buddy allocator
  mlx4_code: Add missing FW status return code
  IB/mlx4: Rename struct mlx4_lso_seg to mlx4_wqe_lso_seg
  mlx4_core: Add module parameter to enable QoS support
  RDMA/iwcm: Remove IB_ACCESS_LOCAL_WRITE from remote QP attributes
  IPoIB: Include err code in trace message for ib_sa_path_rec_get() failures
  IB/sa_query: Check if sm_ah is NULL in ib_sa_remove_one()
  IB/ehca: Release mutex in error path of alloc_small_queue_page()
  IB/ehca: Use default value for Local CA ACK Delay if FW returns 0
  IB/ehca: Filter PATH_MIG events if QP was never armed
  IB/iser: Add support for RDMA_CM_EVENT_ADDR_CHANGE event
  RDMA/cma: Add RDMA_CM_EVENT_TIMEWAIT_EXIT event
  RDMA/cma: Add RDMA_CM_EVENT_ADDR_CHANGE event
2008-07-24 12:56:07 -07:00
Andrea Righi 27ac792ca0 PAGE_ALIGN(): correctly handle 64-bit values on 32-bit architectures
On 32-bit architectures PAGE_ALIGN() truncates 64-bit values to the 32-bit
boundary. For example:

	u64 val = PAGE_ALIGN(size);

always returns a value < 4GB even if size is greater than 4GB.

The problem resides in PAGE_MASK definition (from include/asm-x86/page.h for
example):

#define PAGE_SHIFT      12
#define PAGE_SIZE       (_AC(1,UL) << PAGE_SHIFT)
#define PAGE_MASK       (~(PAGE_SIZE-1))
...
#define PAGE_ALIGN(addr)       (((addr)+PAGE_SIZE-1)&PAGE_MASK)

The "~" is performed on a 32-bit value, so everything in "and" with
PAGE_MASK greater than 4GB will be truncated to the 32-bit boundary.
Using the ALIGN() macro seems to be the right way, because it uses
typeof(addr) for the mask.

Also move the PAGE_ALIGN() definitions out of include/asm-*/page.h in
include/linux/mm.h.

See also lkml discussion: http://lkml.org/lkml/2008/6/11/237

[akpm@linux-foundation.org: fix drivers/media/video/uvc/uvc_queue.c]
[akpm@linux-foundation.org: fix v850]
[akpm@linux-foundation.org: fix powerpc]
[akpm@linux-foundation.org: fix arm]
[akpm@linux-foundation.org: fix mips]
[akpm@linux-foundation.org: fix drivers/media/video/pvrusb2/pvrusb2-dvb.c]
[akpm@linux-foundation.org: fix drivers/mtd/maps/uclinux.c]
[akpm@linux-foundation.org: fix powerpc]
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:21 -07:00
Roland Dreier 7644264082 mlx4_core: Improve error message when not enough UAR pages are available
If an mlx4 device with default FW (which gives a UAR BAR size of 8 MB)
is used in a system with 64 KB pages, then there are only 8192/64==128
UAR pages available.  However, the first 128 UAR pages are reserved
for use with event queue doorbells, so no UAR pages are available to
do anything else with, which means that the driver cannot work.

The current driver fails with a fairly cryptic "Failed to allocate
driver access region, aborting" message in this situation.  Fix the
driver to detect the problem earlier and print out a clearer
description of the problem and a suggestion of how to fix it (use a
new firmware image).

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-23 08:12:47 -07:00
Roland Dreier 95d04f0735 IB/mlx4: Add support for memory management extensions and local DMA L_Key
Add support for the following operations to mlx4 when device firmware
supports them:

 - Send with invalidate and local invalidate send queue work requests;
 - Allocate/free fast register MRs;
 - Allocate/free fast register MR page lists;
 - Fast register MR send queue work requests;
 - Local DMA L_Key.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-23 08:12:26 -07:00
Roland Dreier e4044cfc49 mlx4_core: Keep free count for MTT buddy allocator
MTT entries are allocated with a buddy allocator, which just keeps
bitmaps for each level of the buddy table.  However, all free space
starts out at the highest order, and small allocations start scanning
from the lowest order.  When the lowest order tables have no free
space, this can lead to scanning potentially millions of bits before
finding a free entry at a higher order.

We can avoid this by just keeping a count of how many free entries
each order has, and skipping the bitmap scan when an order is
completely empty.  This provides a nice performance boost for a
negligible increase in memory usage.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-22 14:19:40 -07:00
Jack Morgenstein 899698dad7 mlx4_code: Add missing FW status return code
Add ICM_ERROR firmware status code.  In mapping to errnos, -ENFILE
seems closest.

This is in preparation for providing more detailed log info using
mlx4_err() in low-level driver when a non-zero status is returned.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-22 14:19:39 -07:00
Jack Morgenstein 51f5f0ee22 mlx4_core: Add module parameter to enable QoS support
Add a module parameter "enable_qos" to mlx4_core.  If this param is
set, enable support for QoS in the INIT_HCA command.  By default, the
parameter is set to 0 (disabled).

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-22 14:19:37 -07:00
Vladimir Sokolovsky 2d92865158 mlx4_core: Use MOD_STAT_CFG command to get minimal page size
There was a bug in some versions of the mlx4 driver in
mlx4_alloc_fmr(), which hardcoded the minimum acceptable page_shift to
be 12.  However, new ConnectX firmware can support a minimum
page_shift of 9 (log_pg_sz of 9 returned by QUERY_DEV_LIM) -- so with
old drivers, ib_fmr_alloc() would fail for ULPs using the device
minimum when creating FMRs.

To preserve firmware compatibility with released mlx4 drivers, the
firmware will continue to return 12 as before for log_page_sz in
QUERY_DEV_CAP for these drivers.  However, to enable new drivers to
take advantage of the available smaller page size, the mlx4 driver now
first sets the log_pg_sz to the device minimum by setting a
log_page_sz value to 0 via the MOD_STAT_CFG command and then reading
the real minimum via QUERY_DEV_CAP.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:53 -07:00
Ron Livne 521e575b9a IB/mlx4: Add support for blocking multicast loopback packets
Add support for handling the IB_QP_CREATE_MULTICAST_BLOCK_LOOPBACK
flag by using the per-multicast group loopback blocking feature of
mlx4 hardware.

Signed-off-by: Ron Livne <ronli@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:48 -07:00
Oren Duer c5057ddccb mlx4_core: Support creation of FMRs with pages smaller than 4K
Don't hard code a test against a minimum page shift of 12, since the
device may support smaller pages.  Test against the actual smallest
page size from the device capabilities.

Signed-off-by: Oren Duer <oren@mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-05 15:56:52 -07:00
Olaf Kirch bbdc2821db mlx4_core: Avoid recycling old FMR R_Keys too soon
When a FMR is unmapped, mlx4 resets the map count to 0, and clears the
upper part of the R_Key which is used as the sequence counter.

This poses a problem for RDS, which uses ib_fmr_unmap as a fence
operation.  RDS assumes that after issuing an unmap, the old R_Keys
will be invalid for a "reasonable" period of time. For instance,
Oracle processes uses shared memory buffers allocated from a pool of
buffers.  When a process dies, we want to reclaim these buffers -- but
we must make sure there are no pending RDMA operations to/from those
buffers.  The only way to achieve that is by using unmap and sync the
TPT.

However, when the sequence count is reset on unmap, there is a high
likelihood that a new mapping will be given the same R_Key that was
issued a few milliseconds ago.

To prevent this, don't reset the sequence count when unmapping a FMR.

Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:53 -07:00
Yevgeny Petrilin e463c7b197 mlx4_core: Add a way to set the "collapsed" CQ flag
Extend the mlx4_cq_resize() API with a way to set the "collapsed" flag
for the CQ being created.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:50 -07:00
Yevgeny Petrilin ed4d3c1061 mlx4_core: Add helper to move QP to ready-to-send
Avoid duplicating code in ethernet and FC modules.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-25 14:52:32 -07:00
Yevgeny Petrilin 38ae6a5354 mlx4_core: Add HW queues allocation helpers
Wrap doorbell, buffer and MTT allocation in helper functions for
ethernet and FC modules to use.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-25 14:27:08 -07:00
Vladimir Sokolovsky f5b3a096b1 mlx4_core: CQ resizing should pass a 0 opcode modifier to MODIFY_CQ
The call to mlx4_MODIFY_CQ() had a typo so that mlx4_cq_resize() was
actually asking the FW to modify a CQ's interrupt moderation rather than
asking it to resize a CQ.

Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-23 11:55:45 -07:00
Yevgeny Petrilin 6296883ca4 mlx4_core: Move kernel doorbell management into core
In addition to mlx4_ib, there will be ethernet and FC consumers of
mlx4_core, so move the code for managing kernel doorbells into the
core module to avoid having to duplicate this multiple times.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-23 11:55:45 -07:00
Eli Cohen 4ff08a76bc IB/mlx4: Fix incorrect comment
mlx4 hardware does not support external DDR memory.  Moreover, UAR
area (BAR 2) can change depending on FW version.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:35 -07:00
Eli Cohen 4dc51b3258 IB/mlx4: Fix race when detaching a QP from a multicast group
When detaching the last QP from an MCG entry, we need to make
sure that at any time, there will be no entry with zero number of
QPs which is linked to the list of the MCGs of the corresponding
hash index.  So don't write back the MCG entry if we are removing the
last QP; just unlink the entry.

Also, remove an unnecessary MCG read when attaching a QP requires
allocation of a new entry in the AMGM.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:35 -07:00
Vladimir Sokolovsky bbf8eed1a0 IB/mlx4: Add support for resizing CQs
Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:33 -07:00
Eli Cohen 3fdcb97f0b IB/mlx4: Add support for modifying CQ moderation parameters
Signed-off-by: Eli Cohen <eli@mellnaox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:33 -07:00
Jack Morgenstein 9b1f38515c mlx4_core: Increase max number of QPs to 128K
With the advent large clusters which utilize multicore hosts, 64K QPs
is not enough.  We should increase the default maximum for QPs to 128K.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:32 -07:00
Eli Cohen b832be1e40 IB/mlx4: Add IPoIB LSO support
Add TSO support to the mlx4_ib driver.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:27 -07:00
Eli Cohen 8ff095ec4b IB/mlx4: Add IPoIB checksum offload support
ConnectX devices support checksum generation and verification of TCP
and UDP packets for UD IPoIB messages.  This patch checks if the HCA
supports this and sets the IB_DEVICE_UD_IP_CSUM capability flag if it
does.  It implements support for handling the IB_SEND_IP_CSUM send
flag and setting the csum_ok field in receive work completions.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Ali Ayub <ali@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:10 -07:00
Roland Dreier 37608eea86 mlx4_core: Fix confusion between mlx4_event and mlx4_dev_event enums
The struct mlx4_interface.event() method was supposed to get an enum
mlx4_dev_event, but the driver code was actually passing in the
hardware enum mlx4_event values.  Fix up the callers of
mlx4_dispatch_event() so that they pass in the right type of value,
and fix up the event method in mlx4_ib so that it can handle the enum
mlx4_dev_event values.

This eliminates the need for the subtype parameter to the event
method, so remove it.

This also fixes the sparse warning

    drivers/net/mlx4/intf.c:127:48: warning: mixing different enum types
    drivers/net/mlx4/intf.c:127:48:     int enum mlx4_event  versus
    drivers/net/mlx4/intf.c:127:48:     int enum mlx4_dev_event

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:08 -07:00
Roland Dreier ca28121114 mlx4_core: Move opening brace of function onto a new line
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:04 -07:00
Jack Morgenstein 11e75a7455 mlx4_core: Move table_find from fmr_alloc to fmr_enable
mlx4_table_find (for FMR MPTs) requires that ICM memory already be
mapped.  Before this fix, FMR allocation depended on ICM memory
already being mapped for the MPT entry.  If all currently mapped
entries are taken, the find operation fails (even if the MPT ICM table
still had more entries, which were just not mapped yet).

This fix moves the mpt find operation to fmr_enable, to guarantee that
any required ICM memory mapping has already occurred.

Found by Oren Duer of Mellanox.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-14 10:43:48 -08:00
Olof Johansson 29c271123d mlx4_core: Fix build break (missing include)
Commit 313abe55 ("mlx4_core: For 64-bit systems, vmap() kernel queue
buffers") caused this to pop up on powerpc allyesconfig, looks like a
missing include file:

    drivers/net/mlx4/alloc.c: In function 'mlx4_buf_alloc':
    drivers/net/mlx4/alloc.c:162: error: implicit declaration of function 'vmap'
    drivers/net/mlx4/alloc.c:162: error: 'VM_MAP' undeclared (first use in this function)
    drivers/net/mlx4/alloc.c:162: error: (Each undeclared identifier is reported only once
    drivers/net/mlx4/alloc.c:162: error: for each function it appears in.)
    drivers/net/mlx4/alloc.c:162: warning: assignment makes pointer from integer without a cast
    drivers/net/mlx4/alloc.c: In function 'mlx4_buf_free':
    drivers/net/mlx4/alloc.c:187: error: implicit declaration of function 'vunmap'

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-11 14:19:42 -08:00
Roland Dreier b57aacfa7a mlx4_core: Clean up struct mlx4_buf
Now that struct mlx4_buf.u is a struct instead of a union because of
the vmap() changes, there's no point in having a struct at all.  So
move .direct and .page_list directly into struct mlx4_buf and get rid
of a bunch of unnecessary ".u"s.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-06 21:17:59 -08:00
Jack Morgenstein 313abe55a8 mlx4_core: For 64-bit systems, vmap() kernel queue buffers
Since kernel virtual memory is not a problem on 64-bit systems, there
is no reason to use our own 2-layer page mapping scheme for large
kernel queue buffers on such systems.  Instead, map the page list to a
single virtually contiguous buffer with vmap(), so that can we access
buffer memory via direct indexing.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-06 21:17:45 -08:00
Roland Dreier f33afc26dc IB: Avoid marking __devinitdata as const
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:44 -08:00
Jack Morgenstein 893da75956 mlx4_core: Don't read reserved fields in mlx4_QUERY_ADAPTER()
The firmware QUERY_ADAPTER command does not return vendor_id,
device_id, and revision_id; eliminate these fields from the query.

Initialize the rev_id field of the mlx4 device via init_node_data (MAD
IFC query), as is done in the query_device verb implementation.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:43 -08:00
Roland Dreier e8f9b2ed98 mlx4_core: Fix more section mismatches
Commit 3d73c288 ("mlx4_core: Fix section mismatches") fixed some of
the section mismatches introduced when error recovery was added, but
there were still more cases of errory recovery code calling into
__devinit code from regular .text.  Fix this by getting rid of the
now-incorrect __devinit annotations.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:41 -08:00
Jack Morgenstein 5920869f1e mlx4_core: Fix max_eqs masking in QUERY_DEV_CAP
log_max_eqs is a 4-bit field, not a 3-bit field in the response to the
QUERY_DEV_CAP FW command, so we should mask with 0xf instead of 0x7
when reading it.

Found by Yossi Leybovitch of Mellanox.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:29 -08:00
Jack Morgenstein 9ed87fd34c mlx4_core: Fix state check in mlx4_qp_modify()
When checking the states passed in, mlx4_qp_modify() accidentally checks
cur_state twice rather than checking cur_state and new_state.  Fix this
to make sure that both values are in-bounds.

Since these values may be passed in from userspace, this bug results in
userspace being able to trigger an oops.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Cc: stable <stable@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-11-20 13:01:28 -08:00
Jack Morgenstein e383d19e90 mlx4_core: Fix thinko in QP destroy (incorrect bitmap_free)
Fix thinko in commit eaf559bf ("mlx4_core: Don't free special QPs in
QP number bitmap").  The old commit had the logic exactly backwards
and ended up freeing *only* special QPs, which not only left the
original bug in place but also introduced the problem that the QP
number bitmap would get full after a while.

Found by Dotan Barak of Mellanox.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-11-14 08:20:03 -08:00
Ali Ayoub 3bba11e5c4 mlx4_core: Fix possible bad free in mlx4_buf_free()
When mlx4_buf_free() is called from the error path of
mlx4_buf_alloc(), it may be passed a buffer structure that does not
have all pages filled in.  Add a check for NULL to mlx4_buf_free() so
we avoid passing NULL to dma_free_coherent() (which will crash).

Signed-off-by: Ali Ayoub <ali@mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-11-13 15:26:57 -08:00
Jens Axboe 642f149031 SG: Change sg_set_page() to take length and offset argument
Most drivers need to set length and offset as well, so may as well fold
those three lines into one.

Add sg_assign_page() for those two locations that only needed to set
the page, where the offset/length is set outside of the function context.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-24 11:20:47 +02:00
Linus Torvalds 0b776eb542 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  mlx4_core: Increase command timeout for INIT_HCA to 10 seconds
  IPoIB/cm: Use common CQ for CM send completions
  IB/uverbs: Fix checking of userspace object ownership
  IB/mlx4: Sanity check userspace send queue sizes
  IPoIB: Rewrite "if (!likely(...))" as "if (unlikely(!(...)))"
  IB/ehca: Enable large page MRs by default
  IB/ehca: Change meaning of hca_cap_mr_pgsize
  IB/ehca: Fix ehca_encode_hwpage_size() and alloc_fmr()
  IB/ehca: Fix masking error in {,re}reg_phys_mr()
  IB/ehca: Supply QP token for SRQ base QPs
  IPoIB: Use round_jiffies() for ah_reap_task
  RDMA/cma: Fix deadlock destroying listen requests
  RDMA/cma: Add locking around QP accesses
  IB/mthca: Avoid alignment traps when writing doorbells
  mlx4_core: Kill mlx4_write64_raw()
2007-10-23 09:56:11 -07:00
Jens Axboe 45711f1af6 [SG] Update drivers to use sg helpers
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-22 21:19:53 +02:00
Jack Morgenstein 77109cc282 mlx4_core: Increase command timeout for INIT_HCA to 10 seconds
The current INIT_HCA firmware command timeout is sufficient for the
default number of resources (QPs, CQs, etc) being allocated, but if
the HCA profile is modified to increase the amount of resources, then
a spurious timeout is detected and HCA initialization fails.

Increase the timeout for the INIT_HCA command to 10 seconds, which
also brings it into line with all the other command timeouts.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-21 15:06:04 -07:00
Roland Dreier b027cacdab mlx4_core: Fix infinite loop on device initialization
Commit 3d73c288 ("mlx4_core: Fix section mismatches") introduced a
stupid bug in device init: when some of mlx4_init_one() was split off
into __mlx4_init_one(), the call from the main mlx4_init_one()
function was back to mlx4_init_one() rather than to __mlx4_init_one(),
which leads to an obvious infinite loop if the function is every
called.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-13 14:10:50 -07:00
Roland Dreier 3d73c2884f mlx4_core: Fix section mismatches
Commit ee49bd93 ("mlx4_core: Reset device when internal error is
detected") introduced some section mismatch problems when
CONFIG_HOTPLUG=n, because the error recovery code tears down and
reinitializes the device after everything is loaded, which ends up
calling into lots of code marked __devinit and __devexit from regular
.text.  Fix this by getting rid of these now-incorrect section
markers.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-10 15:43:54 -07:00
Roland Dreier 2e61c646ed mlx4_core: Use mmiowb() to avoid firmware commands getting jumbled up
Firmware commands are sent to the HCA by writing multiple words to a
command register block.  Access to this block of registers is
serialized with a mutex.  However, on large SGI systems writes to the
register block may be reordered within the system interconnect and
reach the HCA in a different order than they were issued (even with
the mutex).  Fix this by adding an mmiowb() before dropping the mutex.

This bug was observed with real workloads with the similar FW command
code in the mthca driver, and adding the mmiowb() as in commit
66547550 ("IB/mthca: Use mmiowb() to avoid firmware commands getting
jumbled up") was confirmed to fix the problems, so we should add the
same fix to mlx4.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:18 -07:00
Jack Morgenstein e57ac0c297 mlx4_core: Increase max number of QPs per multicast group to 56
Increase the number of QPs allowed per multicast group from 8 to 56.
This allows for one QP per core on 16-core systems, which are now
quite common, and allows some space for future growth.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:16 -07:00
Jack Morgenstein 8ad11fb6b0 IB/mlx4: Implement FMRs
Implement FMRs for mlx4.  This is an adaptation of code from mthca.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:16 -07:00
Jack Morgenstein d7bb58fb1c mlx4_core: Write MTTs from CPU instead with of WRITE_MTT FW command
Write MTT entries directly to ICM from the driver (eliminating use of
WRITE_MTT command).  This reduces the number of FW commands needed to
register an MR by at least a factor of 2 and speeds up memory
registration significantly.  This code will also be used to implement
FMRs.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:16 -07:00
Roland Dreier 121964ec38 mlx4_core: Fix meaning of dev->caps.reserved_mtts
Everything that uses caps.reserved_mtts expects it to be a count of MTT
segments, not MTT entries.  So convert the value that the FW gives us to
a count of segments.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:16 -07:00
Roland Dreier cf78237d7b mlx4_core: Reserve the correct number of MTT segments
Taking ilog2(dev->caps.reserved_mtts) to find out the order to pass to
the MTT buddy allocator will do the wrong thing if reserved_mtts is ever
not a power of 2.  Be safe and use fls(dev->caps.reserved_mtts - 1).

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:16 -07:00
Jack Morgenstein 5b0bf5e25e mlx4_core: Support ICM tables in coherent memory
Enable having ICM tables in coherent memory, and use coherent memory
for the dMPT table.  This will allow writing MPT entries for MRs both
via the SW2HW_MPT command and also directly by the driver for FMR
remapping without needing to flush or worry about cacheline boundaries.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:15 -07:00
Jack Morgenstein cd9281d873 IB/mlx4: Display misc device information under /sys/class/infiniband/
display the following device information under /sys/class/infiniband/mlx4_X:
board_id, fw_ver, hw_rev, hca_type.

This patch makes this information available to userspace utilities
such as ibstat and ibv_devinfo.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:14 -07:00
Roland Dreier ea98054fef mlx4_core: Change capability decoding: SRC->XRC
The SRC ("scalable RC") transport has been renamed to XRC ("extended 
RC"), to avoid having an abbreviation that is so easily confused with an 
abbreviation for "source."  Update the HCA capability decoding output to 
use the new name.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:13 -07:00
Roland Dreier d7dc3ccbe4 IB/mlx4: Fix up SRQ limit_watermark endianness
mlx4_srq_query() returns a big-endian 16-bit value through an int *,
which screws up sparse checking.  Fix this so that a CPU-endian value
is returned.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:06 -07:00
Michael S. Tsirkin 08fb105540 mlx4_core: Enable MSI-X by default
Recover from MSI-X errors by automatically falling back on regular
interrupt, instead of asking the user to do this manually.  This makes
it possible to enable MSI-X by default, and will make it possible to
get rid of the msi_x module option in the future.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:05 -07:00
Roland Dreier eaf559bf56 mlx4_core: Don't free special QPs in QP number bitmap
Special QPs are not allocated using the regular QP number bitmap, so
when they are destroyed, their QP number should not be freed in the
bitmap.

Found by Dotan Barak of Mellanox.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:05 -07:00
Dotan Barak 36ce10d3e8 mlx4_core: Use enum value GO_BIT_TIMEOUT_MSECS
Rename GO_BIT_TIMEOUT to GO_BIT_TIMEOUT_MSECS for clarity, and
actually use it as the go bit timeout (instead of having the define
but then ignoring it and using a hard-coded 10 * HZ for the actual
timeout).

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:04 -07:00
Eli Cohen 947b2a8083 mlx4_core: Wait 1 second after reset before accessing device
Put a 1000 msec delay after resetting the device before attempting to
do config cycles on it.  Not waiting causes system hangs on some
chipsets, e.g. Intel E7520, when the driver is loaded.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-08-13 08:47:44 -07:00
Jack Morgenstein 0172e2e14c mlx4_core: Remove kfree() in mlx4_mr_alloc() error flow
mlx4_mr_alloc() doesn't actually allocate mr (it just initializes the
pointer that the caller passes in), so it shouldn't free it if an
error occurs.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-28 08:30:45 -07:00
Roland Dreier 0981582dbf mlx4_core: Change command token on timeout
The FW command token is currently only updated on a command completion
event. This means that on command timeout, the same token will be
reused for new command, which results in a mess if the timed out
command *does* eventually complete.

This is the same change as the patch for mthca from Michael
S. Tsirkin <mst@dev.mellanox.co.il> that was just merged.  It seems
sensible to avoid gratuitous differences in FW command processing
between mthca and mlx4.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-20 21:19:43 -07:00
Jack Morgenstein c9f2ba5ed2 IB/mlx4: Increase max outstanding RDMA reads as target
Change the maximum number of outstanding RDMA reads allowed as a
target from 4 to 16 to per QP.  This allows RDMA read operations to
pipeline better.

Pointed out by Dotan Barak and Sagi Rotem.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-17 20:50:50 -07:00
Jack Morgenstein ee49bd9397 mlx4_core: Reset device when internal error is detected
Reset the device when an internal error is detected.

Also, detect errors by polling the error buffer rather than using
interrupts.  This is more robust and doesn't depend on MSI-X.  Remove
the old interrupt handler entirely, since we don't want to support two
mechanisms for detecting internal errors.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-17 18:37:42 -07:00
Jack Morgenstein 65541cb7cf IB/mlx4: Implement query SRQ
Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-12 15:41:24 -07:00
Jack Morgenstein 6a775e2ba4 IB/mlx4: Implement query QP
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-12 15:41:00 -07:00
Dotan Barak 149983af60 mlx4_core: Get the maximum message size from reported device capabilities
Get the maximum message size from the device capabilities returned
from the QUERY_DEV_CAP firmware command, rather than hard-coding 2 GB.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
Michael S. Tsirkin 525f5f44c4 mlx4_core: Include linux/mutex.h from mlx4.h
mlx4.h uses struct mutex, so although <linux/mutex.h> seems to be pulled in
indirectly by one of the headers it includes, the right thing to do is
to include <linux/mutex.h> directly.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:20 -07:00
Bill Nottingham 287aa83dff drivers/net: fix comparisons of unsigned < 0
Recent gcc versions emit warnings when unsigned variables are compared < 0 or >= 0.

Signed-off-by: Bill Nottingham <notting@redhat.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-07-08 22:16:39 -04:00
Jack Morgenstein 786f238e4f mlx4_core: Add new Mellanox device IDs
Add new IDs for PCIe gen2 devices.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-02 20:41:35 -07:00
Roland Dreier 5ae2a7a836 IB/mlx4: Handle FW command interface rev 3
Upcoming firmware introduces command interface revision 3, which
changes the way port capabilities are queried and set.  Update the
driver to handle both the new and old command interfaces by adding a
new MLX4_FLAG_OLD_PORT_CMDS that it is set after querying the firmware
interface revision and then using the correct interface based on the
setting of the flag.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-18 08:15:02 -07:00
Roland Dreier 0e6e741621 IB/mlx4: Handle new FW requirement for send request prefetching
New ConnectX firmware introduces FW command interface revision 2,
which requires that for each QP, a chunk of send queue entries (the
"headroom") is kept marked as invalid, so that the HCA doesn't get
confused if it prefetches entries that haven't been posted yet.  Add
code to the driver to do this, and also update the user ABI so that
userspace can request that the prefetcher be turned off for userspace
QPs (we just leave the prefetcher on for all kernel QPs).

Unfortunately, marking send queue entries this way is confuses older
firmware, so we change the driver to allow only FW command interface
revisions 2.  This means that users will have to update their firmware
to work with the new driver, but the firmware is changing quickly and
the old firmware has lots of other bugs anyway, so this shouldn't be too
big a deal.

Based on a patch from Jack Morgenstein <jackm@dev.mellanox.co.il>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-18 08:13:48 -07:00
Jack Morgenstein b2d9308ae4 mlx4_core: Don't set MTT address in dMPT entries with PA set
If a dMPT entry has the PA flag (direct physical address) set, then
the (unused) MTT base address field has to be set to 0.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-07 23:24:38 -07:00
Roland Dreier fe40900f40 mlx4_core: Check firmware command interface revision
HCA firmware with incompatible changes to the FW commmand interface is
coming soon.  Add a check of the interface revision during
initialization and bail out if the firmware advertises a revision that
the driver doesn't know about.  This will avoid strange failures later
if the driver goes on using the wrong interface revision. 

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-07 23:24:36 -07:00
Roland Dreier 3e1db334dc IB/mthca, mlx4_core: Fix typo in comment
s/signifant/significant/

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-07 11:51:59 -07:00
Roland Dreier 2c5cb23558 mlx4_core: Free catastrophic error MSI-X interrupt with correct dev_id
We need to pass the same dev_id to free_irq() and request_irq().  When
using MSI-X, the MLX4_EQ_CATAS interrupt uses a different dev_id from
the other interrupts.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-07 11:51:58 -07:00
Roland Dreier b581401ed0 mlx4_core: Initialize ctx_list and ctx_lock earlier
We may call mlx4_dispatch_event() before mlx4_register_device() is
called for a device, because for example a catastrophic error happens
immediately after we enable interrupts.  Therefore priv->ctx_list and
priv->ctx_lock need to be initialized earlier.

This bug was actually exposed by the MSI-X bug that returned IRQ numbers 
to drivers in reverse order, so that the first FW command 
interrupt looked to mlx4 like a catastrophic error.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-07 11:51:58 -07:00
Eli Cohen 09360d5408 mlx4_core: Fix CQ context layout
The reserved6 field should be 64 bits, not just 16 bits.  Without
this, the structure does not match the hardware layout on 32-bit
architectures: the db_rec_addr field ends up at offset 52 instead of
offset 56.  The bug slipped by because the alignment of __be64 members
ends up putting it in the right place on x86-64.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-07 11:51:57 -07:00
Roland Dreier a2cb4a98f2 IB/mlx4: Fix last allocated object tracking in bitmap allocator
Set last allocated object to the object after the one just allocated
before ORing in the extra top bits.  Also handle the case where this
wraps around.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-29 16:07:09 -07:00
Roland Dreier 23c15c21d3 mlx4_core: Fix array overrun in dump_dev_cap_flags()
Don't overrun fname[] array when decoding device flags.

This was spotted by the Coverity checker (CID 1642).

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19 08:51:57 -07:00
Al Viro 9cbe05c712 missing includes in mlx4
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Jeff Garzik <jeff@garzik.org>
Acked-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-15 18:56:37 -07:00
Linus Torvalds de7860c3f3 Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
  IPoIB/cm: Optimize stale connection detection
  IB/mthca: Set cleaned CQEs back to HW ownership when cleaning CQ
  IB/mthca: Fix posting >255 recv WRs for Tavor
  RDMA/cma: Add check to validate that cm_id is bound to a device
  RDMA/cma: Fix synchronization with device removal in cma_iw_handler
  RDMA/cma: Simplify device removal handling code
  IB/ehca: Disable scaling code by default, bump version number
  IB/ehca: Beautify sysfs attribute code and fix compiler warnings
  IB/ehca: Remove _irqsave, move #ifdef
  IB/ehca: Fix AQP0/1 QP number
  IB/ehca: Correctly set GRH mask bit in ehca_modify_qp()
  IB/ehca: Serialize hypervisor calls in ehca_register_mr()
  IB/ipath: Shadow the gpio_mask register
  IB/mlx4: Fix uninitialized spinlock for 32-bit archs
  mlx4_core: Remove unused doorbell_lock
  net: Trivial MLX4_DEBUG dependency fix.
2007-05-15 09:52:31 -07:00
Roland Dreier 20eebcf09c mlx4_core: Remove unused doorbell_lock
struct mlx4_priv.doorbell_lock is never used, so delete it.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-13 08:54:18 -07:00
Andrew Morton 4093785dd1 mlx4: don't use deprecated IRQ flags
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-05-11 17:53:31 -04:00
Roland Dreier 225c7b1fee IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters
Add an InfiniBand driver for Mellanox ConnectX adapters.  Because
these adapters can also be used as ethernet NICs and Fibre Channel 
HBAs, the driver is split into two modules: 
 
  mlx4_core: Handles low-level things like device initialization and 
    processing firmware commands.  Also controls resource allocation 
    so that the InfiniBand, ethernet and FC functions can share a 
    device without stepping on each other. 
 
  mlx4_ib: Handles InfiniBand-specific things; plugs into the 
    InfiniBand midlayer. 

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-08 18:00:38 -07:00