Archived
14
0
Fork 0
Commit graph

76 commits

Author SHA1 Message Date
David Teigland
a536e38125 dlm: ignore cancel on granted lock
Return immediately from dlm_unlock(CANCEL) if the lock is
granted and not being converted; there's nothing to cancel.

Signed-off-by: David Teigland <teigland@redhat.com>
2009-03-11 12:23:58 -05:00
David Teigland
43279e5376 dlm: clear defunct cancel state
When a conversion completes successfully and finds that a cancel
of the convert is still in progress (which is now a moot point),
preemptively clear the state associated with outstanding cancel.
That state could cause a subsequent conversion to be ignored.

Also, improve the consistency and content of error and debug
messages in this area.

Signed-off-by: David Teigland <teigland@redhat.com>
2009-03-11 12:23:39 -05:00
David Teigland
c7be761a81 dlm: change rsbtbl rwlock to spinlock
The rwlock is almost always used in write mode, so there's no reason
to not use a spinlock instead.

Signed-off-by: David Teigland <teigland@redhat.com>
2009-01-08 15:12:39 -06:00
David Teigland
e3a84ad495 dlm: add time stamp of blocking callback
Record the time the latest blocking callback was queued for
a lock.  This will be used for debugging in combination with
lock queue timestamp changes in the previous patch.

Signed-off-by: David Teigland <teigland@redhat.com>
2008-12-23 10:18:34 -06:00
David Teigland
eeda418d8c dlm: change lock time stamping
Use ktime instead of jiffies for timestamping lkb's.  Also stamp the
time on every lkb whenever it's added to a resource queue, instead of
just stamping locks subject to timeouts.  This will allow us to use
timestamps more widely for debugging all locks.

Signed-off-by: David Teigland <teigland@redhat.com>
2008-12-23 10:18:17 -06:00
David Teigland
fd22a51bcc dlm: improve how bast mode handling
The lkb bastmode value is set in the context of processing the
lock, and read by the dlm_astd thread.  Because it's accessed
in these two separate contexts, the writing/reading ought to
be done under a lock.  This is simple to do by setting it and
reading it when the lkb is added to and removed from dlm_astd's
callback list which is properly locked.

Signed-off-by: David Teigland <teigland@redhat.com>
2008-12-23 10:16:46 -06:00
Benny Halevy
18c60c0a3b dlm: fix uninitialized variable for search_rsb_list callers
gcc 4.3.0 correctly emits the following warning.
search_rsb_list does not *r_ret if no dlm_rsb is found
and _search_rsb may pass the uninitialized value upstream
on the error path when both calls to search_rsb_list
return non-zero error.

The fix sets *r_ret to NULL on search_rsb_list's not-found path.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2008-07-14 13:56:59 -05:00
David Teigland
329fc4c372 dlm: fix basts for granted CW waiting PR/CW
The fix in commit 3650925893 was addressing
the case of a granted PR lock with waiting PR and CW locks.  It's a
special case that requires forcing a CW bast.  However, that forced CW
bast was incorrectly applying to a second condition where the granted
lock was CW.  So, the holder of a CW lock could receive an extraneous CW
bast instead of a PR bast.  This fix narrows the original special case to
what was intended.

Signed-off-by: David Teigland <teigland@redhat.com>
2008-07-14 13:56:59 -05:00
David Teigland
761b9d3ffc dlm: save master info after failed no-queue request
When a NOQUEUE request fails, the rsb res_master field is unnecessarily
reset to -1, instead of leaving the valid master setting in place.  We
want to save the looked-up master values while the rsb is on the "toss
list" so that another lookup can be avoided if the rsb is soon reused.
The fix is to simply leave res_master value alone.

Signed-off-by: David Teigland <teigland@redhat.com>
2008-04-21 11:18:01 -05:00
Adrian Bunk
170e19ab29 dlm: make dlm_print_rsb() static
dlm_print_rsb() can now become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David Teigland <teigland@redhat.com>
2008-04-21 11:18:01 -05:00
David Teigland
d292c0cc48 dlm: eliminate astparam type casting
Put lkb_astparam in a union with a dlm_user_args pointer to
eliminate a lot of type casting.

Signed-off-by: David Teigland <teigland@redhat.com>
2008-02-06 23:27:04 -06:00
David Teigland
e5dae548b0 dlm: proper types for asts and basts
Use proper types for ast and bast functions, and use
consistent type for ast param.

Signed-off-by: David Teigland <teigland@redhat.com>
2008-02-06 00:35:45 -06:00
Al Viro
a9cc915928 dlm: fix overflows when copying from ->m_extra to lvb
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Teigland <teigland@redhat.com>
2008-02-04 01:29:13 -06:00
Al Viro
ef58bccab7 dlm: make find_rsb() fail gracefully when namelen is too large
We *can* get there from receive_request() and dlm_recover_master_copy()
with namelen too large if incoming request is invalid; BUG() from
DLM_ASSERT() in allocate_rsb() is a bit excessive reaction to that
and in case of dlm_recover_master_copy() we would actually oops before
that while calculating hash of up to 64Kb worth of data - with data
actually being 64 _bytes_ in kmalloc()'ed struct.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Teigland <teigland@redhat.com>
2008-02-04 01:26:31 -06:00
Al Viro
a5dd06313d dlm: receive_rcom_lock_args() overflow check
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Teigland <teigland@redhat.com>
2008-02-04 01:25:58 -06:00
Al Viro
ae773d0b74 dlm: verify that places expecting rcom_lock have packet long enough
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Teigland <teigland@redhat.com>
2008-02-04 01:25:09 -06:00
Al Viro
163a1859ec dlm: do not byteswap rcom_lock
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Teigland <teigland@redhat.com>
2008-02-04 01:23:14 -06:00
Al Viro
eef7d739c2 dlm: dlm_process_incoming_buffer() fixes
* check that length is large enough to cover the non-variable part of message or
  rcom resp. (after checking that it's large enough to cover the header, of
  course).

* kill more pointless casts

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Teigland <teigland@redhat.com>
2008-02-04 01:22:42 -06:00
Al Viro
8b0d8e03f8 dlm: use proper C for dlm/requestqueue stuff (and fix alignment bug)
a) don't cast the pointer to dlm_header *, we use it as dlm_message *
   anyway.
b) we copy the message into a queue element, then pass the pointer to
   copy to dlm_receive_message_saved(); declare it properly to make sure
   that we have the right alignment.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Teigland <teigland@redhat.com>
2008-02-04 01:21:32 -06:00
David Teigland
85f0379aa0 dlm: keep cached master rsbs during recovery
To prevent the master of an rsb from changing rapidly, an unused rsb is kept
on the "toss list" for a period of time to be reused.  The toss list was
being cleared completely for each recovery, which is unnecessary.  Much of
the benefit of the toss list can be maintained if nodes keep rsb's in their
toss list that they are the master of.  These rsb's need to be included
when the resource directory is rebuilt during recovery.

Signed-off-by: David Teigland <teigland@redhat.com>
2008-01-30 11:04:43 -06:00
David Teigland
594199ebaa dlm: change error message to debug
The invalid lockspace messages are normal and can appear relatively
often.  They should be suppressed without debugging enabled.

Signed-off-by: David Teigland <teigland@redhat.com>
2008-01-30 11:04:43 -06:00
David Teigland
755b5eb8ba dlm: limit dir lookup loop
In a rare case we may need to repeat a local resource directory lookup
due to a race with removing the rsb and removing the resdir record.
We'll never need to do more than a single additional lookup, though,
so the infinite loop around the lookup can be removed.  In addition
to being unnecessary, the infinite loop is dangerous since some other
unknown condition may appear causing the loop to never break.

Signed-off-by: David Teigland <teigland@redhat.com>
2008-01-30 11:04:42 -06:00
David Teigland
42dc1601a9 dlm: reject normal unlock when lock is waiting for lookup
Non-forced unlocks should be rejected if the lock is waiting on the
rsb_lookup list for another lock to establish the master node.

Signed-off-by: David Teigland <teigland@redhat.com>
2008-01-30 11:04:42 -06:00
David Teigland
c54e04b00f dlm: validate messages before processing
There was some hit and miss validation of messages that has now been
cleaned up and unified.  Before processing a message, the new
validate_message() function checks that the lkb is the appropriate type,
process-copy or master-copy, and that the message is from the correct
nodeid for the the given lkb.  Other checks and assertions on the
lkb type and nodeid have been removed.  The assertions were particularly
bad since they would panic the machine instead of just ignoring the bad
message.

Although other recent patches have made processing old message unlikely,
it still may be possible for an old message to be processed and caught
by these checks.

Signed-off-by: David Teigland <teigland@redhat.com>
2008-01-30 11:04:42 -06:00
David Teigland
46b43eed70 dlm: reject messages from non-members
Messages from nodes that are no longer members of the lockspace should be
ignored.  When nodes are removed from the lockspace, recovery can
sometimes complete quickly enough that messages arrive from a removed node
after recovery has completed.  When processed, these messages would often
cause an error message, and could in some cases change some state, causing
problems.

Signed-off-by: David Teigland <teigland@redhat.com>
2008-01-30 11:04:42 -06:00
David Teigland
aec64e1be2 dlm: another call to confirm_master in receive_request_reply
When a failed request (EBADR or ENOTBLK) is unlocked/canceled instead of
retried, there may be other lkb's waiting on the rsb_lookup list for it
to complete.  A call to confirm_master() is needed to move on to the next
waiting lkb since the current one won't be retried.

Signed-off-by: David Teigland <teigland@redhat.com>
2008-01-30 11:04:42 -06:00
David Teigland
601342ce02 dlm: recover locks waiting for overlap replies
When recovery looks at locks waiting for replies, it fails to consider
locks that have already received a reply for their first remote operation,
but not received a reply for secondary, overlapping unlock/cancel.  The
appropriate stub reply needs to be called for these waiters.

Appears when we start doing recovery in the presence of a many overlapping
unlock/cancel ops.

Signed-off-by: David Teigland <teigland@redhat.com>
2008-01-30 11:04:42 -06:00
David Teigland
8a358ca8e7 dlm: clear ast_type when removing from astqueue
The lkb_ast_type field indicates whether the lkb is on the astqueue list.
When clearing locks for a process, lkb's were being removed from the astqueue
list without clearing the field.  If release_lockspace then happened
immediately afterward, it could try to remove the lkb from the list a second
time.

Appears when process calls libdlm dlm_release_lockspace() which first
closes the ls dev triggering clear_proc_locks, and then removes the ls
(a write to control dev) causing release_lockspace().

Signed-off-by: David Teigland <teigland@redhat.com>
2008-01-30 11:04:42 -06:00
David Teigland
52bda2b5ba dlm: use dlm prefix on alloc and free functions
The dlm functions in memory.c should use the dlm_ prefix.  Also, use
kzalloc/kfree directly for dlm_direntry's, removing the wrapper functions.

Signed-off-by: David Teigland <teigland@redhat.com>
2008-01-29 17:17:19 -06:00
David Teigland
11b2498ba7 dlm: don't print common non-errors
Change log_error() to log_debug() for conditions that can occur in
large number in normal operation.

Signed-off-by: David Teigland <teigland@redhat.com>
2008-01-29 17:17:08 -06:00
Adrian Bunk
e028398da7 dlm: proper prototypes
This patch adds a proper prototype for some functions in
fs/dlm/dlm_internal.h

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David Teigland <teigland@redhat.com>
2008-01-29 17:16:52 -06:00
David Teigland
c36258b592 [DLM] block dlm_recv in recovery transition
Introduce a per-lockspace rwsem that's held in read mode by dlm_recv
threads while working in the dlm.  This allows dlm_recv activity to be
suspended when the lockspace transitions to, from and between recovery
cycles.

The specific bug prompting this change is one where an in-progress
recovery cycle is aborted by a new recovery cycle.  While dlm_recv was
processing a recovery message, the recovery cycle was aborted and
dlm_recoverd began cleaning up.  dlm_recv decremented recover_locks_count
on an rsb after dlm_recoverd had reset it to zero.  This is fixed by
suspending dlm_recv (taking write lock on the rwsem) before aborting the
current recovery.

The transitions to/from normal and recovery modes are simplified by using
this new ability to block dlm_recv.  The switch from normal to recovery
mode means dlm_recv goes from processing locking messages, to saving them
for later, and vice versa.  Races are avoided by blocking dlm_recv when
setting the flag that switches between modes.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10 08:56:38 +01:00
Patrick Caulfield
b434eda6fd [DLM] don't overwrite castparam if it's NULL
If the castaddr passed to the userland API is NULL then don't overwrite the
existing castparam. This allows a different thread to cancel a lock request and
the CANCEL AST gets delivered to the original thread.

bz#306391 (for RHEL4) refers.

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10 08:56:36 +01:00
David Teigland
3650925893 [DLM] fix basts for granted PR waiting CW
Fix a long standing bug where a blocking callback would be missed
when there's a granted lock in PR mode and waiting locks in both
PR and CW modes (and the PR lock was added to the waiting queue
before the CW lock).  The logic simply compared the numerical values
of the modes to determine if a blocking callback was required, but in
the one case of PR and CW, the lower valued CW mode blocks the higher
valued PR mode.  We just need to add a special check for this PR/CW
case in the tests that decide when a blocking callback is needed.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-08-14 10:31:02 +01:00
Patrick Caulfield
44f487a553 [DLM] variable allocation
Add a new flag, DLM_LSFL_FS, to be used when a file system creates a lockspace.
This flag causes the dlm to use GFP_NOFS for allocations instead of GFP_KERNEL.
(This updated version of the patch uses gfp_t for ls_allocation.)

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-Off-By: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:17 +01:00
David Teigland
8b4021fa43 [DLM] canceling deadlocked lock
Add a function that can be used through libdlm by a system daemon to cancel
another process's deadlocked lock.  A completion ast with EDEADLK is returned
to the process waiting for the lock.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:54 +01:00
David Teigland
84d8cd69a8 [DLM] timeout fixes
Various fixes related to the new timeout feature:
- add_timeout() missed setting TIMEWARN flag on lkb's when the
  TIMEOUT flag was already set
- clear_proc_locks should remove a dead process's locks from the
  timeout list
- the end-of-life calculation for user locks needs to consider that
  ETIMEDOUT is equivalent to -DLM_ECANCEL
- make initial default timewarn_cs config value visible in configfs
- change bit position of TIMEOUT_CANCEL flag so it's not copied to
  a remote master node
- set timestamp on remote lkb's so a lock dump will display the time
  they've been waiting

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:52 +01:00
Steven Whitehouse
b3cab7b9a3 [DLM] Compile fix
A one liner fix which got missed from the earlier patches.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Fabio Massimo Di Nitto <fabbione@ubuntu.com>
Cc: David Teigland <teigland@redhat.com>
2007-07-09 08:22:49 +01:00
David Teigland
639aca417d [DLM] fix compile breakage
In the rush to get the previous patch set sent, a compilation bug I fixed
shortly before sending somehow got clobbered, probably by a missed quilt
refresh or something.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:45 +01:00
David Teigland
c85d65e914 [DLM] cancel in conversion deadlock [4/6]
When conversion deadlock is detected, cancel the conversion and return
EDEADLK to the application.  This is a new default behavior where before
the dlm would allow the deadlock to exist indefinately.

The DLM_LKF_NODLCKWT flag can now be used in a conversion to prevent the
dlm from performing conversion deadlock detection/cancelation on it.
The DLM_LKF_CONVDEADLK flag can continue to be used as before to tell the
dlm to demote the granted mode of the lock being converted if it gets into
a conversion deadlock.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:38 +01:00
David Teigland
d7db923ea4 [DLM] dlm_device interface changes [3/6]
Change the user/kernel device interface used by libdlm:
- Add ability for userspace to check the version of the interface.  libdlm
  can now adapt to different versions of the kernel interface.
- Increase the size of the flags passed in a lock request so all possible
  flags can be used from userspace.
- Add an opaque "xid" value for each lock.  This "transaction id" will be
  used later to associate locks with each other during deadlock detection.
- Add a "timeout" value for each lock.  This is used along with the
  DLM_LKF_TIMEOUT flag.

Also, remove a fragment of unused code in device_read().

This patch requires updating libdlm which is backward compatible with
older kernels.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:36 +01:00
David Teigland
3ae1acf93a [DLM] add lock timeouts and warnings [2/6]
New features: lock timeouts and time warnings.  If the DLM_LKF_TIMEOUT
flag is set, then the request/conversion will be canceled after waiting
the specified number of centiseconds (specified per lock).  This feature
is only available for locks requested through libdlm (can be enabled for
kernel dlm users if there's a use for it.)

If the new DLM_LSFL_TIMEWARN flag is set when creating the lockspace, then
a warning message will be sent to userspace (using genetlink) after a
request/conversion has been waiting for a given number of centiseconds
(configurable per node).  The time warnings will be used in the future
to do deadlock detection in userspace.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:33 +01:00
David Teigland
85e86edf95 [DLM] block scand during recovery [1/6]
Don't let dlm_scand run during recovery since it may try to do a resource
directory removal while the directory nodes are changing.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:31 +01:00
David Teigland
7d3c1feb80 [DLM] fix mode munging
There are flags to enable two specialized features in the dlm:
1. CONVDEADLK causes the dlm to resolve conversion deadlocks internally by
   changing the granted mode of locks to NL.
2. ALTPR/ALTCW cause the dlm to change the requested mode of locks to PR
   or CW to grant them if the normal requested mode can't be granted.

GFS direct i/o exercises both of these features, especially when mixed
with buffered i/o.  The dlm has problems with them.

The first problem is on the master node. If it demotes a lock as a part of
converting it, the actual step of converting the lock isn't being done
after the demotion, the lock is just left sitting on the granted queue
with a granted mode of NL.  I think the mistaken assumption was that the
call to grant_pending_locks() would grant it, but that function naturally
doesn't look at locks on the granted queue.

The second problem is on the process node.  If the master either demotes
or gives an altmode, the munging of the gr/rq modes is never done in the
process copy of the lock, leaving the master/process copies out of sync.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:36 +01:00
David Teigland
ce03f12b37 [DLM] change lkid format
A lock id is a uint32 and is used as an opaque reference to the lock.  For
userland apps, the lkid is passed up, through libdlm, as the return value
from a write() on the dlm device.  This created a problem when the high
bit was 1, making the lkid look like an error.  This is fixed by changing
how the lkid is composed.  The low 16 bits identified the hash bucket for
the lock and the high 16 bits were a per-bucket counter (which eventually
hit 0x8000 causing the problem).  These are simply swapped around; the
number of hash table buckets is far below 0x8000, making all lkid's
positive when viewed as signed.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:15 +01:00
David Teigland
8499137d4e [DLM] add orphan purging code (1/2)
Add code for purging orphan locks.  A process can also purge all of its
own non-orphan locks by passing a pid of zero.  Code already exists for
processes to create persistent locks that become orphans when the process
exits, but the complimentary capability for another process to then purge
these orphans has been missing.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:10 +01:00
David Teigland
7e4dac3359 [DLM] split create_message function
This splits the current create_message() function into two parts so that
later patches can call the new lower-level _create_message() function when
they don't have an rsb struct.  No functional change in this patch.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:07 +01:00
David Teigland
ef0c2bb05f [DLM] overlapping cancel and unlock
Full cancel and force-unlock support.  In the past, cancel and force-unlock
wouldn't work if there was another operation in progress on the lock.  Now,
both cancel and unlock-force can overlap an operation on a lock, meaning there
may be 2 or 3 operations in progress on a lock in parallel.  This support is
important not only because cancel and force-unlock are explicit operations
that an app can use, but both are used implicitly when a process exits while
holding locks.

Summary of changes:

- add-to and remove-from waiters functions were rewritten to handle situations
  with more than one remote operation outstanding on a lock

- validate_unlock_args detects when an overlapping cancel/unlock-force
  can be sent and when it needs to be delayed until a request/lookup
  reply is received

- processing request/lookup replies detects when cancel/unlock-force
  occured during the op, and carries out the delayed cancel/unlock-force

- manipulation of the "waiters" (remote operation) state of a lock moved under
  the standard rsb mutex that protects all the other lock state

- the two recovery routines related to locks on the waiters list changed
  according to the way lkb's are now locked before accessing waiters state

- waiters recovery detects when lkb's being recovered have overlapping
  cancel/unlock-force, and may not recover such locks

- revert_lock (cancel) returns a value to distinguish cases where it did
  nothing vs cases where it actually did a cancel; the cancel completion ast
  should only be done when cancel did something

- orphaned locks put on new list so they can be found later for purging

- cancel must be called on a lock when making it an orphan

- flag user locks (ENDOFLIFE) at the end of their useful life (to the
  application) so we can return an error for any further cancel/unlock-force

- we weren't setting COMP/BAST ast flags if one was already set, so we'd lose
  either a completion or blocking ast

- clear an unread bast on a lock that's become unlocked

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:00 +01:00
David Teigland
62a0f62369 [DLM] zero new user lvbs
A new lvb for a userland lock wasn't being initialized to zero.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:38:24 -05:00
David Teigland
b790c3b7c3 [DLM] can miss clearing resend flag
A long, complicated sequence of events, beginning with the RESEND flag not
being cleared on an lkb, can result in an unlock never completing.

- lkb on waiters list for remote lookup
- the remote node is both the dir node and the master node, so
  it optimizes the lookup into a request and sends a request
  reply back
- the request reply is saved on the requestqueue to be processed
  after recovery
- recovery runs dlm_recover_waiters_pre() which sets RESEND flag
  so the lookup will be resent after recovery
- end of recovery: process_requestqueue takes saved request reply
  which removes the lkb off the waitesr list, _without_ clearing
  the RESEND flag
- end of recovery: dlm_recover_waiters_post() doesn't do anything
  with the now completed lookup lkb (would usually clear RESEND)
- later, the node unmounts, unlocks this lkb that still has RESEND
  flag set
- the lkb is on the waiters list again, now for unlock, when recovery
  occurs, dlm_recover_waiters_pre() shows the lkb for unlock with RESEND
  set, doesn't do anything since the master still exists
- end of recovery: dlm_recover_waiters_post() takes this lkb off
  the waiters list because it has the RESEND flag set, then reports
  an error because unlocks are never supposed to be handled in
  recover_waiters_post().
- later, the unlock reply is received, doesn't find the lkb on
  the waiters list because recover_waiters_post() has wrongly
  removed it.
- the unlock operation has been lost, and we're left with a
  stray granted lock
- unmount spins waiting for the unlock to complete

The visible evidence of this problem will be a node where gfs umount is
spinning, the dlm waiters list will be empty, and the dlm locks list will
show a granted lock.

The fix is simply to clear the RESEND flag when taking an lkb off the
waiters list.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:50 -05:00