
Orangefs ABI documentation

Message ID CAOg9mSQf4m-MM_kKd=UcGsCx_LZg=+nFESd8jhzDEbcM9DOGjg@mail.gmail.com (mailing list archive)
State New, archived

Commit Message

Mike Marshall Jan. 22, 2016, 4:59 p.m. UTC
Hi Al...

I moved a tiny bit of your work around so it would compile, but I booted a
kernel from it, and ran some tests, it seems to work OK... doing this
work from home makes me remember writing cobol programs on a
silent 700... my co-workers are helping me look at
wait_for_cancellation_downcall... we recently made some
improvements there based on some problems we were
having in production with our out-of-tree Frankenstein
module... I'm glad you are also looking there.


# git diff

-Mike

On Fri, Jan 22, 2016 at 6:09 AM, Mike Marshall <hubcap@omnibond.com> wrote:
> Hi Al...
>
> Thanks for the review...
>
> I'll be trying to make some progress on your concerns about
> wait_for_cancellation_downcall today, but in case it looks
> like I dropped off the face of the earth for a few days - maybe
> I did - a several days long ice storm is dropping on us right
> now, everyone is working from home, power will likely go
> out for some...
>
> I'm cloning your repo from kernel.org across my slow
> dsl link right now...
>
> -Mike "I wrote all the debugfs code <g> "
>
> On Fri, Jan 22, 2016 at 2:11 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>> On Fri, Jan 15, 2016 at 04:46:09PM -0500, Mike Marshall wrote:
>>> Al...
>>>
>>> We have been working on Orangefs ABI (protocol between
>>> userspace and kernel module) documentation, and improving
>>> the code along the way.
>>>
>>> We wish you would look at the documentation progress, and
>>> look at the way that .write_iter is implemented now... plus
>>> the other improvements. We know that every problem you
>>> pointed out hasn't been worked on yet, but your feedback
>>> would let us know if we're headed in the right direction.
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux.git
>>> for-next
>>>
>>> The improved (but not yet officially released) userspace
>>> function that does the writev to the Orangefs pseudo device is in:
>>>
>>> http://www.orangefs.org/svn/orangefs/branches/trunk.kernel.update/
>>> src/io/dev/pint-dev.c
>>>
>>> You may decide we don't deserve an ACK yet, but since we are in
>>> the merge window I hope you can look at it...
>>
>> write_iter got much better, but there are serious problems with locking.
>> For example, just what will happen if orangefs_devreq_read() picks an op
>> for copying to daemon, moves it to hash, unlocks everything and starts
>> copy_to_user() at the time when submitter of that op (sleeping
>> in wait_for_matching_downcall()) gets a signal and buggers off?  It calls
>> orangefs_clean_up_interrupted_operation(), removes the sucker from hash
>> and proceed to free the damn thing.  Right under ongoing copy_to_user().
>> Better yet, should that copy_to_user() fail, we proceed to move an already
>> freed op back to request list.  And no, get_op()/put_op() pair won't be
>> enough - submitters just go ahead and call op_release(), freeing the op,
>> refcount be damned.  They'd need to switch to put_op() for that to work.
>> Note that the same applies for existing use of get_op/put_op mechanism -
>> if that signal comes just as the orangefs_devreq_write_iter() has picked
>> the op from hash, we'll end up freeing the sucker right under copy_from_iter()
>> despite get_op().
>>
>> What's more, a failure of that copy_to_user() leaves us in a nasty situation -
>> how do we tell whether to put the request back to the list or just quietly
>> drop it?  Check the refcount?  But what if the submitter (believing, and
>> for a good reason, that it had removed the sucker from hash) hadn't gotten
>> around to dropping its reference?
>>
>> Another piece of fun assumes malicious daemon; sure, it's already a FUBAR
>> situation, but userland shouldn't be able to corrupt the kernel memory.
>> Look: daemon spawns a couple of threads.  One of those issues read() on
>> your character device, with just 16 bytes of destination mmapped and filled
>> with zeroes.  Another spins until it sees ->tag becoming non-zero and
>> immediately does writev() now that it has the tag value.  What we get is
>> submitter seeing op serviced and proceeding to free it while ->read() hits
>> the copy_to_user() failure and reinserts the already freed op into request
>> list.
>>
>> I'm not sure we have any business inserting the op to hash until we are
>> done with copy_to_user() and know we won't fail.  Would simplify the
>> failure recovery as well...
>>
>> I think we ought to flag the op when submitter decides to give up on it
>> and have the return-to-request-list logics check that (under op->lock),
>> returning it to the list only in case it's not flagged *and* decrementing
>> refcount before dropping op->lock in that case - we are guaranteed that
>> this is not the last reference.  In case it's flagged, we drop ->op_lock,
>> do put_op() and _not_ refile the sucker to request list.
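
A minimal sketch of that refile-or-drop step, as it might look on the device
side after a failed copy_to_user(); the OP_VFS_STATE_GIVEN_UP bit and the
request-list lock name are assumptions taken from later messages in this
thread, not the actual code, and the lock nesting is illustrative only:

	spin_lock(&op->lock);
	if (op->op_state & OP_VFS_STATE_GIVEN_UP) {
		/* submitter bailed out; nobody may touch op->list anymore */
		spin_unlock(&op->lock);
		put_op(op);
	} else {
		spin_lock(&orangefs_request_list_lock);	/* assumed name */
		list_add(&op->list, &orangefs_request_list);
		spin_unlock(&orangefs_request_list_lock);
		atomic_dec(&op->ref_count);	/* can't be the last reference */
		spin_unlock(&op->lock);
	}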
>>
>> I'm not sure if this
>>                 /* NOTE: for I/O operations we handle releasing the op
>>                  * object except in the case of timeout.  the reason we
>>                  * can't free the op in timeout cases is that the op
>>                  * service logic in the vfs retries operations using
>>                  * the same op ptr, thus it can't be freed.
>>                  */
>>                 if (!timed_out)
>>                         op_release(op);
>> is safe, while we are at it...  I suspect we'd be better off if we did
>> put_op() there.  Unconditionally.  After the entire
>>         if (op->downcall.type == ORANGEFS_VFS_OP_FILE_IO)
>> thing, so that the total change of refcount is zero in all cases.  At least
>> that makes for regular lifetime rules (with s/op_release/put_op/ in submitters)
>>
>>         The thing I really don't understand is the situation with
>> wait_for_cancellation_downcall().  The comments in there flat-out
>> contradict the code - if called with signal already pending (as the
>> comment would seem to indicate), we might easily leave and free
>> the damn op without it ever being seen by daemon.  What's intended there?
>> And what happens if we reach schedule_timeout() in there (i.e. if we get
>> there without a signal pending) and operation completes successfully?
>> AFAICS, you'll get -ETIMEDOUT, not that the caller even looked at that
>> return value...  What's going on there?
>>
>>         Another unpleasantness is that your locking hierarchy leads to
>> races in orangefs_clean_up_interrupted_operation(); you take op->lock,
>> look at the state, decide what needs to be locked to remove the sucker from,
>> drop op->lock, lock the relevant data structure and proceed to search op in
>> it, removing it in case of success.  Fun, but what happens if it gets refiled
>> as soon as you drop op->lock?
>>
>>         I suspect that we ought to make refiling conditional on the same
>> "submitter has given up on that one" flag *and* move hash lock outside of
>> op->lock.  IOW, once that flag had been set, only submitter can do anything
>> with op->list.  Then orangefs_clean_up_interrupted_operation() could simply
>> make sure that flag had been set, drop op->lock, grab the relevant spinlock
>> and do list_del().  And to hell with "search and remove if it's there"
>> complications.
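
With that flag in place, the interrupted-op cleanup reduces to a straight
removal; a sketch, with the list/hash lock name assumed:

	spin_lock(&op->lock);
	op->op_state |= OP_VFS_STATE_GIVEN_UP;	/* from now on only we touch ->list */
	spin_unlock(&op->lock);

	spin_lock(&orangefs_request_list_lock);	/* or the hash lock, as appropriate */
	list_del_init(&op->list);		/* no "search and remove if it's there" */
	spin_unlock(&orangefs_request_list_lock);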
>>
>>         One more problem is with your open_access_count; it should be
>> tristate, not boolean.  As it is, another open() of your character
>> device is possible as soon as ->release() has decremented open_access_count.
>> IOW, purge_inprogress_ops() and purge_waiting_ops() can happen when we
>> already have reopened the sucker.  It should have three states - "not opened",
>> "opened" and "shutting down", the last one both preventing open() and
>> being treated as "not in service".
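
A sketch of what that tristate could look like; the function, lock, and
constant names here are illustrative, and only purge_waiting_ops() and
purge_inprogress_ops() are the helpers mentioned above:

	enum { ORANGEFS_DEV_CLOSED, ORANGEFS_DEV_OPEN, ORANGEFS_DEV_SHUTDOWN };
	static int open_access_count = ORANGEFS_DEV_CLOSED;
	static DEFINE_SPINLOCK(devreq_lock);

	static int orangefs_devreq_open(struct inode *inode, struct file *file)
	{
		int ret = -EBUSY;

		spin_lock(&devreq_lock);
		if (open_access_count == ORANGEFS_DEV_CLOSED) {
			open_access_count = ORANGEFS_DEV_OPEN;
			ret = 0;
		}
		spin_unlock(&devreq_lock);
		return ret;
	}

	static int orangefs_devreq_release(struct inode *inode, struct file *file)
	{
		spin_lock(&devreq_lock);
		open_access_count = ORANGEFS_DEV_SHUTDOWN;	/* blocks open(), still "not in service" */
		spin_unlock(&devreq_lock);

		purge_waiting_ops();
		purge_inprogress_ops();

		spin_lock(&devreq_lock);
		open_access_count = ORANGEFS_DEV_CLOSED;	/* only now may open() succeed again */
		spin_unlock(&devreq_lock);
		return 0;
	}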
>>
>>         Input validation got a lot saner (and documentation is quite welcome).
>> One place where I see a problem is listxattr -
>>                 if (total + new_op->downcall.resp.listxattr.lengths[i] > size)
>>                         goto done;
>> is *not* enough; you need to check that it's not going backwards (i.e. isn't
>> less than total).  total needs to be size_t, BTW - ssize_t is wrong here.
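
An overflow-proof version of that accumulation could look like the following;
the returned_count field is assumed for the loop bound:

	size_t total = 0;
	int i;

	for (i = 0; i < new_op->downcall.resp.listxattr.returned_count; i++) {
		size_t len = new_op->downcall.resp.listxattr.lengths[i];

		/* total <= size is an invariant, so size - total can't
		 * underflow and the sum can never silently wrap past size */
		if (len > size - total)
			goto done;
		total += len;
	}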
>>
>>         Another issue is that you are doing register_chrdev() too early -
>> it should be just before register_filesystem(); definitely without any
>> failure exits in between.
>>
>>         Debugfs stuff is FUBAR - it's sufficiently isolated, so I hadn't
>> spent much time on it, but if, for example, debugfs_create_file() in
>> orangefs_debugfs_init() fails, we'll call orangefs_debugfs_cleanup()
>> right there.  On module exit it will be called again, with
>> debugfs_remove_recursive(debug_dir) called *again*.  At the very least it
>> should clear debug_dir in orangefs_debugfs_cleanup().  Worse,
>> orangefs_kernel_debug_init() will try to create an object in freed
>> directory...
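
The minimal fix for the double cleanup is to make it idempotent; a sketch,
assuming the file-scope debug_dir dentry being discussed:

	static void orangefs_debugfs_cleanup(void)
	{
		debugfs_remove_recursive(debug_dir);
		debug_dir = NULL;	/* a second call becomes a harmless no-op */
	}

(debugfs_remove_recursive() ignores a NULL dentry, so clearing the pointer is
enough to make repeated calls safe.)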
>>
>>         I've dumped some of the massage into vfs.git#orangefs-untested, but
>> the locking and lifetime changes are not there yet; I'll add more of that
>> tomorrow.  I would really appreciate details on wait_for_cancellation_downcall;
>> that's the worst gap I have in understanding that code.

Comments

Al Viro Jan. 22, 2016, 5:08 p.m. UTC | #1
On Fri, Jan 22, 2016 at 11:59:40AM -0500, Mike Marshall wrote:
> Hi Al...
> 
> I moved a tiny bit of your work around so it would compile, but I booted a
> kernel from it, and ran some tests, it seems to work OK... doing this
> work from home makes me remember writing cobol programs on a
> silent 700... my co-workers are helping me look at
> wait_for_cancellation_downcall... we recently made some
> improvements there based on some problems we were
> having in production with our out-of-tree Frankenstein
> module... I'm glad you are also looking there.

BTW, what should happen to requests that got a buggered response in
orangefs_devreq_write_iter()?  As it is, you just free them and to hell
with whatever might've been waiting on them; that's easy to fix, but
what about the submitter?  Should we let it time out/simulate an error
properly returned by daemon/something else?
Mike Marshall Jan. 22, 2016, 5:40 p.m. UTC | #2
I think I see what you are saying... it seems like we have always
just freed them and to heck with whatever was waiting on them.

Maybe in all the places where I detect something bad, set the return code,
and goto out, I should also jam the bad return code
into op->downcall.status and goto wakeup instead of out?

Our infant fuzzer isn't really up to making test errors show up
there, I'd probably have to put some test weirdo stuff in
the userspace part that does the writev to make some test
failures...

-Mike

On Fri, Jan 22, 2016 at 12:08 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Fri, Jan 22, 2016 at 11:59:40AM -0500, Mike Marshall wrote:
>> Hi Al...
>>
>> I moved a tiny bit of your work around so it would compile, but I booted a
>> kernel from it, and ran some tests, it seems to work OK... doing this
>> work from home makes me remember writing cobol programs on a
>> silent 700... my co-workers are helping me look at
>> wait_for_cancellation_downcall... we recently made some
>> improvements there based on some problems we were
>> having in production with our out-of-tree Frankenstein
>> module... I'm glad you are also looking there.
>
> BTW, what should happen to requests that got a buggered response in
> orangefs_devreq_write_iter()?  As it is, you just free them and to hell
> with whatever might've been waiting on them; that's easy to fix, but
> what about the submitter?  Should we let it time out/simulate an error
> properly returned by daemon/something else?
Al Viro Jan. 22, 2016, 5:43 p.m. UTC | #3
On Fri, Jan 22, 2016 at 05:08:38PM +0000, Al Viro wrote:

> BTW, what should happen to requests that got a buggered response in
> orangefs_devreq_write_iter()?  As it is, you just free them and to hell
> with whatever might've been waiting on them; that's easy to fix, but
> what about the submitter?  Should we let it time out/simulate an error
> properly returned by daemon/something else?

FWIW, here's what I have in mind for lifetime rules:
	1) op_alloc() creates them with refcount 1
	2) when we decide to pass the sucker to daemon, we bump the refcount
before taking the thing out of request list.  After successful copying
to userland, we check if the submitter has given up on it.  If it hasn't,
we move the sucker to hash and decrement the refcount (it can't be the
last reference there).  If it has, we do put_op().  After _failed_ copying
to userland, we also check if it's been given up.  If it hasn't, we move
it back to request list and decrement the refcount.  If it has, we do
put_op().
	3) when we get a response from daemon, we bump the refcount before
taking the thing out of hash.  Once we are done, we call put_op() - in *all*
cases.  Malformed response is treated like a well-formed error.
	4) if submitter decides to give up upon a request, it sets a flag in
->op_state.  From that point on nobody else is going to touch op->list.
	5) once submitter is done, it does put_op() - *NOT* op_release().
Normally it ends up with request freed, but if we'd raced with interaction
with daemon we'll get it freed by put_op() done in (2) or (3).  No
        /*
         * tell the device file owner waiting on I/O that this read has
         * completed and it can return now.  in this exact case, on
         * wakeup the daemon will free the op, so we *cannot* touch it
         * after this.
         */
        wake_up_daemon_for_return(new_op);
        new_op = NULL;
anymore - just wake it up and do put_op() regardless.  In that case daemon
has already done get_op() and will free the sucker once it does its put_op().

Objections, comments?

BTW, it really looks like we might be better off without atomic_t here -
op->lock is held for all non-trivial transitions and we might as well have
put_op() require op->lock held and do something like
	if (--op->ref_count) {
		spin_unlock(op->lock);
		return;
	}
	spin_unlock(op->lock);
	op_release(op);
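
Spelled out, with ref_count demoted to a plain int guarded by op->lock as
suggested, that would be roughly:

	/* caller holds op->lock; it is always released here */
	static void put_op(struct orangefs_kernel_op_s *op)
		__releases(&op->lock)
	{
		if (--op->ref_count) {
			spin_unlock(&op->lock);
			return;
		}
		spin_unlock(&op->lock);
		op_release(op);
	}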

Speaking of op->lock:
                        /*
                         * let process sleep for a few seconds so shared
                         * memory system can be initialized.
                         */
                        spin_lock_irqsave(&op->lock, irqflags);
                        prepare_to_wait(&orangefs_bufmap_init_waitq,
                                        &wait_entry,
                                        TASK_INTERRUPTIBLE);
                        spin_unlock_irqrestore(&op->lock, irqflags);
What do you need to block interrupts for?

Oh, and why do you call orangefs_op_initialize(orangefs_op) from op_release(),
when you immediately follow it with freeing orangefs_op and do it on
allocation anyway?
Mike Marshall Jan. 22, 2016, 6:17 p.m. UTC | #4
>> Objections, comments?

I have no objections so far to your suggestions, I'm trying to keep
my co-workers looking in on this discussion... most other days
we're all lined up in adjacent cubes...

Your attention and fixes will almost certainly all be improvements...
I found a few fixes you made to the old
version of the writev code that were off base, but that
code made it look like impossible situations needed
to be handled...

I'll compile your repo again and run the quick tests
when you think your additions are stable...

-Mike

On Fri, Jan 22, 2016 at 12:43 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Fri, Jan 22, 2016 at 05:08:38PM +0000, Al Viro wrote:
>
>> BTW, what should happen to requests that got a buggered response in
>> orangefs_devreq_write_iter()?  As it is, you just free them and to hell
>> with whatever might've been waiting on them; that's easy to fix, but
>> what about the submitter?  Should we let it time out/simulate an error
>> properly returned by daemon/something else?
>
> FWIW, here's what I have in mind for lifetime rules:
>         1) op_alloc() creates them with refcount 1
>         2) when we decide to pass the sucker to daemon, we bump the refcount
> before taking the thing out of request list.  After successful copying
> to userland, we check if the submitter has given up on it.  If it hasn't,
> we move the sucker to hash and decrement the refcount (it can't be the
> last reference there).  If it has, we do put_op().  After _failed_ copying
> to userland, we also check if it's been given up.  If it hasn't, we move
> it back to request list and decrement the refcount.  If it has, we do
> put_op().
>         3) when we get a response from daemon, we bump the refcount before
> taking the thing out of hash.  Once we are done, we call put_op() - in *all*
> cases.  Malformed response is treated like a well-formed error.
>         4) if submitter decides to give up upon a request, it sets a flag in
> ->op_state.  From that point on nobody else is going to touch op->list.
>         5) once submitter is done, it does put_op() - *NOT* op_release().
> Normally it ends up with request freed, but if we'd raced with interaction
> with daemon we'll get it freed by put_op() done in (2) or (3).  No
>         /*
>          * tell the device file owner waiting on I/O that this read has
>          * completed and it can return now.  in this exact case, on
>          * wakeup the daemon will free the op, so we *cannot* touch it
>          * after this.
>          */
>         wake_up_daemon_for_return(new_op);
>         new_op = NULL;
> anymore - just wake it up and do put_op() regardless.  In that case daemon
> has already done get_op() and will free the sucker once it does its put_op().
>
> Objections, comments?
>
> BTW, it really looks like we might be better off without atomic_t here -
> op->lock is held for all non-trivial transitions and we might as well have
> put_op() require op->lock held and do something like
>         if (--op->ref_count) {
>                 spin_unlock(op->lock);
>                 return;
>         }
>         spin_unlock(op->lock);
>         op_release(op);
>
> Speaking of op->lock:
>                         /*
>                          * let process sleep for a few seconds so shared
>                          * memory system can be initialized.
>                          */
>                         spin_lock_irqsave(&op->lock, irqflags);
>                         prepare_to_wait(&orangefs_bufmap_init_waitq,
>                                         &wait_entry,
>                                         TASK_INTERRUPTIBLE);
>                         spin_unlock_irqrestore(&op->lock, irqflags);
> What do you need to block interrupts for?
>
> Oh, and why do you call orangefs_op_initialize(orangefs_op) from op_release(),
> when you immediately follow it with freeing orangefs_op and do it on
> allocation anyway?
Al Viro Jan. 22, 2016, 6:37 p.m. UTC | #5
On Fri, Jan 22, 2016 at 01:17:40PM -0500, Mike Marshall wrote:
> >> Objections, comments?
> 
> I have no objections so far to your suggestions, I'm trying to keep
> my co-workers looking in on this discussion... most other days
> we're all lined up in adjacent cubes...

email and IRC exist for a purpose...  BTW, another thing: if you've managed
to get to orangefs_exit() with anything still in request list or hash,
you are seriously fscked.  Whatever had allocated them is presumably still
around and it's running in that module's .text...
Mike Marshall Jan. 22, 2016, 7:07 p.m. UTC | #6
Martin's the only other one subscribed to fs-devel, the
rest are looking in here today:

http://marc.info/?l=linux-fsdevel&w=2&r=1&s=orangefs&q=b

Walt thinks you deserve an Orangefs Contributor's Jacket...

-Mike

On Fri, Jan 22, 2016 at 1:37 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Fri, Jan 22, 2016 at 01:17:40PM -0500, Mike Marshall wrote:
>> >> Objections, comments?
>>
>> I have no objections so far to your suggestions, I'm trying to keep
>> my co-workers looking in on this discussion... most other days
>> we're all lined up in adjacent cubes...
>
> email and IRC exist for a purpose...  BTW, another thing: if you've managed
> to get to orangefs_exit() with anything still in request list or hash,
> you are seriously fscked.  Whatever had allocated them is presumably still
> around and it's running in that module's .text...
Mike Marshall Jan. 22, 2016, 7:21 p.m. UTC | #7
Al...

I think that my commit f987f4c28
"don't trigger copy_attributes_to_inode from d_revalidate."
doesn't really do the job... I think I at least need to
write code that checks some of the attributes and
fails the revalidate if they got changed via some
external process... does that sound right?

-Mike

On Fri, Jan 22, 2016 at 2:07 PM, Mike Marshall <hubcap@omnibond.com> wrote:
> Martin's the only other one subscribed to fs-devel, the
> rest are looking in here today:
>
> http://marc.info/?l=linux-fsdevel&w=2&r=1&s=orangefs&q=b
>
> Walt thinks you deserve an Orangefs Contributor's Jacket...
>
> -Mike
>
> On Fri, Jan 22, 2016 at 1:37 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>> On Fri, Jan 22, 2016 at 01:17:40PM -0500, Mike Marshall wrote:
>>> >> Objections, comments?
>>>
>>> I have no objections so far to your suggestions, I'm trying to keep
>>> my co-workers looking in on this discussion... most other days
>>> we're all lined up in adjacent cubes...
>>
>> email and IRC exist for a purpose...  BTW, another thing: if you've managed
>> to get to orangefs_exit() with anything still in request list or hash,
>> you are seriously fscked.  Whatever had allocated them is presumably still
>> around and it's running in that module's .text...
Al Viro Jan. 22, 2016, 7:50 p.m. UTC | #8
On Fri, Jan 22, 2016 at 01:17:40PM -0500, Mike Marshall wrote:
> >> Objections, comments?
> 
> I have no objections so far to your suggestions, I'm trying to keep
> my co-workers looking in on this discussion... most other days
> we're all lined up in adjacent cubes...

FWIW, I do see a problem there.  This "submitter has given up" flag would
need to be cleared on retries ;-/

When can we end up retrying?  We'd need
	* wait_for_cancellation_downcall/wait_for_matching_downcall finished
	* op_state_serviced(op) false
	* op->downcall.status set to -EAGAIN
wait_for_matching_downcall() either returns with op_state_serviced() or
returns one -ETIMEDOUT, -EINTR, -EAGAIN or -EIO, with its return value
ending up in op->downcall.status.  So if it was wait_for_matching_downcall(),
we'd have to have returned -EAGAIN.  Which can happen only with
op_state_purged(op), i.e. with character device having been closed
after the sucker had been placed on the lists...

wait_for_cancellation_downcall() similar, but it never returns -EAGAIN.
IOW, for it retries are irrelevant.

Damnit, what's the point of picking a purged request from the request
list in orangefs_devreq_read()?  Ditto for orangefs_devreq_write_iter()
and request hash...

Note that purge *can't* happen during either of those - ->release() can't
be called while something is in ->read() or ->write_iter().  So if we would
ignore purged requests in those, the problem with retries clearing the
"I gave up" flag wouldn't exist...  It's a race anyway - submitter had been
woken up when we'd been marking the request as purged, and it's going to
remove the sucker from whatever list it's on very soon after getting woken
up.

Do you see any problems with skipping such request list/hash elements in
orangefs_devreq_read()/orangefs_devreq_write_iter() resp.?
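
The skip itself would be a one-line check in the list walk; a sketch using
op_state_purged() from the patch below, with the list and lock names assumed:

	spin_lock(&orangefs_request_list_lock);
	list_for_each_entry(op, &orangefs_request_list, list) {
		if (op_state_purged(op))
			continue;	/* act as if the woken waiter had already removed it */
		cur_op = op;
		break;
	}
	spin_unlock(&orangefs_request_list_lock);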
Al Viro Jan. 22, 2016, 7:54 p.m. UTC | #9
On Fri, Jan 22, 2016 at 02:07:23PM -0500, Mike Marshall wrote:
> Martin's the only other one subscribed to fs-devel, the
> rest are looking in here today:
> 
> http://marc.info/?l=linux-fsdevel&w=2&r=1&s=orangefs&q=b

Umm...  Why not Cc to your maillist (pvfs2-developers@beowulf-underground.org?)
if it's unmoderated?
Al Viro Jan. 22, 2016, 8:04 p.m. UTC | #10
On Fri, Jan 22, 2016 at 02:21:44PM -0500, Mike Marshall wrote:
> Al...
> 
> I think that my commit f987f4c28
> "don't trigger copy_attributes_to_inode from d_revalidate."
> doesn't really do the job... I think I at least need to
> write code that checks some of the attributes and
> fails the revalidate if they got changed via some
> external process... does that sound right?

Won't your orangefs_revalidate_lookup() spot the changed handle?  Anyway,
some sanity checks in there might be a good idea - at least "has the
object type somehow changed without handle going stale?"...

Said that, what should be picking e.g. chmod/chown by external source?
You don't have ->permission() instances in there, so inode->i_mode is
used directly by generic_inode_permission()...
Mike Marshall Jan. 22, 2016, 8:30 p.m. UTC | #11
The userspace daemon (client-core) that reads/writes to the device
restarts automatically if it stops for some reason... I believe active
ops are marked "purged" when this happens, and when client-core
restarts "purged" ops are retried (once)... see the comment
in waitqueue.c "if the operation was purged in the meantime..."

I've tried to rattle Walt and Becky's chains to see if they
can describe it better...

-Mike

On Fri, Jan 22, 2016 at 3:04 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Fri, Jan 22, 2016 at 02:21:44PM -0500, Mike Marshall wrote:
>> Al...
>>
>> I think that my commit f987f4c28
>> "don't trigger copy_attributes_to_inode from d_revalidate."
>> doesn't really do the job... I think I at least need to
>> write code that checks some of the attributes and
>> fails the revalidate if they got changed via some
>> external process... does that sound right?
>
> Won't your orangefs_revalidate_lookup() spot the changed handle?  Anyway,
> some sanity checks in there might be a good idea - at least "has the
> object type somehow changed without handle going stale?"...
>
> Said that, what should be picking e.g. chmod/chown by external source?
> You don't have ->permission() instances in there, so inode->i_mode is
> used directly by generic_inode_permission()...
Mike Marshall Jan. 22, 2016, 8:51 p.m. UTC | #12
Nope... I guess that's why the original developer called
copy_attributes_to_inode
every time... I need to at least check permissions... if a file's
permissions change
via an external source, say from 755 to 700, the other guy can still cat
the file once...

-Mike

On Fri, Jan 22, 2016 at 3:04 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Fri, Jan 22, 2016 at 02:21:44PM -0500, Mike Marshall wrote:
>> Al...
>>
>> I think that my commit f987f4c28
>> "don't trigger copy_attributes_to_inode from d_revalidate."
>> doesn't really do the job... I think I at least need to
>> write code that checks some of the attributes and
>> fails the revalidate if they got changed via some
>> external process... does that sound right?
>
> Won't your orangefs_revalidate_lookup() spot the changed handle?  Anyway,
> some sanity checks in there might be a good idea - at least "has the
> object type somehow changed without handle going stale?"...
>
> Said that, what should be picking e.g. chmod/chown by external source?
> You don't have ->permission() instances in there, so inode->i_mode is
> used directly by generic_inode_permission()...
Mike Marshall Jan. 22, 2016, 11:53 p.m. UTC | #13
I built and tested orangefs-untested ...

Orangefs still seems to work OK... I built it as a module
and didn't trigger the BUG_ON when I unloaded it.

-Mike

On Fri, Jan 22, 2016 at 3:51 PM, Mike Marshall <hubcap@omnibond.com> wrote:
> Nope... I guess that's why the original developer called
> copy_attributes_to_inode
> every time... I need to at least check permissions... if a file's
> permissions change
> via an external source, say from 755 to 700, the other guy can still cat
> the file once...
>
> -Mike
>
> On Fri, Jan 22, 2016 at 3:04 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>> On Fri, Jan 22, 2016 at 02:21:44PM -0500, Mike Marshall wrote:
>>> Al...
>>>
>>> I think that my commit f987f4c28
>>> "don't trigger copy_attributes_to_inode from d_revalidate."
>>> doesn't really do the job... I think I at least need to
>>> write code that checks some of the attributes and
>>> fails the revalidate if they got changed via some
>>> external process... does that sound right?
>>
>> Won't your orangefs_revalidate_lookup() spot the changed handle?  Anyway,
>> some sanity checks in there might be a good idea - at least "has the
>> object type somehow changed without handle going stale?"...
>>
>> Said that, what should be picking e.g. chmod/chown by external source?
>> You don't have ->permission() instances in there, so inode->i_mode is
>> used directly by generic_inode_permission()...
Al Viro Jan. 23, 2016, 12:12 a.m. UTC | #14
On Fri, Jan 22, 2016 at 03:30:02PM -0500, Mike Marshall wrote:
> The userspace daemon (client-core) that reads/writes to the device
> restarts automatically if it stops for some reason... I believe active
> ops are marked "purged" when this happens, and when client-core
> restarts "purged" ops are retried (once)... see the comment
> in waitqueue.c "if the operation was purged in the meantime..."
> 
> I've tried to rattle Walt and Becky's chains to see if they
> can describe it better...

What I mean is the following sequence:

Syscall: puts op into request list, sleeps in wait_for_matching_downcall()
Daemon: exits, marks the op purged, wakes Syscall up
Daemon gets restarted
Daemon calls read(), finds op still on the list
Syscall: finally gets the timeslice, removes op from the list, decides to
	resubmit

This is very hard to hit - normally by the time we get around to read()
from restarted daemon the waiter had already been woken up and
already removed the purged op from the list.  So in practice you probably
had never hit that case.  However, it is theoretically possible.

What I propose to do is to have purged requests that are still in the lists
to be skipped by orangefs_devreq_read() and orangefs_devreq_remove_op().
IOW, pretend that the race had been won by whatever had been waiting on
that request and got woken up when it had been purged.

Note that by the time it gets resubmitted, it already has the 'purged' flag
removed - set_op_state_waiting(op) is done when we are inserting into
request list and it leaves no trace of OP_VFS_STATE_PURGED.  So I'm not
talking about the resubmitted stuff; just the one that had been in queue
since before the daemon restart and hadn't been removed from there yet.
Al Viro Jan. 23, 2016, 1:28 a.m. UTC | #15
On Sat, Jan 23, 2016 at 12:12:02AM +0000, Al Viro wrote:
> On Fri, Jan 22, 2016 at 03:30:02PM -0500, Mike Marshall wrote:
> > The userspace daemon (client-core) that reads/writes to the device
> > restarts automatically if it stops for some reason... I believe active
> > ops are marked "purged" when this happens, and when client-core
> > restarts "purged" ops are retried (once)... see the comment
> > in waitqueue.c "if the operation was purged in the meantime..."
> > 
> > I've tried to rattle Walt and Becky's chains to see if they
> > can describe it better...
> 
> What I mean is the following sequence:
> 
> Syscall: puts op into request list, sleeps in wait_for_matching_downcall()
> Daemon: exits, marks the op purged, wakes Syscall up
> Daemon gets restarted
> Daemon calls read(), finds op still on the list
> Syscall: finally gets the timeslice, removes op from the list, decides to
> 	resubmit
> 
> This is very hard to hit - normally by the time we get around to read()
> from restarted daemon the waiter had already been woken up and
> already removed the purged op from the list.  So in practice you probably
> had never hit that case.  However, it is theoretically possible.
> 
> What I propose to do is to have purged requests that are still in the lists
> to be skipped by orangefs_devreq_read() and orangefs_devreq_remove_op().
> IOW, pretend that the race had been won by whatever had been waiting on
> that request and got woken up when it had been purged.
> 
> Note that by the time it gets resubmitted, it already has the 'purged' flag
> removed - set_op_state_waiting(op) is done when we are inserting into
> request list and it leaves no trace of OP_VFS_STATE_PURGED.  So I'm not
> talking about the resubmitted stuff; just the one that had been in queue
> since before the daemon restart and hadn't been removed from there yet.

OK, aforementioned locking/refcounting scheme implemented (and completely
untested).  See #orangefs-untested.

Rules:

*  refcounting is real refcounting - objects are created with refcount 1,
get_op() increments refcount, op_release() decrements and frees when zero.

* daemon interaction (read/write_iter) is refcount-neutral - it grabs
a reference when picking request from list/hash and always drops it in
the end.  Submitters are always releasing the reference acquired when
allocating request.

* when submitter decides to give up upon request (for any reason - timeout,
signal, daemon disconnect) it marks it with new bit - OP_VFS_STATE_GIVEN_UP.
Once that is done, nobody else is allowed to touch its ->list.

* request is inserted into hash only when we'd succeeded copying to daemon's
memory (and only if request hadn't been given up while we'd been copying it).
If copying fails, we only have to refile to the request list (and only if
it hadn't been given up).

* when copying a response from daemon, request is only marked serviced if it
hadn't been given up while we were parsing the response.  Malformed responses,
failed copying of response, etc. are treated as well-formed error response
would be.  Error value is the one we'll be returning to daemon - might or
might not be right from the error recovery standpoint (see the sketch after
this list).

* hash lock nests *outside* of op->lock now and all work with the hash is
protected by it.

* daemon interaction skips the requests that had been around since before
the control device had been opened, acting as if the (already woken up)
submitters had already gotten around to removing those from the list/hash.
That's the normal outcome of such race anyway, and it simplifies the analysis.

Again, it's completely untested; might oops, break stuff, etc.  I _think_
it makes sense from the lifetime rules and locking POV, but...

	I still would like to understand what's going on in
wait_for_cancellation_downcall() - it *probably* shouldn't matter for the
correctness of that massage, but there might be dragons.  I don't understand
in which situation wrt pending signals it is supposed to be called and the
comments in there contradict the actual code.
Mike Marshall Jan. 23, 2016, 2:54 a.m. UTC | #16
Well... that all seems awesome, and compiled the first
time and all my quick tests on my dinky vm make
it seem fine... It is Becky that recently spent a
bunch of time fighting the cancellation dragons,
I'll see if I can't get her to weigh in on
wait_for_cancellation_downcall tomorrow.

We have some gnarly tests we were running on
real hardware that helped reproduce the problems
she was seeing in production with Clemson's
Palmetto Cluster, I'll run them, but maybe not
until Monday with the ice storm...

Thanks Al...

-Mike

On Fri, Jan 22, 2016 at 8:28 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Sat, Jan 23, 2016 at 12:12:02AM +0000, Al Viro wrote:
>> On Fri, Jan 22, 2016 at 03:30:02PM -0500, Mike Marshall wrote:
>> > The userspace daemon (client-core) that reads/writes to the device
>> > restarts automatically if it stops for some reason... I believe active
>> > ops are marked "purged" when this happens, and when client-core
>> > restarts "purged" ops are retried (once)... see the comment
>> > in waitqueue.c "if the operation was purged in the meantime..."
>> >
>> > I've tried to rattle Walt and Becky's chains to see if they
>> > can describe it better...
>>
>> What I mean is the following sequence:
>>
>> Syscall: puts op into request list, sleeps in wait_for_matching_downcall()
>> Daemon: exits, marks the op purged, wakes Syscall up
>> Daemon gets restarted
>> Daemon calls read(), finds op still on the list
>> Syscall: finally gets the timeslice, removes op from the list, decides to
>>       resubmit
>>
>> This is very hard to hit - normally by the time we get around to read()
>> from restarted daemon the waiter had already been woken up and
>> already removed the purged op from the list.  So in practice you probably
>> had never hit that case.  However, it is theoretically possible.
>>
>> What I propose to do is to have purged requests that are still in the lists
>> to be skipped by orangefs_devreq_read() and orangefs_devreq_remove_op().
>> IOW, pretend that the race had been won by whatever had been waiting on
>> that request and got woken up when it had been purged.
>>
>> Note that by the time it gets resubmitted, it already has the 'purged' flag
>> removed - set_op_state_waiting(op) is done when we are inserting into
>> request list and it leaves no trace of OP_VFS_STATE_PURGED.  So I'm not
>> talking about the resubmitted stuff; just the one that had been in queue
>> since before the daemon restart and hadn't been removed from there yet.
>
> OK, aforementioned locking/refcounting scheme implemented (and completely
> untested).  See #orangefs-untested.
>
> Rules:
>
> *  refcounting is real refcounting - objects are created with refcount 1,
> get_op() increments refcount, op_release() decrements and frees when zero.
>
> * daemon interaction (read/write_iter) is refcount-neutral - it grabs
> a reference when picking request from list/hash and always drops it in
> the end.  Submitters are always releasing the reference acquired when
> allocating request.
>
> * when submitter decides to give up upon request (for any reason - timeout,
> signal, daemon disconnect) it marks it with new bit - OP_VFS_STATE_GIVEN_UP.
> Once that is done, nobody else is allowed to touch its ->list.
>
> * request is inserted into hash only when we'd succeeded copying to daemon's
> memory (and only if request hadn't been given up while we'd been copying it).
> If copying fails, we only have to refile to the request list (and only if
> it hadn't been given up).
>
> * when copying a response from daemon, request is only marked serviced if it
> hadn't been given up while we were parsing the response.  Malformed responses,
> failed copying of response, etc. are treated as well-formed error response
> would be.  Error value is the one we'll be returning to daemon - might or
> might not be right from the error recovery standpoint.
>
> * hash lock nests *outside* of op->lock now and all work with the hash is
> protected by it.
>
> * daemon interaction skips the requests that had been around since before
> the control device had been opened, acting as if the (already woken up)
> submitters had already gotten around to removing those from the list/hash.
> That's the normal outcome of such race anyway, and it simplifies the analysis.
>
> Again, it's completely untested; might oops, break stuff, etc.  I _think_
> it makes sense from the lifetime rules and locking POV, but...
>
>         I still would like to understand what's going on in
> wait_for_cancellation_downcall() - it *probably* shouldn't matter for the
> correctness of that massage, but there might be dragons.  I don't understand
> in which situation wrt pending signals it is supposed to be called and the
> comments in there contradict the actual code.
Al Viro Jan. 23, 2016, 7:10 p.m. UTC | #17
On Fri, Jan 22, 2016 at 09:54:48PM -0500, Mike Marshall wrote:
> Well... that all seems awesome, and compiled the first
> time and all my quick tests on my dinky vm make
> it seem fine... It is Becky that recently spent a
> bunch of time fighting the cancellation dragons,
> I'll see if I can't get her to weigh in on
> wait_for_cancellation_downcall tomorrow.
> 
> We have some gnarly tests we were running on
> real hardware that helped reproduce the problems
> she was seeing in production with Clemson's
> Palmetto Cluster, I'll run them, but maybe not
> until Monday with the ice storm...

OK, several more pushed.  The most interesting part is probably switch
to real completions - you'd been open-coding them for no good reason
(and as always with reinventing locking primitives, asking for trouble).

New bits just as untested as the earlier ones, of course...
Mike Marshall Jan. 23, 2016, 7:24 p.m. UTC | #18
OK, I'll get them momentarily...

I merged your other patches, and there was a merge
conflict I had to work around... you're working from
an orangefs tree that lacks one commit I had made
last week... my linux-next tree has all your patches
through yesterday in it now...

I am setting up "the gnarly test" (at home from a VM,
though) that should cause a bunch of cancellations,
I want to see if I can get
wait_for_cancellation_downcall to ever
flow past that "if (signal_pending(current)) {"
block... if it does, that demonstrate where
the comments conflict with the code, right?

-Mike

On Sat, Jan 23, 2016 at 2:10 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Fri, Jan 22, 2016 at 09:54:48PM -0500, Mike Marshall wrote:
>> Well... that all seems awesome, and compiled the first
>> time and all my quick tests on my dinky vm make
>> it seem fine... It is Becky that recently spent a
>> bunch of time fighting the cancellation dragons,
>> I'll see if I can't get her to weigh in on
>> wait_for_cancellation_downcall tomorrow.
>>
>> We have some gnarly tests we were running on
>> real hardware that helped reproduce the problems
>> she was seeing in production with Clemson's
>> Palmetto Cluster, I'll run them, but maybe not
>> until Monday with the ice storm...
>
> OK, several more pushed.  The most interesting part is probably switch
> to real completions - you'd been open-coding them for no good reason
> (and as always with reinventing locking primitives, asking for trouble).
>
> New bits just as untested as the earlier ones, of course...
Mike Marshall Jan. 23, 2016, 9:35 p.m. UTC | #19
I compiled and tested the new patches,
they seem to work more than great, unless
it is just my imagination that the kernel
module is much faster now. I'll measure it
with more than seat-of-the-pants to see
for sure. The patches are pushed to
the for-next branch.

My "gnarly test" can get the code to flow
into wait_for_cancellation_downcall, but
never would flow past the
"if (signal_pending(current)) {" block,
though that doesn't prove anything...

I had to look at the wiki page for "cargo culting" <g>...
When Becky was working on the cancellation
problem I alluded to earlier, we talked about and
suspected the spin_lock_irqsaves in
service_operation were not appropriate...

Thanks again Al...

-Mike


On Sat, Jan 23, 2016 at 2:24 PM, Mike Marshall <hubcap@omnibond.com> wrote:
> OK, I'll get them momentarily...
>
> I merged your other patches, and there was a merge
> conflict I had to work around... you're working from
> an orangefs tree that lacks one commit I had made
> last week... my linux-next tree has all your patches
> through yesterday in it now...
>
> I am setting up "the gnarly test" (at home from a VM,
> though) that should cause a bunch of cancellations,
> I want to see if I can get
> wait_for_cancellation_downcall to ever
> flow past that "if (signal_pending(current)) {"
> block... if it does, that demonstrates where
> the comments conflict with the code, right?
>
> -Mike
>
> On Sat, Jan 23, 2016 at 2:10 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>> On Fri, Jan 22, 2016 at 09:54:48PM -0500, Mike Marshall wrote:
>>> Well... that all seems awesome, and compiled the first
>>> time and all my quick tests on my dinky vm make
>>> it seem fine... It is Becky that recently spent a
>>> bunch of time fighting the cancellation dragons,
>>> I'll see if I can't get her to weigh in on
>>> wait_for_cancellation_downcall tomorrow.
>>>
>>> We have some gnarly tests we were running on
>>> real hardware that helped reproduce the problems
>>> she was seeing in production with Clemson's
>>> Palmetto Cluster, I'll run them, but maybe not
>>> until Monday with the ice storm...
>>
>> OK, several more pushed.  The most interesting part is probably switch
>> to real completions - you'd been open-coding them for no good reason
>> (and as always with reinventing locking primitives, asking for trouble).
>>
>> New bits just as untested as the earlier ones, of course...
Al Viro Jan. 23, 2016, 10:05 p.m. UTC | #20
On Sat, Jan 23, 2016 at 04:35:32PM -0500, Mike Marshall wrote:
> I compiled and tested the new patches,
> they seem to work more than great, unless
> it is just my imagination that the kernel
> module is much faster now. I'll measure it
> with more than seat-of-the-pants to see
> for sure. The patches are pushed to
> the for-next branch.

As long as the speedup is not due to not actually doing the IO... ;-)

> My "gnarly test" can get the code to flow
> into wait_for_cancellation_downcall, but
> never would flow past the
> "if (signal_pending(current)) {" block,
> though that doesn't prove anything...

AFAICS, it is possible to get there without a signal - you need the first
attempt of file_read / file_write to fail on disconnect, be retried and
that retry to fail on timeout.  _Then_ you get to sending a cancel without
any signals ever being sent to you.  

Said that, I'm not sure that this "we don't wait at all if the signals
are pending" is right - consider e.g. a read getting killed after it had
been picked by daemon and before the daemon gets a chance to reply.
We submit a cancel, but unless daemon picks it immediately, we don't
even bother waiting - we see that cancel hadn't been serviced yet, that
a signal is pending and we remove the cancel from request list.  In that
scenario daemon doesn't get a chance to see the cancel request at all...

In which situations do we want cancel to be given to daemon and do we
need to wait for its response?

> I had to look at the wiki page for "cargo culting" <g>...
> When Becky was working on the cancellation
> problem I alluded to earlier, we talked about and
> suspected the spin_lock_irqsaves in
> service_operation were not appropriate...

It isn't - first of all, prepare_to_wait() does spin_lock_irqsave() itself
(on the queue lock), so interrupt-disabling bits are there anyway.  And
the exclusion on op->lock is completely pointless, obviously - you are waiting
for bufmap initialization, which has nothing to do with specific op.
Note that by that point op is not on any lists, so there's nobody who could
see it, let alone grab the same spinlock...
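
So the bufmap wait could lose both the irqsave and the per-op lock entirely;
a sketch, with the readiness check and the timeout as placeholders:

	DEFINE_WAIT(wait_entry);

	/* prepare_to_wait() takes the waitqueue lock with IRQs disabled
	 * on its own; no op->lock, no irqsave needed here */
	prepare_to_wait(&orangefs_bufmap_init_waitq, &wait_entry,
			TASK_INTERRUPTIBLE);
	if (!orangefs_bufmap_ready())		/* placeholder predicate */
		schedule_timeout(ORANGEFS_BUFMAP_WAIT_TIMEOUT);	/* placeholder */
	finish_wait(&orangefs_bufmap_init_waitq, &wait_entry);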

Patch

diff --git a/fs/orangefs/orangefs-kernel.h b/fs/orangefs/orangefs-kernel.h
index b926fe77..88e606a 100644
--- a/fs/orangefs/orangefs-kernel.h
+++ b/fs/orangefs/orangefs-kernel.h
@@ -103,24 +103,6 @@  enum orangefs_vfs_op_states {
        OP_VFS_STATE_PURGED = 8,
 };

-#define set_op_state_waiting(op)     ((op)->op_state = OP_VFS_STATE_WAITING)
-#define set_op_state_inprogress(op)  ((op)->op_state = OP_VFS_STATE_INPROGR)
-static inline void set_op_state_serviced(struct orangefs_kernel_op_s *op)
-{
-       op->op_state = OP_VFS_STATE_SERVICED;
-       wake_up_interruptible(&op->waitq);
-}
-static inline void set_op_state_purged(struct orangefs_kernel_op_s *op)
-{
-       op->op_state |= OP_VFS_STATE_PURGED;
-       wake_up_interruptible(&op->waitq);
-}
-
-#define op_state_waiting(op)     ((op)->op_state & OP_VFS_STATE_WAITING)
-#define op_state_in_progress(op) ((op)->op_state & OP_VFS_STATE_INPROGR)
-#define op_state_serviced(op)    ((op)->op_state & OP_VFS_STATE_SERVICED)
-#define op_state_purged(op)      ((op)->op_state & OP_VFS_STATE_PURGED)
-
 #define get_op(op)                                     \
        do {                                            \
                atomic_inc(&(op)->ref_count);   \
@@ -259,6 +241,25 @@  struct orangefs_kernel_op_s {
        struct list_head list;
 };

+#define set_op_state_waiting(op)     ((op)->op_state = OP_VFS_STATE_WAITING)
+#define set_op_state_inprogress(op)  ((op)->op_state = OP_VFS_STATE_INPROGR)
+static inline void set_op_state_serviced(struct orangefs_kernel_op_s *op)
+{
+       op->op_state = OP_VFS_STATE_SERVICED;
+       wake_up_interruptible(&op->waitq);
+}
+static inline void set_op_state_purged(struct orangefs_kernel_op_s *op)
+{
+       op->op_state |= OP_VFS_STATE_PURGED;
+       wake_up_interruptible(&op->waitq);
+}
+
+#define op_state_waiting(op)     ((op)->op_state & OP_VFS_STATE_WAITING)
+#define op_state_in_progress(op) ((op)->op_state & OP_VFS_STATE_INPROGR)
+#define op_state_serviced(op)    ((op)->op_state & OP_VFS_STATE_SERVICED)
+#define op_state_purged(op)      ((op)->op_state & OP_VFS_STATE_PURGED)
+
+
 /* per inode private orangefs info */
 struct orangefs_inode_s {
        struct orangefs_object_kref refn;
diff --git a/fs/orangefs/waitqueue.c b/fs/orangefs/waitqueue.c
index 641de05..a257891 100644
--- a/fs/orangefs/waitqueue.c
+++ b/fs/orangefs/waitqueue.c
@@ -16,6 +16,9 @@ 
 #include "orangefs-kernel.h"
 #include "orangefs-bufmap.h"

+static int wait_for_cancellation_downcall(struct orangefs_kernel_op_s *);
+static int wait_for_matching_downcall(struct orangefs_kernel_op_s *);
+
 /*
  * What we do in this function is to walk the list of operations that are
  * present in the request queue and mark them as purged.