mbox series

[0/2] nfsd: close potential race between open and delegation

Message ID 20220714200434.161818-1-jlayton@kernel.org (mailing list archive)
Headers show
Series nfsd: close potential race between open and delegation | expand

Message

Jeff Layton July 14, 2022, 8:04 p.m. UTC
This is a respin of the patchset that I sent earlier today. I hit a
deadlock with that one because of the ambiguous locking.

This series is based on top of Neil's set entitled:

    [PATCH 0/8] NFSD: clean up locking.

His patchset makes the locking in the nfsd4_open codepath much more
consistent, and this becomes a lot simpler to fix. Without that set
however, the state of the parent's i_rwsem is unclear after nfsd_lookup
is called, and I don't see a way to determine it reliably.

Jeff Layton (2):
  nfsd: drop fh argument from alloc_init_deleg
  nfsd: vet the opened dentry after setting a delegation

 fs/nfsd/nfs4state.c | 54 +++++++++++++++++++++++++++++++++++++--------
 1 file changed, 45 insertions(+), 9 deletions(-)

Comments

NeilBrown July 14, 2022, 11:59 p.m. UTC | #1
On Fri, 15 Jul 2022, Jeff Layton wrote:
> This is a respin of the patchset that I sent earlier today. I hit a
> deadlock with that one because of the ambiguous locking.
> 
> This series is based on top of Neil's set entitled:
> 
>     [PATCH 0/8] NFSD: clean up locking.
> 
> His patchset makes the locking in the nfsd4_open codepath much more
> consistent, and this becomes a lot simpler to fix. Without that set
> however, the state of the parent's i_rwsem is unclear after nfsd_lookup
> is called, and I don't see a way to determine it reliably.

I haven't examined these patch very closely, but a few initial thoughts
are:

1/ Before my series, you can unambiguously tell if i_rwsem is held by
   checking fhp->fh_locked.  In fact, just call "fh_lock()", and you can
   then be sure the fh is locked, whether or not it was locked before
however...
2/ Do we really need to lock the parent?  If a rename or unlink happens
   after the lease was taken, the lease will be broken. So
    take lease.
    repeat lookup (locklessly)
    Check if lease has been broken
   Should provide all you need.

   You don't *need* to lock the directory to open an existing file and
   with my pending parallel-updates patch set, you only need a shared
   lock on the directory to create a file.  So I'd rather not be locking
   the directory at all to get a delegation

3/ When you vet the name you only do a lookup_one_len(), while
   nfsd_lookup_dentry() also calls nfsd_cross_mnt() as it is possible
   for a file to be mounted on.
   That means that if I did bind mount one file over another and export
   over NFSD, the file will never offer a delegation.
   This is a minor point, but I think it would be best to be as correct
   and consistent as possible.

Thanks for working on this!

NeilBrown

> 
> Jeff Layton (2):
>   nfsd: drop fh argument from alloc_init_deleg
>   nfsd: vet the opened dentry after setting a delegation
> 
>  fs/nfsd/nfs4state.c | 54 +++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 45 insertions(+), 9 deletions(-)
> 
> -- 
> 2.36.1
> 
>
Jeff Layton July 15, 2022, 11:32 a.m. UTC | #2
On Fri, 2022-07-15 at 09:59 +1000, NeilBrown wrote:
> On Fri, 15 Jul 2022, Jeff Layton wrote:
> > This is a respin of the patchset that I sent earlier today. I hit a
> > deadlock with that one because of the ambiguous locking.
> > 
> > This series is based on top of Neil's set entitled:
> > 
> >     [PATCH 0/8] NFSD: clean up locking.
> > 
> > His patchset makes the locking in the nfsd4_open codepath much more
> > consistent, and this becomes a lot simpler to fix. Without that set
> > however, the state of the parent's i_rwsem is unclear after nfsd_lookup
> > is called, and I don't see a way to determine it reliably.
> 
> I haven't examined these patch very closely, but a few initial thoughts
> are:
> 
> 1/ Before my series, you can unambiguously tell if i_rwsem is held by
>    checking fhp->fh_locked.  In fact, just call "fh_lock()", and you can
>    then be sure the fh is locked, whether or not it was locked before

Thanks, good to know. I wasn't sure how reliable that bool is. I guess
though that once you have a svc_fh, then you can more or less assume
that you have exclusive access to it for the life of the RPC being
processed.

> however...
> 2/ Do we really need to lock the parent?  If a rename or unlink happens
>    after the lease was taken, the lease will be broken. So
>     take lease.
>     repeat lookup (locklessly)
>     Check if lease has been broken
>    Should provide all you need.
> 
>    You don't *need* to lock the directory to open an existing file and
>    with my pending parallel-updates patch set, you only need a shared
>    lock on the directory to create a file.  So I'd rather not be locking
>    the directory at all to get a delegation
> 

Yeah, we probably don't need to lock the dir. That said, after your
patch series we already hold the i_rwsem on the parent at this point so
lookup_one_len is fine in this instance.

> 3/ When you vet the name you only do a lookup_one_len(), while
>    nfsd_lookup_dentry() also calls nfsd_cross_mnt() as it is possible
>    for a file to be mounted on.
>    That means that if I did bind mount one file over another and export
>    over NFSD, the file will never offer a delegation.
>    This is a minor point, but I think it would be best to be as correct
>    and consistent as possible.
> 

Agreed, but that will take a bit more work. nfsd_lookup_dentry takes
several parameters that we don't currently have access to in
nfs4_set_delegation (e.g. the rqstp). Those will need to be plumbed
through several functions.

> Thanks for working on this!

...and thank you for the locking cleanup! Getting rid of fh_lock/_unlock
is a really nice cleanup that makes it a lot more clear how this should
work.
NeilBrown July 18, 2022, 3:21 a.m. UTC | #3
On Fri, 15 Jul 2022, Jeff Layton wrote:
> On Fri, 2022-07-15 at 09:59 +1000, NeilBrown wrote:
> 
> > however...
> > 2/ Do we really need to lock the parent?  If a rename or unlink happens
> >    after the lease was taken, the lease will be broken. So
> >     take lease.
> >     repeat lookup (locklessly)
> >     Check if lease has been broken
> >    Should provide all you need.
> > 
> >    You don't *need* to lock the directory to open an existing file and
> >    with my pending parallel-updates patch set, you only need a shared
> >    lock on the directory to create a file.  So I'd rather not be locking
> >    the directory at all to get a delegation
> > 
> 
> Yeah, we probably don't need to lock the dir. That said, after your
> patch series we already hold the i_rwsem on the parent at this point so
> lookup_one_len is fine in this instance.

But the only reason we hold i_rwsem at this point is to prevent renames
in the "opened existing file" case.  The "created new file" case holds
it as well just be be consistent with the first case.

If we "vet" the dentry, then we don't need the lock any more.  We can
then simplify nfsd_lookup_dentry() to always assume the dir is not
locked - so the "locked" arg can go, and nfsd_lookup() can lose the
"lock" arg and always return with the directory unlocked.

I'm tempted to add your patch to the front of my series.  The
inconsistency in locking can be fix by unlocking the directory before we
get even close to handing out a delegation - so the delegation never
sees a locked directory.
But right now I have a cold and don't trust myself to think clearly
enough to create code worth posting.  Hopefully I'll be thinking more
clearly later in the week.

While I'm here ...  is "vet" a good word?  The meaning is appropriate,
but I wonder if it would cause our friends for whom English isn't their
first language to stumble.  There are about 5 uses in the kernel at
present.

Would validate or verify be better?  Even though they are longer..

Thanks,
NeilBrown
Jeff Layton July 18, 2022, 11:16 a.m. UTC | #4
On Mon, 2022-07-18 at 13:21 +1000, NeilBrown wrote:
> On Fri, 15 Jul 2022, Jeff Layton wrote:
> > On Fri, 2022-07-15 at 09:59 +1000, NeilBrown wrote:
> > 
> > > however...
> > > 2/ Do we really need to lock the parent?  If a rename or unlink happens
> > >    after the lease was taken, the lease will be broken. So
> > >     take lease.
> > >     repeat lookup (locklessly)
> > >     Check if lease has been broken
> > >    Should provide all you need.
> > > 
> > >    You don't *need* to lock the directory to open an existing file and
> > >    with my pending parallel-updates patch set, you only need a shared
> > >    lock on the directory to create a file.  So I'd rather not be locking
> > >    the directory at all to get a delegation
> > > 
> > 
> > Yeah, we probably don't need to lock the dir. That said, after your
> > patch series we already hold the i_rwsem on the parent at this point so
> > lookup_one_len is fine in this instance.
> 
> But the only reason we hold i_rwsem at this point is to prevent renames
> in the "opened existing file" case.  The "created new file" case holds
> it as well just be be consistent with the first case.
> 
> If we "vet" the dentry, then we don't need the lock any more.  We can
> then simplify nfsd_lookup_dentry() to always assume the dir is not
> locked - so the "locked" arg can go, and nfsd_lookup() can lose the
> "lock" arg and always return with the directory unlocked.
> 
> I'm tempted to add your patch to the front of my series.  The
> inconsistency in locking can be fix by unlocking the directory before we
> get even close to handing out a delegation - so the delegation never
> sees a locked directory.

Hmm, ok. I suppose we don't necessarily have to care whether the thing
is locked before calling into nfsd_lookup_dentry. I'll take another stab
at fixing this in the kernel w/o your series. That'll make Chuck happy
too.

> But right now I have a cold and don't trust myself to think clearly
> enough to create code worth posting.  Hopefully I'll be thinking more
> clearly later in the week.
> 
> While I'm here ...  is "vet" a good word?  The meaning is appropriate,
> but I wonder if it would cause our friends for whom English isn't their
> first language to stumble.  There are about 5 uses in the kernel at
> present.
> 
> Would validate or verify be better?  Even though they are longer..


Good point. I'm all for helping out non-native English speakers. I'll
plan to change it to something less esoteric.
Chuck Lever July 18, 2022, 2:31 p.m. UTC | #5
> On Jul 17, 2022, at 11:21 PM, NeilBrown <neilb@suse.de> wrote:
> 
> But right now I have a cold and don't trust myself to think clearly
> enough to create code worth posting.  Hopefully I'll be thinking more
> clearly later in the week.

Thanks for the update!

Here are my plans: I'd like to finalize content of the first 5.20
NFSD pull request this week. If these patches are not ready by
then, I can prepare a second PR later in the 5.20 merge window
with your work, which should give you another two weeks. If they
are still not ready, I'll get them in first thing for the next
merge window.


--
Chuck Lever
NeilBrown July 25, 2022, 4:46 a.m. UTC | #6
On Tue, 19 Jul 2022, Chuck Lever III wrote:
> 
> > On Jul 17, 2022, at 11:21 PM, NeilBrown <neilb@suse.de> wrote:
> > 
> > But right now I have a cold and don't trust myself to think clearly
> > enough to create code worth posting.  Hopefully I'll be thinking more
> > clearly later in the week.
> 
> Thanks for the update!
> 
> Here are my plans: I'd like to finalize content of the first 5.20
> NFSD pull request this week. If these patches are not ready by
> then, I can prepare a second PR later in the 5.20 merge window
> with your work, which should give you another two weeks. If they
> are still not ready, I'll get them in first thing for the next
> merge window.

FYI I've addressed the outstanding issues (I think), but have not yet done
any testing or given the patches a final inspection.  I hope to do that
tomorrow, and will post the patches once it is done.
If you want a preview you can find them on github.com/neilbrown/linux
in the "nfsd" branch.

Jeff's patches to validate delegations after getting the lease, so we
don't have to hold the lock so long, comes first.

NeilBrown