Message ID | 165708033167.1940.3364591321728458949.stgit@noble.brown (mailing list archive) |
---|---|
Headers | show |
Series | NFSD: clean up locking. | expand |
> On Jul 6, 2022, at 12:18 AM, NeilBrown <neilb@suse.de> wrote: > > This series prepares NFSD to be able to adjust to work with a proposed > patch which allows updates to directories to happen in parallel. > This patch set changes the way directories are locked, so the present > series cleans up some locking in nfsd. > > Specifically we remove fh_lock() and fh_unlock(). > These functions are problematic for a few reasons. > - they are deliberately idempotent - setting or clearing a flag > so that a second call does nothing. This makes locking errors harder, > but it results in code that looks wrong ... and maybe sometimes is a > little bit wrong. > Early patches clean up several places where this idempotent nature of > the functions is depended on, and so makes the code clearer. > > - they transparently call fh_fill_pre/post_attrs(), including at times > when this is not necessary. Having the calls only when necessary is > marginally more efficient, and arguably makes the code clearer. > > nfsd_lookup() currently always locks the directory, though often no lock > is needed. So a patch in this series reduces this locking. > > There is an awkward case that could still be further improved. > NFSv4 open needs to ensure the file is not renamed (or unlinked etc) > between the moment when the open succeeds, and a later moment when a > "lease" is attached to support a delegation. The handling of this lock > is currently untidy, particularly when creating a file. > It would probably be better to take a lease immediately after > opening the file, and then discarding if after deciding not to provide a > delegation. > > I have run fstests and cthon tests on this, but I wouldn't be surprised > if there is a corner case that I've missed. Hi Neil, thanks for (re)posting. Let me make a few general comments here before I send out specific review nits. I'm concerned mostly with how this series can be adequately tested. The two particular areas I'm worried about: - There are some changes to NFSv2 code, which is effectively fallow. I think I can run v2 tests, once we decide what tests should be run. - More critically, ("NFSD: reduce locking in nfsd_lookup()") does some pretty heavy lifting. How should this change be tested? Secondarily, the series adds more bells and whistles to the generic NFSD VFS APIs on behalf of NFSv4-specific requirements. In particular: - ("NFSD: change nfsd_create() to unlock directory before returning.") makes some big changes to nfsd_create(). But that helper itself is pretty small. Seems like cleaner code would result if NFSv4 had its own version of nfsd_create() to deal with the post-change cases. - ("NFSD: reduce locking in nfsd_lookup()") has a similar issue: nfsd_lookup() is being changed such that its semantics are substantially different for NFSv4 than for others. This is possibly an indication that nfsd_lookup() should also be duplicated into the NFSv4-specific code and the generic VFS version should be left alone. I would prefer the code duplication approach in both these cases, unless you can convince me that is a bad idea. Finally, with regard to the awkward case you mention above. The NFSv4 OPEN code is a hairy mess, mostly because the protocol is a Swiss army knife and our implementation has had small fixes plastered onto it for many years. I won't be disappointed if you don't manage to address the rename/unlink/delegation race you mention above this time around. Just don't make it worse ;-) Meanwhile we should start accruing some thoughts and designs about how this code path needs to work. > NeilBrown > > > --- > > NeilBrown (8): > NFSD: drop rqstp arg to do_set_nfs4_acl() > NFSD: change nfsd_create() to unlock directory before returning. > NFSD: always drop directory lock in nfsd_unlink() > NFSD: only call fh_unlock() once in nfsd_link() > NFSD: reduce locking in nfsd_lookup() > NFSD: use explicit lock/unlock for directory ops > NFSD: use (un)lock_inode instead of fh_(un)lock for file operations > NFSD: discard fh_locked flag and fh_lock/fh_unlock > > > fs/nfsd/nfs2acl.c | 6 +- > fs/nfsd/nfs3acl.c | 4 +- > fs/nfsd/nfs3proc.c | 21 ++--- > fs/nfsd/nfs4acl.c | 19 ++--- > fs/nfsd/nfs4proc.c | 106 +++++++++++++++--------- > fs/nfsd/nfs4state.c | 8 +- > fs/nfsd/nfsfh.c | 3 +- > fs/nfsd/nfsfh.h | 56 +------------ > fs/nfsd/nfsproc.c | 14 ++-- > fs/nfsd/vfs.c | 193 ++++++++++++++++++++++++++------------------ > fs/nfsd/vfs.h | 19 +++-- > 11 files changed, 238 insertions(+), 211 deletions(-) > > -- > Signature > -- Chuck Lever
On Thu, 07 Jul 2022, Chuck Lever III wrote: > > > On Jul 6, 2022, at 12:18 AM, NeilBrown <neilb@suse.de> wrote: > > > > This series prepares NFSD to be able to adjust to work with a proposed > > patch which allows updates to directories to happen in parallel. > > This patch set changes the way directories are locked, so the present > > series cleans up some locking in nfsd. > > > > Specifically we remove fh_lock() and fh_unlock(). > > These functions are problematic for a few reasons. > > - they are deliberately idempotent - setting or clearing a flag > > so that a second call does nothing. This makes locking errors harder, > > but it results in code that looks wrong ... and maybe sometimes is a > > little bit wrong. > > Early patches clean up several places where this idempotent nature of > > the functions is depended on, and so makes the code clearer. > > > > - they transparently call fh_fill_pre/post_attrs(), including at times > > when this is not necessary. Having the calls only when necessary is > > marginally more efficient, and arguably makes the code clearer. > > > > nfsd_lookup() currently always locks the directory, though often no lock > > is needed. So a patch in this series reduces this locking. > > > > There is an awkward case that could still be further improved. > > NFSv4 open needs to ensure the file is not renamed (or unlinked etc) > > between the moment when the open succeeds, and a later moment when a > > "lease" is attached to support a delegation. The handling of this lock > > is currently untidy, particularly when creating a file. > > It would probably be better to take a lease immediately after > > opening the file, and then discarding if after deciding not to provide a > > delegation. > > > > I have run fstests and cthon tests on this, but I wouldn't be surprised > > if there is a corner case that I've missed. > > Hi Neil, thanks for (re)posting. > > Let me make a few general comments here before I send out specific > review nits. > > I'm concerned mostly with how this series can be adequately tested. > The two particular areas I'm worried about: > > - There are some changes to NFSv2 code, which is effectively > fallow. I think I can run v2 tests, once we decide what tests > should be run. I hope we can still test v2... I know it is disabled by default.. If we can't test it, we should consider removing it. > > - More critically, ("NFSD: reduce locking in nfsd_lookup()") does > some pretty heavy lifting. How should this change be tested? I don't see how there can be any answer other than "run all the tests we usually run". lockdep should report any locking strangeness. > > Secondarily, the series adds more bells and whistles to the generic > NFSD VFS APIs on behalf of NFSv4-specific requirements. In particular: > > - ("NFSD: change nfsd_create() to unlock directory before returning.") > makes some big changes to nfsd_create(). But that helper itself > is pretty small. Seems like cleaner code would result if NFSv4 > had its own version of nfsd_create() to deal with the post-change > cases. I would not like that approach. Duplicating code is rarely a good idea. Maybe, rather than passing a function and void* to nfsd_create(), we could pass an acl and a label and do the setting in vfs.c rather then nfs4proc.c. The difficult part of that approach is getting back the individual error statuses. That should be solvable though. > > - ("NFSD: reduce locking in nfsd_lookup()") has a similar issue: > nfsd_lookup() is being changed such that its semantics are > substantially different for NFSv4 than for others. This is > possibly an indication that nfsd_lookup() should also be > duplicated into the NFSv4-specific code and the generic VFS > version should be left alone. Again, I don't like duplication. In this case, I think the longer term solution is to remove the NFSv4 specific locking differences and solve the problem differently. i.e. don't hold the inode locked, but check for any possible rename after getting a lease. Once that is done, nfsd_lookup() can have saner semantics. > > I would prefer the code duplication approach in both these cases, > unless you can convince me that is a bad idea. When duplicating code results in substantial simplification in both copies, then it makes sense. Otherwise I think the default should be not to duplicate. Thanks, NeilBrown > > Finally, with regard to the awkward case you mention above. The > NFSv4 OPEN code is a hairy mess, mostly because the protocol is > a Swiss army knife and our implementation has had small fixes > plastered onto it for many years. I won't be disappointed if > you don't manage to address the rename/unlink/delegation race > you mention above this time around. Just don't make it worse ;-) > > Meanwhile we should start accruing some thoughts and designs > about how this code path needs to work. > > > > NeilBrown > > > > > > --- > > > > NeilBrown (8): > > NFSD: drop rqstp arg to do_set_nfs4_acl() > > NFSD: change nfsd_create() to unlock directory before returning. > > NFSD: always drop directory lock in nfsd_unlink() > > NFSD: only call fh_unlock() once in nfsd_link() > > NFSD: reduce locking in nfsd_lookup() > > NFSD: use explicit lock/unlock for directory ops > > NFSD: use (un)lock_inode instead of fh_(un)lock for file operations > > NFSD: discard fh_locked flag and fh_lock/fh_unlock > > > > > > fs/nfsd/nfs2acl.c | 6 +- > > fs/nfsd/nfs3acl.c | 4 +- > > fs/nfsd/nfs3proc.c | 21 ++--- > > fs/nfsd/nfs4acl.c | 19 ++--- > > fs/nfsd/nfs4proc.c | 106 +++++++++++++++--------- > > fs/nfsd/nfs4state.c | 8 +- > > fs/nfsd/nfsfh.c | 3 +- > > fs/nfsd/nfsfh.h | 56 +------------ > > fs/nfsd/nfsproc.c | 14 ++-- > > fs/nfsd/vfs.c | 193 ++++++++++++++++++++++++++------------------ > > fs/nfsd/vfs.h | 19 +++-- > > 11 files changed, 238 insertions(+), 211 deletions(-) > > > > -- > > Signature > > > > -- > Chuck Lever > > > >
> On Jul 11, 2022, at 10:33 PM, NeilBrown <neilb@suse.de> wrote: > > On Thu, 07 Jul 2022, Chuck Lever III wrote: >> >>> On Jul 6, 2022, at 12:18 AM, NeilBrown <neilb@suse.de> wrote: >>> >>> This series prepares NFSD to be able to adjust to work with a proposed >>> patch which allows updates to directories to happen in parallel. >>> This patch set changes the way directories are locked, so the present >>> series cleans up some locking in nfsd. >>> >>> Specifically we remove fh_lock() and fh_unlock(). >>> These functions are problematic for a few reasons. >>> - they are deliberately idempotent - setting or clearing a flag >>> so that a second call does nothing. This makes locking errors harder, >>> but it results in code that looks wrong ... and maybe sometimes is a >>> little bit wrong. >>> Early patches clean up several places where this idempotent nature of >>> the functions is depended on, and so makes the code clearer. >>> >>> - they transparently call fh_fill_pre/post_attrs(), including at times >>> when this is not necessary. Having the calls only when necessary is >>> marginally more efficient, and arguably makes the code clearer. >>> >>> nfsd_lookup() currently always locks the directory, though often no lock >>> is needed. So a patch in this series reduces this locking. >>> >>> There is an awkward case that could still be further improved. >>> NFSv4 open needs to ensure the file is not renamed (or unlinked etc) >>> between the moment when the open succeeds, and a later moment when a >>> "lease" is attached to support a delegation. The handling of this lock >>> is currently untidy, particularly when creating a file. >>> It would probably be better to take a lease immediately after >>> opening the file, and then discarding if after deciding not to provide a >>> delegation. >>> >>> I have run fstests and cthon tests on this, but I wouldn't be surprised >>> if there is a corner case that I've missed. >> >> Hi Neil, thanks for (re)posting. >> >> Let me make a few general comments here before I send out specific >> review nits. >> >> I'm concerned mostly with how this series can be adequately tested. >> The two particular areas I'm worried about: >> >> - There are some changes to NFSv2 code, which is effectively >> fallow. I think I can run v2 tests, once we decide what tests >> should be run. > > I hope we can still test v2... I know it is disabled by default.. > If we can't test it, we should consider removing it. The work of deprecating and removing NFSv2 is already under way. I think what I'm asking is if /you/ have tested the NFSv2 changes. >> Secondarily, the series adds more bells and whistles to the generic >> NFSD VFS APIs on behalf of NFSv4-specific requirements. In particular: >> >> - ("NFSD: change nfsd_create() to unlock directory before returning.") >> makes some big changes to nfsd_create(). But that helper itself >> is pretty small. Seems like cleaner code would result if NFSv4 >> had its own version of nfsd_create() to deal with the post-change >> cases. > > I would not like that approach. Duplicating code is rarely a good idea. De-duplicating code /can/ be a good idea, but isn't always a good idea. If the exceptional cases add a lot of logic, that can make the de-duplicated code difficult to read and reason about, and it can make it brittle, just as it does in this case. Modifications on behalf of NFSv4 in this common piece of code is possibly hazardous to NFSv3, and navigating around the exception logic makes it difficult to understand and review. IMO code duplication is not an appropriate design pattern in this specific instance. > Maybe, rather than passing a function and void* to nfsd_create(), we > could pass an acl and a label and do the setting in vfs.c rather then > nfs4proc.c. The difficult part of that approach is getting back the > individual error statuses. That should be solvable though. The bulk of the work in nfsd_create() is done by lookup_one_len() and nfsd_create_locked(), both of which are public APIs. The rest of nfsd_create() is code that appears in several other places already. >> - ("NFSD: reduce locking in nfsd_lookup()") has a similar issue: >> nfsd_lookup() is being changed such that its semantics are >> substantially different for NFSv4 than for others. This is >> possibly an indication that nfsd_lookup() should also be >> duplicated into the NFSv4-specific code and the generic VFS >> version should be left alone. > > Again, I don't like duplication. In this case, I think the longer term > solution is to remove the NFSv4 specific locking differences and solve > the problem differently. i.e. don't hold the inode locked, but check > for any possible rename after getting a lease. Once that is done, > nfsd_lookup() can have saner semantics. Then, perhaps we should bite that bullet and do that work now. >> I would prefer the code duplication approach in both these cases, >> unless you can convince me that is a bad idea. > > When duplicating code results in substantial simplification in both > copies, then it makes sense. Otherwise I think the default should be > not to duplicate. I believe that duplicating in both cases above will result in simpler and less brittle code. That's why I asked for it. -- Chuck Lever
On Wed, 13 Jul 2022, Chuck Lever III wrote: > > > On Jul 11, 2022, at 10:33 PM, NeilBrown <neilb@suse.de> wrote: > > > > On Thu, 07 Jul 2022, Chuck Lever III wrote: > >> > >>> On Jul 6, 2022, at 12:18 AM, NeilBrown <neilb@suse.de> wrote: > >>> > >>> This series prepares NFSD to be able to adjust to work with a proposed > >>> patch which allows updates to directories to happen in parallel. > >>> This patch set changes the way directories are locked, so the present > >>> series cleans up some locking in nfsd. > >>> > >>> Specifically we remove fh_lock() and fh_unlock(). > >>> These functions are problematic for a few reasons. > >>> - they are deliberately idempotent - setting or clearing a flag > >>> so that a second call does nothing. This makes locking errors harder, > >>> but it results in code that looks wrong ... and maybe sometimes is a > >>> little bit wrong. > >>> Early patches clean up several places where this idempotent nature of > >>> the functions is depended on, and so makes the code clearer. > >>> > >>> - they transparently call fh_fill_pre/post_attrs(), including at times > >>> when this is not necessary. Having the calls only when necessary is > >>> marginally more efficient, and arguably makes the code clearer. > >>> > >>> nfsd_lookup() currently always locks the directory, though often no lock > >>> is needed. So a patch in this series reduces this locking. > >>> > >>> There is an awkward case that could still be further improved. > >>> NFSv4 open needs to ensure the file is not renamed (or unlinked etc) > >>> between the moment when the open succeeds, and a later moment when a > >>> "lease" is attached to support a delegation. The handling of this lock > >>> is currently untidy, particularly when creating a file. > >>> It would probably be better to take a lease immediately after > >>> opening the file, and then discarding if after deciding not to provide a > >>> delegation. > >>> > >>> I have run fstests and cthon tests on this, but I wouldn't be surprised > >>> if there is a corner case that I've missed. > >> > >> Hi Neil, thanks for (re)posting. > >> > >> Let me make a few general comments here before I send out specific > >> review nits. > >> > >> I'm concerned mostly with how this series can be adequately tested. > >> The two particular areas I'm worried about: > >> > >> - There are some changes to NFSv2 code, which is effectively > >> fallow. I think I can run v2 tests, once we decide what tests > >> should be run. > > > > I hope we can still test v2... I know it is disabled by default.. > > If we can't test it, we should consider removing it. > > The work of deprecating and removing NFSv2 is already under way. > I think what I'm asking is if /you/ have tested the NFSv2 changes. That's a question I can answer. I haven't. I will. > > > >> Secondarily, the series adds more bells and whistles to the generic > >> NFSD VFS APIs on behalf of NFSv4-specific requirements. In particular: > >> > >> - ("NFSD: change nfsd_create() to unlock directory before returning.") > >> makes some big changes to nfsd_create(). But that helper itself > >> is pretty small. Seems like cleaner code would result if NFSv4 > >> had its own version of nfsd_create() to deal with the post-change > >> cases. > > > > I would not like that approach. Duplicating code is rarely a good idea. > > De-duplicating code /can/ be a good idea, but isn't always a good > idea. If the exceptional cases add a lot of logic, that can make the > de-duplicated code difficult to read and reason about, and it can > make it brittle, just as it does in this case. Modifications on > behalf of NFSv4 in this common piece of code is possibly hazardous > to NFSv3, and navigating around the exception logic makes it > difficult to understand and review. Are we looking at the same code? The sum total of extra code needed for v4 is: - two extra parameters: void (*post_create)(struct svc_fh *fh, void *data), void *data) - two lines of code: if (!err && post_create) post_create(resfhp, data); does that really make anything hard to follow? > > IMO code duplication is not an appropriate design pattern in this > specific instance. I'm guessing there is a missing negation in there. > > > > Maybe, rather than passing a function and void* to nfsd_create(), we > > could pass an acl and a label and do the setting in vfs.c rather then > > nfs4proc.c. The difficult part of that approach is getting back the > > individual error statuses. That should be solvable though. > > The bulk of the work in nfsd_create() is done by lookup_one_len() > and nfsd_create_locked(), both of which are public APIs. The rest > of nfsd_create() is code that appears in several other places > already. "several" == 1. The only other call site for nfsd_create_locked() is in nfsd_proc_create() I would say that the "bulk" of the work is the interplay between locking, error checking, and these two functions you mention. > > > >> - ("NFSD: reduce locking in nfsd_lookup()") has a similar issue: > >> nfsd_lookup() is being changed such that its semantics are > >> substantially different for NFSv4 than for others. This is > >> possibly an indication that nfsd_lookup() should also be > >> duplicated into the NFSv4-specific code and the generic VFS > >> version should be left alone. > > > > Again, I don't like duplication. In this case, I think the longer term > > solution is to remove the NFSv4 specific locking differences and solve > > the problem differently. i.e. don't hold the inode locked, but check > > for any possible rename after getting a lease. Once that is done, > > nfsd_lookup() can have saner semantics. > > Then, perhaps we should bite that bullet and do that work now. While this does have an appeal, it also looks like the start of a rabbit hole where I find have to fix up a growing collection of problems before my patch can land. I think balance is needed - certainly asking for some preliminary cleanup is appropriate. Expecting long standing and subtle issues that are tangential to the main goal to be resolved first is possibly asking too much. NeilBrown > > > >> I would prefer the code duplication approach in both these cases, > >> unless you can convince me that is a bad idea. > > > > When duplicating code results in substantial simplification in both > > copies, then it makes sense. Otherwise I think the default should be > > not to duplicate. > > I believe that duplicating in both cases above will result in > simpler and less brittle code. That's why I asked for it. > > -- > Chuck Lever > > > >
> On Jul 13, 2022, at 12:32 AM, NeilBrown <neilb@suse.de> wrote: > > On Wed, 13 Jul 2022, Chuck Lever III wrote: >> >>> On Jul 11, 2022, at 10:33 PM, NeilBrown <neilb@suse.de> wrote: >>> >>> On Thu, 07 Jul 2022, Chuck Lever III wrote: >>>> >>>>> On Jul 6, 2022, at 12:18 AM, NeilBrown <neilb@suse.de> wrote: >>>>> >>>>> This series prepares NFSD to be able to adjust to work with a proposed >>>>> patch which allows updates to directories to happen in parallel. >>>>> This patch set changes the way directories are locked, so the present >>>>> series cleans up some locking in nfsd. >>>>> >>>>> Specifically we remove fh_lock() and fh_unlock(). >>>>> These functions are problematic for a few reasons. >>>>> - they are deliberately idempotent - setting or clearing a flag >>>>> so that a second call does nothing. This makes locking errors harder, >>>>> but it results in code that looks wrong ... and maybe sometimes is a >>>>> little bit wrong. >>>>> Early patches clean up several places where this idempotent nature of >>>>> the functions is depended on, and so makes the code clearer. >>>>> >>>>> - they transparently call fh_fill_pre/post_attrs(), including at times >>>>> when this is not necessary. Having the calls only when necessary is >>>>> marginally more efficient, and arguably makes the code clearer. >>>>> >>>>> nfsd_lookup() currently always locks the directory, though often no lock >>>>> is needed. So a patch in this series reduces this locking. >>>>> >>>>> There is an awkward case that could still be further improved. >>>>> NFSv4 open needs to ensure the file is not renamed (or unlinked etc) >>>>> between the moment when the open succeeds, and a later moment when a >>>>> "lease" is attached to support a delegation. The handling of this lock >>>>> is currently untidy, particularly when creating a file. >>>>> It would probably be better to take a lease immediately after >>>>> opening the file, and then discarding if after deciding not to provide a >>>>> delegation. >>>>> >>>>> I have run fstests and cthon tests on this, but I wouldn't be surprised >>>>> if there is a corner case that I've missed. >>>> >>>> Hi Neil, thanks for (re)posting. >>>> >>>> Let me make a few general comments here before I send out specific >>>> review nits. >>>> >>>> I'm concerned mostly with how this series can be adequately tested. >>>> The two particular areas I'm worried about: >>>> >>>> - There are some changes to NFSv2 code, which is effectively >>>> fallow. I think I can run v2 tests, once we decide what tests >>>> should be run. >>> >>> I hope we can still test v2... I know it is disabled by default.. >>> If we can't test it, we should consider removing it. >> >> The work of deprecating and removing NFSv2 is already under way. >> I think what I'm asking is if /you/ have tested the NFSv2 changes. > > That's a question I can answer. I haven't. I will. Just in case it's not clear by now, NFSv2 (and NFSv3) stability is the reason I don't want to perturb the NFSD VFS API code in any significant way unless it's absolutely needed. >>>> Secondarily, the series adds more bells and whistles to the generic >>>> NFSD VFS APIs on behalf of NFSv4-specific requirements. In particular: >>>> >>>> - ("NFSD: change nfsd_create() to unlock directory before returning.") >>>> makes some big changes to nfsd_create(). But that helper itself >>>> is pretty small. Seems like cleaner code would result if NFSv4 >>>> had its own version of nfsd_create() to deal with the post-change >>>> cases. >>> >>> I would not like that approach. Duplicating code is rarely a good idea. >> >> De-duplicating code /can/ be a good idea, but isn't always a good >> idea. If the exceptional cases add a lot of logic, that can make the >> de-duplicated code difficult to read and reason about, and it can >> make it brittle, just as it does in this case. Modifications on >> behalf of NFSv4 in this common piece of code is possibly hazardous >> to NFSv3, and navigating around the exception logic makes it >> difficult to understand and review. > > Are we looking at the same code? > The sum total of extra code needed for v4 is: > - two extra parameters: > void (*post_create)(struct svc_fh *fh, void *data), > void *data) > - two lines of code: > if (!err && post_create) > post_create(resfhp, data); > > does that really make anything hard to follow? The synopsis of nfsd_create() is already pretty cluttered. You're adding that extra code in nfsd_symlink() too IIRC? And, this change adds a virtual function call (which has significant overhead these days) for reasons of convenience rather than necessity. >> IMO code duplication is not an appropriate design pattern in this >> specific instance. > > I'm guessing there is a missing negation in there. Yes, thanks. >>> Maybe, rather than passing a function and void* to nfsd_create(), we >>> could pass an acl and a label and do the setting in vfs.c rather then >>> nfs4proc.c. The difficult part of that approach is getting back the >>> individual error statuses. That should be solvable though. >> >> The bulk of the work in nfsd_create() is done by lookup_one_len() >> and nfsd_create_locked(), both of which are public APIs. The rest >> of nfsd_create() is code that appears in several other places >> already. > > "several" == 1. The only other call site for nfsd_create_locked() is in > nfsd_proc_create() But there are 15 call sites under fs/nfsd/ for lookup_one_len(). It's a pretty common trope. > I would say that the "bulk" of the work is the interplay between > locking, error checking, and these two functions you mention. You're looking at the details of your particular change, and I'm concerned about how much technical debt is going to continue to accrue in an area shared with legacy protocol implementation. I'm still asking myself if I can live with the extra parameters and virtual function call. Maybe. IMO, there are three ways forward: 1. I can merge your patch and move on. 2. I can merge your patch as it is and follow up with clean-up. 3. I can drop your patch and write it the way I prefer. >>>> - ("NFSD: reduce locking in nfsd_lookup()") has a similar issue: >>>> nfsd_lookup() is being changed such that its semantics are >>>> substantially different for NFSv4 than for others. This is >>>> possibly an indication that nfsd_lookup() should also be >>>> duplicated into the NFSv4-specific code and the generic VFS >>>> version should be left alone. >>> >>> Again, I don't like duplication. In this case, I think the longer term >>> solution is to remove the NFSv4 specific locking differences and solve >>> the problem differently. i.e. don't hold the inode locked, but check >>> for any possible rename after getting a lease. Once that is done, >>> nfsd_lookup() can have saner semantics. >> >> Then, perhaps we should bite that bullet and do that work now. > > While this does have an appeal, it also looks like the start of a rabbit > hole where I find have to fix up a growing collection of problems before > my patch can land. That's kind of the nature of the beast. You are requesting changes that add technical debt with the promise that "sometime in the future" that debt will be erased. Meanwhile, nfsd_lookup() will be made harder to maintain, and it will continue to contain a real bug. I'm saying if there's a real bug here, addressing that should be the priority. > I think balance is needed - certainly asking for some preliminary > cleanup is appropriate. Expecting long standing and subtle issues that > are tangential to the main goal to be resolved first is possibly asking > too much. Fixing the bug seems to me (perhaps naively) to be directly related to the parallelism of directory operations. Why do you think it is tangential? Asking for bugs to be addressed before new features go in seems sensible to me, and is a common practice to enable the resulting fixes to be backported. If you don't want to address the bug you mentioned, then someone else will need to, and that's fine. I think the priority should be bug fixes first. Just to be clear, I'm referring to this issue: >> NOTE: when nfsd4_open() creates a file, the directory does *NOT* remain >> locked and never has. So it is possible (though unlikely) for the >> newly created file to be renamed before a delegation is handed out, >> and that would be bad. This patch does *NOT* fix that, but *DOES* >> take the directory lock immediately after creating the file, which >> reduces the size of the window and ensure that the lock is held >> consistently. More work is needed to guarantee no rename happens >> before the delegation. > > Interesting. Maybe after taking the lock, we could re-vet the dentry vs. > the info in the OPEN request? That way, we'd presumably know that the > above race didn't occur. -- Chuck Lever
On Wed, 2022-07-13 at 14:15 +0000, Chuck Lever III wrote: > > > On Jul 13, 2022, at 12:32 AM, NeilBrown <neilb@suse.de> wrote: > > > > On Wed, 13 Jul 2022, Chuck Lever III wrote: > > > > > > > On Jul 11, 2022, at 10:33 PM, NeilBrown <neilb@suse.de> wrote: > > > > > > > > On Thu, 07 Jul 2022, Chuck Lever III wrote: > > > > > > > > > > > On Jul 6, 2022, at 12:18 AM, NeilBrown <neilb@suse.de> wrote: > > > > > > > > > > > > This series prepares NFSD to be able to adjust to work with a proposed > > > > > > patch which allows updates to directories to happen in parallel. > > > > > > This patch set changes the way directories are locked, so the present > > > > > > series cleans up some locking in nfsd. > > > > > > > > > > > > Specifically we remove fh_lock() and fh_unlock(). > > > > > > These functions are problematic for a few reasons. > > > > > > - they are deliberately idempotent - setting or clearing a flag > > > > > > so that a second call does nothing. This makes locking errors harder, > > > > > > but it results in code that looks wrong ... and maybe sometimes is a > > > > > > little bit wrong. > > > > > > Early patches clean up several places where this idempotent nature of > > > > > > the functions is depended on, and so makes the code clearer. > > > > > > > > > > > > - they transparently call fh_fill_pre/post_attrs(), including at times > > > > > > when this is not necessary. Having the calls only when necessary is > > > > > > marginally more efficient, and arguably makes the code clearer. > > > > > > > > > > > > nfsd_lookup() currently always locks the directory, though often no lock > > > > > > is needed. So a patch in this series reduces this locking. > > > > > > > > > > > > There is an awkward case that could still be further improved. > > > > > > NFSv4 open needs to ensure the file is not renamed (or unlinked etc) > > > > > > between the moment when the open succeeds, and a later moment when a > > > > > > "lease" is attached to support a delegation. The handling of this lock > > > > > > is currently untidy, particularly when creating a file. > > > > > > It would probably be better to take a lease immediately after > > > > > > opening the file, and then discarding if after deciding not to provide a > > > > > > delegation. > > > > > > > > > > > > I have run fstests and cthon tests on this, but I wouldn't be surprised > > > > > > if there is a corner case that I've missed. > > > > > > > > > > Hi Neil, thanks for (re)posting. > > > > > > > > > > Let me make a few general comments here before I send out specific > > > > > review nits. > > > > > > > > > > I'm concerned mostly with how this series can be adequately tested. > > > > > The two particular areas I'm worried about: > > > > > > > > > > - There are some changes to NFSv2 code, which is effectively > > > > > fallow. I think I can run v2 tests, once we decide what tests > > > > > should be run. > > > > > > > > I hope we can still test v2... I know it is disabled by default.. > > > > If we can't test it, we should consider removing it. > > > > > > The work of deprecating and removing NFSv2 is already under way. > > > I think what I'm asking is if /you/ have tested the NFSv2 changes. > > > > That's a question I can answer. I haven't. I will. > > Just in case it's not clear by now, NFSv2 (and NFSv3) > stability is the reason I don't want to perturb the > NFSD VFS API code in any significant way unless it's > absolutely needed. > > > > > > > Secondarily, the series adds more bells and whistles to the generic > > > > > NFSD VFS APIs on behalf of NFSv4-specific requirements. In particular: > > > > > > > > > > - ("NFSD: change nfsd_create() to unlock directory before returning.") > > > > > makes some big changes to nfsd_create(). But that helper itself > > > > > is pretty small. Seems like cleaner code would result if NFSv4 > > > > > had its own version of nfsd_create() to deal with the post-change > > > > > cases. > > > > > > > > I would not like that approach. Duplicating code is rarely a good idea. > > > > > > De-duplicating code /can/ be a good idea, but isn't always a good > > > idea. If the exceptional cases add a lot of logic, that can make the > > > de-duplicated code difficult to read and reason about, and it can > > > make it brittle, just as it does in this case. Modifications on > > > behalf of NFSv4 in this common piece of code is possibly hazardous > > > to NFSv3, and navigating around the exception logic makes it > > > difficult to understand and review. > > > > Are we looking at the same code? > > The sum total of extra code needed for v4 is: > > - two extra parameters: > > void (*post_create)(struct svc_fh *fh, void *data), > > void *data) > > - two lines of code: > > if (!err && post_create) > > post_create(resfhp, data); > > > > does that really make anything hard to follow? > > The synopsis of nfsd_create() is already pretty cluttered. > > You're adding that extra code in nfsd_symlink() too IIRC? And, > this change adds a virtual function call (which has significant > overhead these days) for reasons of convenience rather than > necessity. > > > > > IMO code duplication is not an appropriate design pattern in this > > > specific instance. > > > > I'm guessing there is a missing negation in there. > > Yes, thanks. > > > > > > Maybe, rather than passing a function and void* to nfsd_create(), we > > > > could pass an acl and a label and do the setting in vfs.c rather then > > > > nfs4proc.c. The difficult part of that approach is getting back the > > > > individual error statuses. That should be solvable though. > > > > > > The bulk of the work in nfsd_create() is done by lookup_one_len() > > > and nfsd_create_locked(), both of which are public APIs. The rest > > > of nfsd_create() is code that appears in several other places > > > already. > > > > "several" == 1. The only other call site for nfsd_create_locked() is in > > nfsd_proc_create() > > But there are 15 call sites under fs/nfsd/ for lookup_one_len(). > It's a pretty common trope. > > > > I would say that the "bulk" of the work is the interplay between > > locking, error checking, and these two functions you mention. > > You're looking at the details of your particular change, and > I'm concerned about how much technical debt is going to > continue to accrue in an area shared with legacy protocol > implementation. > > I'm still asking myself if I can live with the extra parameters > and virtual function call. Maybe. IMO, there are three ways > forward: > > 1. I can merge your patch and move on. > 2. I can merge your patch as it is and follow up with clean-up. > 3. I can drop your patch and write it the way I prefer. > > > > > > > - ("NFSD: reduce locking in nfsd_lookup()") has a similar issue: > > > > > nfsd_lookup() is being changed such that its semantics are > > > > > substantially different for NFSv4 than for others. This is > > > > > possibly an indication that nfsd_lookup() should also be > > > > > duplicated into the NFSv4-specific code and the generic VFS > > > > > version should be left alone. > > > > > > > > Again, I don't like duplication. In this case, I think the longer term > > > > solution is to remove the NFSv4 specific locking differences and solve > > > > the problem differently. i.e. don't hold the inode locked, but check > > > > for any possible rename after getting a lease. Once that is done, > > > > nfsd_lookup() can have saner semantics. > > > > > > Then, perhaps we should bite that bullet and do that work now. > > > > While this does have an appeal, it also looks like the start of a rabbit > > hole where I find have to fix up a growing collection of problems before > > my patch can land. > > That's kind of the nature of the beast. > > You are requesting changes that add technical debt with the > promise that "sometime in the future" that debt will be > erased. Meanwhile, nfsd_lookup() will be made harder to > maintain, and it will continue to contain a real bug. > > I'm saying if there's a real bug here, addressing that should > be the priority. > > > > I think balance is needed - certainly asking for some preliminary > > cleanup is appropriate. Expecting long standing and subtle issues that > > are tangential to the main goal to be resolved first is possibly asking > > too much. > > Fixing the bug seems to me (perhaps naively) to be directly > related to the parallelism of directory operations. Why do > you think it is tangential? > > Asking for bugs to be addressed before new features go in > seems sensible to me, and is a common practice to enable the > resulting fixes to be backported. If you don't want to address > the bug you mentioned, then someone else will need to, and > that's fine. I think the priority should be bug fixes first. > > Just to be clear, I'm referring to this issue: > > > > NOTE: when nfsd4_open() creates a file, the directory does *NOT* remain > > > locked and never has. So it is possible (though unlikely) for the > > > newly created file to be renamed before a delegation is handed out, > > > and that would be bad. This patch does *NOT* fix that, but *DOES* > > > take the directory lock immediately after creating the file, which > > > reduces the size of the window and ensure that the lock is held > > > consistently. More work is needed to guarantee no rename happens > > > before the delegation. > > > > Interesting. Maybe after taking the lock, we could re-vet the dentry vs. > > the info in the OPEN request? That way, we'd presumably know that the > > above race didn't occur. > > I usually like to see bugfixes first too, but I haven't heard of anyone ever hitting this problem in the field. We've lived with this bug for a long time, so I don't see a problem with cleaning up the locking first and then fixing this on top of that. That said, if we're concerned about it, we could just add an extra check to nfs4_set_delegation. Basically, take parent mutex, set the delegation, walk the inode->i_dentry list and vet that one of them matches the OPEN args. That change probably wouldn't be too invasive and should be backportable, but it means taking that mutex twice. The alternate idea would be to try to set the delegation a lot earlier, while we still hold the parent mutex for the open. That makes the cleanup trickier, but is potentially more efficient. I think it'd be simpler to implement this on top of Neil's patchset since it simplifies the locking. Backporting that by itself is probably going to be painful though. What should we aim for here? -- Jeff Layton <jlayton@kernel.org>
> On Jul 13, 2022, at 3:13 PM, Jeff Layton <jlayton@kernel.org> wrote: > > On Wed, 2022-07-13 at 14:15 +0000, Chuck Lever III wrote: >> >> >> Just to be clear, I'm referring to this issue: >> >>>> NOTE: when nfsd4_open() creates a file, the directory does *NOT* remain >>>> locked and never has. So it is possible (though unlikely) for the >>>> newly created file to be renamed before a delegation is handed out, >>>> and that would be bad. This patch does *NOT* fix that, but *DOES* >>>> take the directory lock immediately after creating the file, which >>>> reduces the size of the window and ensure that the lock is held >>>> consistently. More work is needed to guarantee no rename happens >>>> before the delegation. >>> >>> Interesting. Maybe after taking the lock, we could re-vet the dentry vs. >>> the info in the OPEN request? That way, we'd presumably know that the >>> above race didn't occur. > > I usually like to see bugfixes first too, but I haven't heard of anyone > ever hitting this problem in the field. We've lived with this bug for a > long time, so I don't see a problem with cleaning up the locking first > and then fixing this on top of that. > > That said, if we're concerned about it, we could just add an extra check > to nfs4_set_delegation. Basically, take parent mutex, set the > delegation, walk the inode->i_dentry list and vet that one of them > matches the OPEN args. That change probably wouldn't be too invasive and > should be backportable, but it means taking that mutex twice. > > The alternate idea would be to try to set the delegation a lot earlier, > while we still hold the parent mutex for the open. That makes the > cleanup trickier, but is potentially more efficient. I think it'd be > simpler to implement this on top of Neil's patchset since it simplifies > the locking. Backporting that by itself is probably going to be painful > though. > > What should we aim for here? If there is consensus that a fix for a possible delegation/rename race is not necessary in stable kernels, then there is a little more maneuverability -- we can shoot for what is ideal moving forward. Again, my main concerns are not perturbing the legacy code if we don't have to, and having the NFSv4 code spell out its locking requirements clearly. After that, performance is important, so avoiding taking locks and mutexes (mutices?) unnecessarily would be best. -- Chuck Lever
On Thu, 14 Jul 2022, Chuck Lever III wrote: > > > On Jul 13, 2022, at 12:32 AM, NeilBrown <neilb@suse.de> wrote: > > > > On Wed, 13 Jul 2022, Chuck Lever III wrote: > >> > >>> On Jul 11, 2022, at 10:33 PM, NeilBrown <neilb@suse.de> wrote: > >>> > >>> On Thu, 07 Jul 2022, Chuck Lever III wrote: > >>>> > >>>>> On Jul 6, 2022, at 12:18 AM, NeilBrown <neilb@suse.de> wrote: > >>>>> > >>>>> This series prepares NFSD to be able to adjust to work with a proposed > >>>>> patch which allows updates to directories to happen in parallel. > >>>>> This patch set changes the way directories are locked, so the present > >>>>> series cleans up some locking in nfsd. > >>>>> > >>>>> Specifically we remove fh_lock() and fh_unlock(). > >>>>> These functions are problematic for a few reasons. > >>>>> - they are deliberately idempotent - setting or clearing a flag > >>>>> so that a second call does nothing. This makes locking errors harder, > >>>>> but it results in code that looks wrong ... and maybe sometimes is a > >>>>> little bit wrong. > >>>>> Early patches clean up several places where this idempotent nature of > >>>>> the functions is depended on, and so makes the code clearer. > >>>>> > >>>>> - they transparently call fh_fill_pre/post_attrs(), including at times > >>>>> when this is not necessary. Having the calls only when necessary is > >>>>> marginally more efficient, and arguably makes the code clearer. > >>>>> > >>>>> nfsd_lookup() currently always locks the directory, though often no lock > >>>>> is needed. So a patch in this series reduces this locking. > >>>>> > >>>>> There is an awkward case that could still be further improved. > >>>>> NFSv4 open needs to ensure the file is not renamed (or unlinked etc) > >>>>> between the moment when the open succeeds, and a later moment when a > >>>>> "lease" is attached to support a delegation. The handling of this lock > >>>>> is currently untidy, particularly when creating a file. > >>>>> It would probably be better to take a lease immediately after > >>>>> opening the file, and then discarding if after deciding not to provide a > >>>>> delegation. > >>>>> > >>>>> I have run fstests and cthon tests on this, but I wouldn't be surprised > >>>>> if there is a corner case that I've missed. > >>>> > >>>> Hi Neil, thanks for (re)posting. > >>>> > >>>> Let me make a few general comments here before I send out specific > >>>> review nits. > >>>> > >>>> I'm concerned mostly with how this series can be adequately tested. > >>>> The two particular areas I'm worried about: > >>>> > >>>> - There are some changes to NFSv2 code, which is effectively > >>>> fallow. I think I can run v2 tests, once we decide what tests > >>>> should be run. > >>> > >>> I hope we can still test v2... I know it is disabled by default.. > >>> If we can't test it, we should consider removing it. > >> > >> The work of deprecating and removing NFSv2 is already under way. > >> I think what I'm asking is if /you/ have tested the NFSv2 changes. > > > > That's a question I can answer. I haven't. I will. > > Just in case it's not clear by now, NFSv2 (and NFSv3) > stability is the reason I don't want to perturb the > NFSD VFS API code in any significant way unless it's > absolutely needed. "absolutely" seems like a rather high bar. It makes the goal of "stability" seem more like "ossification". Certainly we don't want to break things. Equally certainly we will. I don't think "hold back useful changes out of fear" is the path to success. Testing, review, and responding to bug reports is what we find works best - and what we generally do. And I wouldn't class NFSv3 at all with NFSv2 (not even in parentheses). NFSv2 has very few (if any) users in the broad community, so bugs are likely to go unnoticed for extended periods. NFSv3 is, I believe, still widely used, though not as widely as v4. There are use-cases where I would recommend v3. e.g. a case could certainly be made to not extend my directory-locking changes to v2-specific code, but v3 should get them. > > > >>>> Secondarily, the series adds more bells and whistles to the generic > >>>> NFSD VFS APIs on behalf of NFSv4-specific requirements. In particular: > >>>> > >>>> - ("NFSD: change nfsd_create() to unlock directory before returning.") > >>>> makes some big changes to nfsd_create(). But that helper itself > >>>> is pretty small. Seems like cleaner code would result if NFSv4 > >>>> had its own version of nfsd_create() to deal with the post-change > >>>> cases. > >>> > >>> I would not like that approach. Duplicating code is rarely a good idea. > >> > >> De-duplicating code /can/ be a good idea, but isn't always a good > >> idea. If the exceptional cases add a lot of logic, that can make the > >> de-duplicated code difficult to read and reason about, and it can > >> make it brittle, just as it does in this case. Modifications on > >> behalf of NFSv4 in this common piece of code is possibly hazardous > >> to NFSv3, and navigating around the exception logic makes it > >> difficult to understand and review. > > > > Are we looking at the same code? > > The sum total of extra code needed for v4 is: > > - two extra parameters: > > void (*post_create)(struct svc_fh *fh, void *data), > > void *data) > > - two lines of code: > > if (!err && post_create) > > post_create(resfhp, data); > > > > does that really make anything hard to follow? > > The synopsis of nfsd_create() is already pretty cluttered. Would it help to collect related details (name, type, attributes, acl, label) into a struct and pass a reference to that around? One awkwardness in my latest patch is that the acl and label are not set in nfsd_create_setattr(). If they were passed around together with the attributes, that would be a lot easier to achieve. > > You're adding that extra code in nfsd_symlink() too IIRC? And, > this change adds a virtual function call (which has significant > overhead these days) for reasons of convenience rather than > necessity. "significant"? In context of creation a file, I don't think one more virtual function call is all that significant? I started writing code to avoid the function pointer, but the function args list exploded. If you could be happy with e.g. a struct nfs_create_args which contains attrs, acl, label, and was passed into nfsd_create_setattr(), then I think that would cleanly avoid the desire for a function pointer. > > > >> IMO code duplication is not an appropriate design pattern in this > >> specific instance. > > > > I'm guessing there is a missing negation in there. > > Yes, thanks. > > > >>> Maybe, rather than passing a function and void* to nfsd_create(), we > >>> could pass an acl and a label and do the setting in vfs.c rather then > >>> nfs4proc.c. The difficult part of that approach is getting back the > >>> individual error statuses. That should be solvable though. > >> > >> The bulk of the work in nfsd_create() is done by lookup_one_len() > >> and nfsd_create_locked(), both of which are public APIs. The rest > >> of nfsd_create() is code that appears in several other places > >> already. > > > > "several" == 1. The only other call site for nfsd_create_locked() is in > > nfsd_proc_create() > > But there are 15 call sites under fs/nfsd/ for lookup_one_len(). > It's a pretty common trope. > > > > I would say that the "bulk" of the work is the interplay between > > locking, error checking, and these two functions you mention. > > You're looking at the details of your particular change, and > I'm concerned about how much technical debt is going to > continue to accrue in an area shared with legacy protocol > implementation. > > I'm still asking myself if I can live with the extra parameters > and virtual function call. Maybe. IMO, there are three ways > forward: > > 1. I can merge your patch and move on. > 2. I can merge your patch as it is and follow up with clean-up. > 3. I can drop your patch and write it the way I prefer. > > > >>>> - ("NFSD: reduce locking in nfsd_lookup()") has a similar issue: > >>>> nfsd_lookup() is being changed such that its semantics are > >>>> substantially different for NFSv4 than for others. This is > >>>> possibly an indication that nfsd_lookup() should also be > >>>> duplicated into the NFSv4-specific code and the generic VFS > >>>> version should be left alone. > >>> > >>> Again, I don't like duplication. In this case, I think the longer term > >>> solution is to remove the NFSv4 specific locking differences and solve > >>> the problem differently. i.e. don't hold the inode locked, but check > >>> for any possible rename after getting a lease. Once that is done, > >>> nfsd_lookup() can have saner semantics. > >> > >> Then, perhaps we should bite that bullet and do that work now. > > > > While this does have an appeal, it also looks like the start of a rabbit > > hole where I find have to fix up a growing collection of problems before > > my patch can land. > > That's kind of the nature of the beast. > > You are requesting changes that add technical debt with the > promise that "sometime in the future" that debt will be > erased. Meanwhile, nfsd_lookup() will be made harder to > maintain, and it will continue to contain a real bug. > > I'm saying if there's a real bug here, addressing that should > be the priority. > > > > I think balance is needed - certainly asking for some preliminary > > cleanup is appropriate. Expecting long standing and subtle issues that > > are tangential to the main goal to be resolved first is possibly asking > > too much. > > Fixing the bug seems to me (perhaps naively) to be directly > related to the parallelism of directory operations. Why do > you think it is tangential? > The bug was exposed by the analysis required for the changes I want, but I think that is as close as the connection goes. To really see if it is tangential, we would need to come up with a solution and see how easily it is ported across my patches. As I said in a reply to Jeff, I don't think locking of the parent directory should be part of the solution. After getting a lease, we repeat the lookup and see if dentry has changed. After my patches, that would mean calling lookup_one_len_unlocked(). Before my patches, it would mean calling fh_unlock() earlier to ensure the directory is no longer locked but still calling lookup_one_len_unlocked() > Asking for bugs to be addressed before new features go in > seems sensible to me, and is a common practice to enable the > resulting fixes to be backported. If you don't want to address > the bug you mentioned, then someone else will need to, and > that's fine. I think the priority should be bug fixes first. As a general principle I would agree. In this case the bug is minor, long standing, and tangential. And the series imposes enough review burden as it is. But if Jeff can produce a fix, either to be applied before or after, then that would be an excellent solution. Thanks, NeilBrown > > Just to be clear, I'm referring to this issue: > > >> NOTE: when nfsd4_open() creates a file, the directory does *NOT* remain > >> locked and never has. So it is possible (though unlikely) for the > >> newly created file to be renamed before a delegation is handed out, > >> and that would be bad. This patch does *NOT* fix that, but *DOES* > >> take the directory lock immediately after creating the file, which > >> reduces the size of the window and ensure that the lock is held > >> consistently. More work is needed to guarantee no rename happens > >> before the delegation. > > > > Interesting. Maybe after taking the lock, we could re-vet the dentry vs. > > the info in the OPEN request? That way, we'd presumably know that the > > above race didn't occur. > > > > -- > Chuck Lever > > > >
On Fri, 2022-07-15 at 12:36 +1000, NeilBrown wrote: > On Thu, 14 Jul 2022, Chuck Lever III wrote: > > > > > On Jul 13, 2022, at 12:32 AM, NeilBrown <neilb@suse.de> wrote: > > > > > > On Wed, 13 Jul 2022, Chuck Lever III wrote: > > > > > > > > > On Jul 11, 2022, at 10:33 PM, NeilBrown <neilb@suse.de> wrote: > > > > > > > > > > On Thu, 07 Jul 2022, Chuck Lever III wrote: > > > > > > > > > > > > > On Jul 6, 2022, at 12:18 AM, NeilBrown <neilb@suse.de> wrote: > > > > > > > > > > > > > > This series prepares NFSD to be able to adjust to work with a proposed > > > > > > > patch which allows updates to directories to happen in parallel. > > > > > > > This patch set changes the way directories are locked, so the present > > > > > > > series cleans up some locking in nfsd. > > > > > > > > > > > > > > Specifically we remove fh_lock() and fh_unlock(). > > > > > > > These functions are problematic for a few reasons. > > > > > > > - they are deliberately idempotent - setting or clearing a flag > > > > > > > so that a second call does nothing. This makes locking errors harder, > > > > > > > but it results in code that looks wrong ... and maybe sometimes is a > > > > > > > little bit wrong. > > > > > > > Early patches clean up several places where this idempotent nature of > > > > > > > the functions is depended on, and so makes the code clearer. > > > > > > > > > > > > > > - they transparently call fh_fill_pre/post_attrs(), including at times > > > > > > > when this is not necessary. Having the calls only when necessary is > > > > > > > marginally more efficient, and arguably makes the code clearer. > > > > > > > > > > > > > > nfsd_lookup() currently always locks the directory, though often no lock > > > > > > > is needed. So a patch in this series reduces this locking. > > > > > > > > > > > > > > There is an awkward case that could still be further improved. > > > > > > > NFSv4 open needs to ensure the file is not renamed (or unlinked etc) > > > > > > > between the moment when the open succeeds, and a later moment when a > > > > > > > "lease" is attached to support a delegation. The handling of this lock > > > > > > > is currently untidy, particularly when creating a file. > > > > > > > It would probably be better to take a lease immediately after > > > > > > > opening the file, and then discarding if after deciding not to provide a > > > > > > > delegation. > > > > > > > > > > > > > > I have run fstests and cthon tests on this, but I wouldn't be surprised > > > > > > > if there is a corner case that I've missed. > > > > > > > > > > > > Hi Neil, thanks for (re)posting. > > > > > > > > > > > > Let me make a few general comments here before I send out specific > > > > > > review nits. > > > > > > > > > > > > I'm concerned mostly with how this series can be adequately tested. > > > > > > The two particular areas I'm worried about: > > > > > > > > > > > > - There are some changes to NFSv2 code, which is effectively > > > > > > fallow. I think I can run v2 tests, once we decide what tests > > > > > > should be run. > > > > > > > > > > I hope we can still test v2... I know it is disabled by default.. > > > > > If we can't test it, we should consider removing it. > > > > > > > > The work of deprecating and removing NFSv2 is already under way. > > > > I think what I'm asking is if /you/ have tested the NFSv2 changes. > > > > > > That's a question I can answer. I haven't. I will. > > > > Just in case it's not clear by now, NFSv2 (and NFSv3) > > stability is the reason I don't want to perturb the > > NFSD VFS API code in any significant way unless it's > > absolutely needed. > > "absolutely" seems like a rather high bar. It makes the goal of > "stability" seem more like "ossification". > Certainly we don't want to break things. Equally certainly we will. > I don't think "hold back useful changes out of fear" is the path to > success. Testing, review, and responding to bug reports is what we find > works best - and what we generally do. > > And I wouldn't class NFSv3 at all with NFSv2 (not even in parentheses). > NFSv2 has very few (if any) users in the broad community, so bugs are > likely to go unnoticed for extended periods. NFSv3 is, I believe, still > widely used, though not as widely as v4. There are use-cases where I > would recommend v3. > > e.g. a case could certainly be made to not extend my directory-locking > changes to v2-specific code, but v3 should get them. > > > > > > > > > > > Secondarily, the series adds more bells and whistles to the generic > > > > > > NFSD VFS APIs on behalf of NFSv4-specific requirements. In particular: > > > > > > > > > > > > - ("NFSD: change nfsd_create() to unlock directory before returning.") > > > > > > makes some big changes to nfsd_create(). But that helper itself > > > > > > is pretty small. Seems like cleaner code would result if NFSv4 > > > > > > had its own version of nfsd_create() to deal with the post-change > > > > > > cases. > > > > > > > > > > I would not like that approach. Duplicating code is rarely a good idea. > > > > > > > > De-duplicating code /can/ be a good idea, but isn't always a good > > > > idea. If the exceptional cases add a lot of logic, that can make the > > > > de-duplicated code difficult to read and reason about, and it can > > > > make it brittle, just as it does in this case. Modifications on > > > > behalf of NFSv4 in this common piece of code is possibly hazardous > > > > to NFSv3, and navigating around the exception logic makes it > > > > difficult to understand and review. > > > > > > Are we looking at the same code? > > > The sum total of extra code needed for v4 is: > > > - two extra parameters: > > > void (*post_create)(struct svc_fh *fh, void *data), > > > void *data) > > > - two lines of code: > > > if (!err && post_create) > > > post_create(resfhp, data); > > > > > > does that really make anything hard to follow? > > > > The synopsis of nfsd_create() is already pretty cluttered. > > Would it help to collect related details (name, type, attributes, acl, > label) into a struct and pass a reference to that around? > One awkwardness in my latest patch is that the acl and label are not set > in nfsd_create_setattr(). If they were passed around together with the > attributes, that would be a lot easier to achieve. > > > > > You're adding that extra code in nfsd_symlink() too IIRC? And, > > this change adds a virtual function call (which has significant > > overhead these days) for reasons of convenience rather than > > necessity. > > "significant"? In context of creation a file, I don't think one more > virtual function call is all that significant? > > I started writing code to avoid the function pointer, but the function > args list exploded. If you could be happy with e.g. a struct > nfs_create_args which contains attrs, acl, label, and was passed into > nfsd_create_setattr(), then I think that would cleanly avoid the desire > for a function pointer. > > > > > > > > > IMO code duplication is not an appropriate design pattern in this > > > > specific instance. > > > > > > I'm guessing there is a missing negation in there. > > > > Yes, thanks. > > > > > > > > > Maybe, rather than passing a function and void* to nfsd_create(), we > > > > > could pass an acl and a label and do the setting in vfs.c rather then > > > > > nfs4proc.c. The difficult part of that approach is getting back the > > > > > individual error statuses. That should be solvable though. > > > > > > > > The bulk of the work in nfsd_create() is done by lookup_one_len() > > > > and nfsd_create_locked(), both of which are public APIs. The rest > > > > of nfsd_create() is code that appears in several other places > > > > already. > > > > > > "several" == 1. The only other call site for nfsd_create_locked() is in > > > nfsd_proc_create() > > > > But there are 15 call sites under fs/nfsd/ for lookup_one_len(). > > It's a pretty common trope. > > > > > > > I would say that the "bulk" of the work is the interplay between > > > locking, error checking, and these two functions you mention. > > > > You're looking at the details of your particular change, and > > I'm concerned about how much technical debt is going to > > continue to accrue in an area shared with legacy protocol > > implementation. > > > > I'm still asking myself if I can live with the extra parameters > > and virtual function call. Maybe. IMO, there are three ways > > forward: > > > > 1. I can merge your patch and move on. > > 2. I can merge your patch as it is and follow up with clean-up. > > 3. I can drop your patch and write it the way I prefer. > > > > > > > > > > - ("NFSD: reduce locking in nfsd_lookup()") has a similar issue: > > > > > > nfsd_lookup() is being changed such that its semantics are > > > > > > substantially different for NFSv4 than for others. This is > > > > > > possibly an indication that nfsd_lookup() should also be > > > > > > duplicated into the NFSv4-specific code and the generic VFS > > > > > > version should be left alone. > > > > > > > > > > Again, I don't like duplication. In this case, I think the longer term > > > > > solution is to remove the NFSv4 specific locking differences and solve > > > > > the problem differently. i.e. don't hold the inode locked, but check > > > > > for any possible rename after getting a lease. Once that is done, > > > > > nfsd_lookup() can have saner semantics. > > > > > > > > Then, perhaps we should bite that bullet and do that work now. > > > > > > While this does have an appeal, it also looks like the start of a rabbit > > > hole where I find have to fix up a growing collection of problems before > > > my patch can land. > > > > That's kind of the nature of the beast. > > > > You are requesting changes that add technical debt with the > > promise that "sometime in the future" that debt will be > > erased. Meanwhile, nfsd_lookup() will be made harder to > > maintain, and it will continue to contain a real bug. > > > > I'm saying if there's a real bug here, addressing that should > > be the priority. > > > > > > > I think balance is needed - certainly asking for some preliminary > > > cleanup is appropriate. Expecting long standing and subtle issues that > > > are tangential to the main goal to be resolved first is possibly asking > > > too much. > > > > Fixing the bug seems to me (perhaps naively) to be directly > > related to the parallelism of directory operations. Why do > > you think it is tangential? > > > > The bug was exposed by the analysis required for the changes I want, but > I think that is as close as the connection goes. > To really see if it is tangential, we would need to come up with a > solution and see how easily it is ported across my patches. > > As I said in a reply to Jeff, I don't think locking of the parent > directory should be part of the solution. After getting a lease, we > repeat the lookup and see if dentry has changed. After my patches, that > would mean calling lookup_one_len_unlocked(). Before my patches, it > would mean calling fh_unlock() earlier to ensure the directory is no > longer locked but still calling lookup_one_len_unlocked() > > > > Asking for bugs to be addressed before new features go in > > seems sensible to me, and is a common practice to enable the > > resulting fixes to be backported. If you don't want to address > > the bug you mentioned, then someone else will need to, and > > that's fine. I think the priority should be bug fixes first. > > As a general principle I would agree. > In this case the bug is minor, long standing, and tangential. > And the series imposes enough review burden as it is. > > But if Jeff can produce a fix, either to be applied before or after, > then that would be an excellent solution. > > Thanks, > NeilBrown > > FWIW, I hit this while running fstests against the server with some draft changes. This crash is not in an area I touched, so I suspect something in Neil's locking cleanup: [ 434.257242] ------------[ cut here ]------------ [ 434.257278] kernel BUG at fs/nfsd/xdr4.h:752! [ 434.257755] invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI [ 434.257937] CPU: 2 PID: 1202 Comm: nfsd Kdump: loaded Tainted: G OE 5.19.0-rc5+ #316 [ 434.258179] Hardware name: Red Hat KVM/RHEL, BIOS 1.16.0-3.el9 04/01/2014 [ 434.258371] RIP: 0010:nfsd4_open+0x1940/0x1a30 [nfsd] [ 434.258615] Code: 40 e8 54 04 e2 e6 48 8b 7c 24 40 41 89 c4 e8 57 7e d4 e6 41 f7 d4 4c 8b 44 24 60 66 44 21 63 44 48 8b 54 24 40 e9 46 fc ff ff <0f> 0b 48 8d bd 88 00 00 00 e8 72 80 d4 e6 4c 8b ad 88 00 00 00 49 [ 434.259053] RSP: 0018:ffff888134897c10 EFLAGS: 00010246 [ 434.259211] RAX: 0000000000000000 RBX: ffff8881348791a0 RCX: ffffffffc07fbb44 [ 434.259408] RDX: 1ffff1102691001a RSI: 0000000000000002 RDI: ffff8881348800d1 [ 434.259606] RBP: ffff888134880030 R08: 0000000000000000 R09: 0000000000000001 [ 434.259802] R10: fffffbfff542dfac R11: 0000000000000001 R12: 0000000000000000 [ 434.260000] R13: ffff888133ab8000 R14: 0000000000000000 R15: ffff8881165db400 [ 434.260195] FS: 0000000000000000(0000) GS:ffff888225500000(0000) knlGS:0000000000000000 [ 434.260466] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 434.260697] CR2: 0000558890b5d178 CR3: 0000000107844000 CR4: 00000000003506e0 [ 434.260949] Call Trace: [ 434.261106] <TASK> [ 434.261260] ? nfsd4_decode_notsupp+0x10/0x10 [nfsd] [ 434.261549] ? nfsd4_interssc_connect+0x8e0/0x8e0 [nfsd] [ 434.261858] ? preempt_count_sub+0x14/0xc0 [ 434.262054] ? percpu_counter_add_batch+0x60/0xd0 [ 434.262261] nfsd4_proc_compound+0x8c6/0xe90 [nfsd] [ 434.262553] nfsd_dispatch+0x2a9/0x5c0 [nfsd] [ 434.262836] svc_process_common+0x6ab/0xc30 [sunrpc] [ 434.263162] ? svc_create_pooled+0x390/0x390 [sunrpc] [ 434.263484] ? nfsd_svc+0x830/0x830 [nfsd] [ 434.263755] ? svc_recv+0xabf/0x1310 [sunrpc] [ 434.264074] svc_process+0x1a3/0x200 [sunrpc] [ 434.264382] nfsd+0x1d7/0x390 [nfsd] [ 434.264640] ? nfsd_shutdown_threads+0x200/0x200 [nfsd] [ 434.264926] kthread+0x167/0x1a0 [ 434.265101] ? kthread_complete_and_exit+0x20/0x20 [ 434.265307] ret_from_fork+0x22/0x30 [ 434.265494] </TASK> [ 434.265652] Modules linked in: rpcsec_gss_krb5(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) ip_set(E) rfkill(E) nf_tables(E) nfnetlink(E) iTCO_wdt(E) intel_rapl_msr(E) intel_pmc_bxt(E) iTCO_vendor_support(E) joydev(E) intel_rapl_common(E) i2c_i801(E) i2c_smbus(E) virtio_balloon(E) lpc_ich(E) nfsd(OE) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) drm(E) fuse(E) sunrpc(E) xfs(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) ghash_clmulni_intel(E) serio_raw(E) virtio_blk(E) virtio_console(E) virtio_net(E) net_failover(E) failover(E) qemu_fw_cfg(E) [ 434.267660] ---[ end trace 0000000000000000 ]--- [ 434.267870] RIP: 0010:nfsd4_open+0x1940/0x1a30 [nfsd] [ 434.268161] Code: 40 e8 54 04 e2 e6 48 8b 7c 24 40 41 89 c4 e8 57 7e d4 e6 41 f7 d4 4c 8b 44 24 60 66 44 21 63 44 48 8b 54 24 40 e9 46 fc ff ff <0f> 0b 48 8d bd 88 00 00 00 e8 72 80 d4 e6 4c 8b ad 88 00 00 00 49 [ 434.268771] RSP: 0018:ffff888134897c10 EFLAGS: 00010246 [ 434.268995] RAX: 0000000000000000 RBX: ffff8881348791a0 RCX: ffffffffc07fbb44 [ 434.269247] RDX: 1ffff1102691001a RSI: 0000000000000002 RDI: ffff8881348800d1 [ 434.269511] RBP: ffff888134880030 R08: 0000000000000000 R09: 0000000000000001 [ 434.269775] R10: fffffbfff542dfac R11: 0000000000000001 R12: 0000000000000000 [ 434.270030] R13: ffff888133ab8000 R14: 0000000000000000 R15: ffff8881165db400 [ 434.270288] FS: 0000000000000000(0000) GS:ffff888225500000(0000) knlGS:0000000000000000 [ 434.270632] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 434.270862] CR2: 0000558890b5d178 CR3: 0000000107844000 CR4: 00000000003506e0 faddr2line says: $ ./scripts/faddr2line --list fs/nfsd/nfsd.ko nfsd4_open+0x1940 nfsd4_open+0x1940/0x1a30: set_change_info at /home/jlayton/git/linux/fs/nfsd/xdr4.h:752 747 #define NFS4_SVC_XDRSIZE sizeof(struct nfsd4_compoundargs) 748 749 static inline void 750 set_change_info(struct nfsd4_change_info *cinfo, struct svc_fh *fhp) 751 { >752< BUG_ON(!fhp->fh_pre_saved); 753 cinfo->atomic = (u32)(fhp->fh_post_saved && !fhp->fh_no_atomic_attr); 754 755 cinfo->before_change = fhp->fh_pre_change; 756 cinfo->after_change = fhp->fh_post_change; 757 } (inlined by) do_open_lookup at /home/jlayton/git/linux/fs/nfsd/nfs4proc.c:502 497 accmode = NFSD_MAY_NOP; 498 if (open->op_created || 499 open->op_claim_type == NFS4_OPEN_CLAIM_DELEGATE_CUR) 500 accmode |= NFSD_MAY_OWNER_OVERRIDE; 501 status = do_open_permission(rqstp, *resfh, open, accmode); >502< set_change_info(&open->op_cinfo, current_fh); 503 out: 504 if (status) 505 inode_unlock(current_fh->fh_dentry->d_inode); 506 return status; 507 } (inlined by) nfsd4_open at /home/jlayton/git/linux/fs/nfsd/nfs4proc.c:625 620 goto out; 621 622 switch (open->op_claim_type) { 623 case NFS4_OPEN_CLAIM_DELEGATE_CUR: 624 case NFS4_OPEN_CLAIM_NULL: >625< status = do_open_lookup(rqstp, cstate, open, &resfh); 626 if (status) 627 goto out; 628 locked = true; 629 break; 630 case NFS4_OPEN_CLAIM_PREVIOUS: I haven't yet dug down into the actual cause yet.
On Fri, 2022-07-15 at 09:01 -0400, Jeff Layton wrote: > On Fri, 2022-07-15 at 12:36 +1000, NeilBrown wrote: > > On Thu, 14 Jul 2022, Chuck Lever III wrote: > > > > > > > On Jul 13, 2022, at 12:32 AM, NeilBrown <neilb@suse.de> wrote: > > > > > > > > On Wed, 13 Jul 2022, Chuck Lever III wrote: > > > > > > > > > > > On Jul 11, 2022, at 10:33 PM, NeilBrown <neilb@suse.de> wrote: > > > > > > > > > > > > On Thu, 07 Jul 2022, Chuck Lever III wrote: > > > > > > > > > > > > > > > On Jul 6, 2022, at 12:18 AM, NeilBrown <neilb@suse.de> wrote: > > > > > > > > > > > > > > > > This series prepares NFSD to be able to adjust to work with a proposed > > > > > > > > patch which allows updates to directories to happen in parallel. > > > > > > > > This patch set changes the way directories are locked, so the present > > > > > > > > series cleans up some locking in nfsd. > > > > > > > > > > > > > > > > Specifically we remove fh_lock() and fh_unlock(). > > > > > > > > These functions are problematic for a few reasons. > > > > > > > > - they are deliberately idempotent - setting or clearing a flag > > > > > > > > so that a second call does nothing. This makes locking errors harder, > > > > > > > > but it results in code that looks wrong ... and maybe sometimes is a > > > > > > > > little bit wrong. > > > > > > > > Early patches clean up several places where this idempotent nature of > > > > > > > > the functions is depended on, and so makes the code clearer. > > > > > > > > > > > > > > > > - they transparently call fh_fill_pre/post_attrs(), including at times > > > > > > > > when this is not necessary. Having the calls only when necessary is > > > > > > > > marginally more efficient, and arguably makes the code clearer. > > > > > > > > > > > > > > > > nfsd_lookup() currently always locks the directory, though often no lock > > > > > > > > is needed. So a patch in this series reduces this locking. > > > > > > > > > > > > > > > > There is an awkward case that could still be further improved. > > > > > > > > NFSv4 open needs to ensure the file is not renamed (or unlinked etc) > > > > > > > > between the moment when the open succeeds, and a later moment when a > > > > > > > > "lease" is attached to support a delegation. The handling of this lock > > > > > > > > is currently untidy, particularly when creating a file. > > > > > > > > It would probably be better to take a lease immediately after > > > > > > > > opening the file, and then discarding if after deciding not to provide a > > > > > > > > delegation. > > > > > > > > > > > > > > > > I have run fstests and cthon tests on this, but I wouldn't be surprised > > > > > > > > if there is a corner case that I've missed. > > > > > > > > > > > > > > Hi Neil, thanks for (re)posting. > > > > > > > > > > > > > > Let me make a few general comments here before I send out specific > > > > > > > review nits. > > > > > > > > > > > > > > I'm concerned mostly with how this series can be adequately tested. > > > > > > > The two particular areas I'm worried about: > > > > > > > > > > > > > > - There are some changes to NFSv2 code, which is effectively > > > > > > > fallow. I think I can run v2 tests, once we decide what tests > > > > > > > should be run. > > > > > > > > > > > > I hope we can still test v2... I know it is disabled by default.. > > > > > > If we can't test it, we should consider removing it. > > > > > > > > > > The work of deprecating and removing NFSv2 is already under way. > > > > > I think what I'm asking is if /you/ have tested the NFSv2 changes. > > > > > > > > That's a question I can answer. I haven't. I will. > > > > > > Just in case it's not clear by now, NFSv2 (and NFSv3) > > > stability is the reason I don't want to perturb the > > > NFSD VFS API code in any significant way unless it's > > > absolutely needed. > > > > "absolutely" seems like a rather high bar. It makes the goal of > > "stability" seem more like "ossification". > > Certainly we don't want to break things. Equally certainly we will. > > I don't think "hold back useful changes out of fear" is the path to > > success. Testing, review, and responding to bug reports is what we find > > works best - and what we generally do. > > > > And I wouldn't class NFSv3 at all with NFSv2 (not even in parentheses). > > NFSv2 has very few (if any) users in the broad community, so bugs are > > likely to go unnoticed for extended periods. NFSv3 is, I believe, still > > widely used, though not as widely as v4. There are use-cases where I > > would recommend v3. > > > > e.g. a case could certainly be made to not extend my directory-locking > > changes to v2-specific code, but v3 should get them. > > > > > > > > > > > > > > > Secondarily, the series adds more bells and whistles to the generic > > > > > > > NFSD VFS APIs on behalf of NFSv4-specific requirements. In particular: > > > > > > > > > > > > > > - ("NFSD: change nfsd_create() to unlock directory before returning.") > > > > > > > makes some big changes to nfsd_create(). But that helper itself > > > > > > > is pretty small. Seems like cleaner code would result if NFSv4 > > > > > > > had its own version of nfsd_create() to deal with the post-change > > > > > > > cases. > > > > > > > > > > > > I would not like that approach. Duplicating code is rarely a good idea. > > > > > > > > > > De-duplicating code /can/ be a good idea, but isn't always a good > > > > > idea. If the exceptional cases add a lot of logic, that can make the > > > > > de-duplicated code difficult to read and reason about, and it can > > > > > make it brittle, just as it does in this case. Modifications on > > > > > behalf of NFSv4 in this common piece of code is possibly hazardous > > > > > to NFSv3, and navigating around the exception logic makes it > > > > > difficult to understand and review. > > > > > > > > Are we looking at the same code? > > > > The sum total of extra code needed for v4 is: > > > > - two extra parameters: > > > > void (*post_create)(struct svc_fh *fh, void *data), > > > > void *data) > > > > - two lines of code: > > > > if (!err && post_create) > > > > post_create(resfhp, data); > > > > > > > > does that really make anything hard to follow? > > > > > > The synopsis of nfsd_create() is already pretty cluttered. > > > > Would it help to collect related details (name, type, attributes, acl, > > label) into a struct and pass a reference to that around? > > One awkwardness in my latest patch is that the acl and label are not set > > in nfsd_create_setattr(). If they were passed around together with the > > attributes, that would be a lot easier to achieve. > > > > > > > > You're adding that extra code in nfsd_symlink() too IIRC? And, > > > this change adds a virtual function call (which has significant > > > overhead these days) for reasons of convenience rather than > > > necessity. > > > > "significant"? In context of creation a file, I don't think one more > > virtual function call is all that significant? > > > > I started writing code to avoid the function pointer, but the function > > args list exploded. If you could be happy with e.g. a struct > > nfs_create_args which contains attrs, acl, label, and was passed into > > nfsd_create_setattr(), then I think that would cleanly avoid the desire > > for a function pointer. > > > > > > > > > > > > > IMO code duplication is not an appropriate design pattern in this > > > > > specific instance. > > > > > > > > I'm guessing there is a missing negation in there. > > > > > > Yes, thanks. > > > > > > > > > > > > Maybe, rather than passing a function and void* to nfsd_create(), we > > > > > > could pass an acl and a label and do the setting in vfs.c rather then > > > > > > nfs4proc.c. The difficult part of that approach is getting back the > > > > > > individual error statuses. That should be solvable though. > > > > > > > > > > The bulk of the work in nfsd_create() is done by lookup_one_len() > > > > > and nfsd_create_locked(), both of which are public APIs. The rest > > > > > of nfsd_create() is code that appears in several other places > > > > > already. > > > > > > > > "several" == 1. The only other call site for nfsd_create_locked() is in > > > > nfsd_proc_create() > > > > > > But there are 15 call sites under fs/nfsd/ for lookup_one_len(). > > > It's a pretty common trope. > > > > > > > > > > I would say that the "bulk" of the work is the interplay between > > > > locking, error checking, and these two functions you mention. > > > > > > You're looking at the details of your particular change, and > > > I'm concerned about how much technical debt is going to > > > continue to accrue in an area shared with legacy protocol > > > implementation. > > > > > > I'm still asking myself if I can live with the extra parameters > > > and virtual function call. Maybe. IMO, there are three ways > > > forward: > > > > > > 1. I can merge your patch and move on. > > > 2. I can merge your patch as it is and follow up with clean-up. > > > 3. I can drop your patch and write it the way I prefer. > > > > > > > > > > > > > - ("NFSD: reduce locking in nfsd_lookup()") has a similar issue: > > > > > > > nfsd_lookup() is being changed such that its semantics are > > > > > > > substantially different for NFSv4 than for others. This is > > > > > > > possibly an indication that nfsd_lookup() should also be > > > > > > > duplicated into the NFSv4-specific code and the generic VFS > > > > > > > version should be left alone. > > > > > > > > > > > > Again, I don't like duplication. In this case, I think the longer term > > > > > > solution is to remove the NFSv4 specific locking differences and solve > > > > > > the problem differently. i.e. don't hold the inode locked, but check > > > > > > for any possible rename after getting a lease. Once that is done, > > > > > > nfsd_lookup() can have saner semantics. > > > > > > > > > > Then, perhaps we should bite that bullet and do that work now. > > > > > > > > While this does have an appeal, it also looks like the start of a rabbit > > > > hole where I find have to fix up a growing collection of problems before > > > > my patch can land. > > > > > > That's kind of the nature of the beast. > > > > > > You are requesting changes that add technical debt with the > > > promise that "sometime in the future" that debt will be > > > erased. Meanwhile, nfsd_lookup() will be made harder to > > > maintain, and it will continue to contain a real bug. > > > > > > I'm saying if there's a real bug here, addressing that should > > > be the priority. > > > > > > > > > > I think balance is needed - certainly asking for some preliminary > > > > cleanup is appropriate. Expecting long standing and subtle issues that > > > > are tangential to the main goal to be resolved first is possibly asking > > > > too much. > > > > > > Fixing the bug seems to me (perhaps naively) to be directly > > > related to the parallelism of directory operations. Why do > > > you think it is tangential? > > > > > > > The bug was exposed by the analysis required for the changes I want, but > > I think that is as close as the connection goes. > > To really see if it is tangential, we would need to come up with a > > solution and see how easily it is ported across my patches. > > > > As I said in a reply to Jeff, I don't think locking of the parent > > directory should be part of the solution. After getting a lease, we > > repeat the lookup and see if dentry has changed. After my patches, that > > would mean calling lookup_one_len_unlocked(). Before my patches, it > > would mean calling fh_unlock() earlier to ensure the directory is no > > longer locked but still calling lookup_one_len_unlocked() > > > > > > > Asking for bugs to be addressed before new features go in > > > seems sensible to me, and is a common practice to enable the > > > resulting fixes to be backported. If you don't want to address > > > the bug you mentioned, then someone else will need to, and > > > that's fine. I think the priority should be bug fixes first. > > > > As a general principle I would agree. > > In this case the bug is minor, long standing, and tangential. > > And the series imposes enough review burden as it is. > > > > But if Jeff can produce a fix, either to be applied before or after, > > then that would be an excellent solution. > > > > Thanks, > > NeilBrown > > > > > > > FWIW, I hit this while running fstests against the server with some > draft changes. This crash is not in an area I touched, so I suspect > something in Neil's locking cleanup: > > [ 434.257242] ------------[ cut here ]------------ > [ 434.257278] kernel BUG at fs/nfsd/xdr4.h:752! > [ 434.257755] invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI > [ 434.257937] CPU: 2 PID: 1202 Comm: nfsd Kdump: loaded Tainted: G OE 5.19.0-rc5+ #316 > [ 434.258179] Hardware name: Red Hat KVM/RHEL, BIOS 1.16.0-3.el9 04/01/2014 > [ 434.258371] RIP: 0010:nfsd4_open+0x1940/0x1a30 [nfsd] > [ 434.258615] Code: 40 e8 54 04 e2 e6 48 8b 7c 24 40 41 89 c4 e8 57 7e d4 e6 41 f7 d4 4c 8b 44 24 60 66 44 21 63 44 48 8b 54 24 40 e9 46 fc ff ff <0f> 0b 48 8d bd 88 00 00 00 e8 72 80 d4 e6 4c 8b ad 88 00 00 00 49 > [ 434.259053] RSP: 0018:ffff888134897c10 EFLAGS: 00010246 > [ 434.259211] RAX: 0000000000000000 RBX: ffff8881348791a0 RCX: ffffffffc07fbb44 > [ 434.259408] RDX: 1ffff1102691001a RSI: 0000000000000002 RDI: ffff8881348800d1 > [ 434.259606] RBP: ffff888134880030 R08: 0000000000000000 R09: 0000000000000001 > [ 434.259802] R10: fffffbfff542dfac R11: 0000000000000001 R12: 0000000000000000 > [ 434.260000] R13: ffff888133ab8000 R14: 0000000000000000 R15: ffff8881165db400 > [ 434.260195] FS: 0000000000000000(0000) GS:ffff888225500000(0000) knlGS:0000000000000000 > [ 434.260466] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 434.260697] CR2: 0000558890b5d178 CR3: 0000000107844000 CR4: 00000000003506e0 > [ 434.260949] Call Trace: > [ 434.261106] <TASK> > [ 434.261260] ? nfsd4_decode_notsupp+0x10/0x10 [nfsd] > [ 434.261549] ? nfsd4_interssc_connect+0x8e0/0x8e0 [nfsd] > [ 434.261858] ? preempt_count_sub+0x14/0xc0 > [ 434.262054] ? percpu_counter_add_batch+0x60/0xd0 > [ 434.262261] nfsd4_proc_compound+0x8c6/0xe90 [nfsd] > [ 434.262553] nfsd_dispatch+0x2a9/0x5c0 [nfsd] > [ 434.262836] svc_process_common+0x6ab/0xc30 [sunrpc] > [ 434.263162] ? svc_create_pooled+0x390/0x390 [sunrpc] > [ 434.263484] ? nfsd_svc+0x830/0x830 [nfsd] > [ 434.263755] ? svc_recv+0xabf/0x1310 [sunrpc] > [ 434.264074] svc_process+0x1a3/0x200 [sunrpc] > [ 434.264382] nfsd+0x1d7/0x390 [nfsd] > [ 434.264640] ? nfsd_shutdown_threads+0x200/0x200 [nfsd] > [ 434.264926] kthread+0x167/0x1a0 > [ 434.265101] ? kthread_complete_and_exit+0x20/0x20 > [ 434.265307] ret_from_fork+0x22/0x30 > [ 434.265494] </TASK> > [ 434.265652] Modules linked in: rpcsec_gss_krb5(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) ip_set(E) rfkill(E) nf_tables(E) nfnetlink(E) iTCO_wdt(E) intel_rapl_msr(E) intel_pmc_bxt(E) iTCO_vendor_support(E) joydev(E) intel_rapl_common(E) i2c_i801(E) i2c_smbus(E) virtio_balloon(E) lpc_ich(E) nfsd(OE) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) drm(E) fuse(E) sunrpc(E) xfs(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) ghash_clmulni_intel(E) serio_raw(E) virtio_blk(E) virtio_console(E) virtio_net(E) net_failover(E) failover(E) qemu_fw_cfg(E) > [ 434.267660] ---[ end trace 0000000000000000 ]--- > [ 434.267870] RIP: 0010:nfsd4_open+0x1940/0x1a30 [nfsd] > [ 434.268161] Code: 40 e8 54 04 e2 e6 48 8b 7c 24 40 41 89 c4 e8 57 7e d4 e6 41 f7 d4 4c 8b 44 24 60 66 44 21 63 44 48 8b 54 24 40 e9 46 fc ff ff <0f> 0b 48 8d bd 88 00 00 00 e8 72 80 d4 e6 4c 8b ad 88 00 00 00 49 > [ 434.268771] RSP: 0018:ffff888134897c10 EFLAGS: 00010246 > [ 434.268995] RAX: 0000000000000000 RBX: ffff8881348791a0 RCX: ffffffffc07fbb44 > [ 434.269247] RDX: 1ffff1102691001a RSI: 0000000000000002 RDI: ffff8881348800d1 > [ 434.269511] RBP: ffff888134880030 R08: 0000000000000000 R09: 0000000000000001 > [ 434.269775] R10: fffffbfff542dfac R11: 0000000000000001 R12: 0000000000000000 > [ 434.270030] R13: ffff888133ab8000 R14: 0000000000000000 R15: ffff8881165db400 > [ 434.270288] FS: 0000000000000000(0000) GS:ffff888225500000(0000) knlGS:0000000000000000 > [ 434.270632] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 434.270862] CR2: 0000558890b5d178 CR3: 0000000107844000 CR4: 00000000003506e0 > > faddr2line says: > > $ ./scripts/faddr2line --list fs/nfsd/nfsd.ko nfsd4_open+0x1940 > nfsd4_open+0x1940/0x1a30: > > set_change_info at /home/jlayton/git/linux/fs/nfsd/xdr4.h:752 > 747 #define NFS4_SVC_XDRSIZE sizeof(struct nfsd4_compoundargs) > 748 > 749 static inline void > 750 set_change_info(struct nfsd4_change_info *cinfo, struct svc_fh *fhp) > 751 { > > 752< BUG_ON(!fhp->fh_pre_saved); > 753 cinfo->atomic = (u32)(fhp->fh_post_saved && !fhp->fh_no_atomic_attr); > 754 > 755 cinfo->before_change = fhp->fh_pre_change; > 756 cinfo->after_change = fhp->fh_post_change; > 757 } > > (inlined by) do_open_lookup at /home/jlayton/git/linux/fs/nfsd/nfs4proc.c:502 > 497 accmode = NFSD_MAY_NOP; > 498 if (open->op_created || > 499 open->op_claim_type == NFS4_OPEN_CLAIM_DELEGATE_CUR) > 500 accmode |= NFSD_MAY_OWNER_OVERRIDE; > 501 status = do_open_permission(rqstp, *resfh, open, accmode); > > 502< set_change_info(&open->op_cinfo, current_fh); > 503 out: > 504 if (status) > 505 inode_unlock(current_fh->fh_dentry->d_inode); > 506 return status; > 507 } > > (inlined by) nfsd4_open at /home/jlayton/git/linux/fs/nfsd/nfs4proc.c:625 > 620 goto out; > 621 > 622 switch (open->op_claim_type) { > 623 case NFS4_OPEN_CLAIM_DELEGATE_CUR: > 624 case NFS4_OPEN_CLAIM_NULL: > > 625< status = do_open_lookup(rqstp, cstate, open, &resfh); > 626 if (status) > 627 goto out; > 628 locked = true; > 629 break; > 630 case NFS4_OPEN_CLAIM_PREVIOUS: > > I haven't yet dug down into the actual cause yet. Maybe the patch below will fix it? I didn't see other cases where we could return nfs_ok w/o setting the pre-op attrs, but there may be some. Neil, feel free to fold this into the appropriate patch in your series if you think it's correct: ----------------------8<-------------------- [PATCH] nfsd: ensure we fill in pre-op-attrs in v4.1 exclusive create Signed-off-by: Jeff Layton <jlayton@kernel.org> --- fs/nfsd/nfs4proc.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 242f059e6788..15a08fdaf8b1 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -374,6 +374,7 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp, if (d_inode(child)->i_mtime.tv_sec == v_mtime && d_inode(child)->i_atime.tv_sec == v_atime && d_inode(child)->i_size == 0) { + fh_fill_pre_attrs(fhp); open->op_created = true; goto set_attr; /* subtle */ }
> On Jul 14, 2022, at 10:36 PM, NeilBrown <neilb@suse.de> wrote: > > On Thu, 14 Jul 2022, Chuck Lever III wrote: >> >>> On Jul 13, 2022, at 12:32 AM, NeilBrown <neilb@suse.de> wrote: >>> >>> On Wed, 13 Jul 2022, Chuck Lever III wrote: >> >>>>>> Secondarily, the series adds more bells and whistles to the generic >>>>>> NFSD VFS APIs on behalf of NFSv4-specific requirements. In particular: >>>>>> >>>>>> - ("NFSD: change nfsd_create() to unlock directory before returning.") >>>>>> makes some big changes to nfsd_create(). But that helper itself >>>>>> is pretty small. Seems like cleaner code would result if NFSv4 >>>>>> had its own version of nfsd_create() to deal with the post-change >>>>>> cases. >>>>> >>>>> I would not like that approach. Duplicating code is rarely a good idea. >>>> >>>> De-duplicating code /can/ be a good idea, but isn't always a good >>>> idea. If the exceptional cases add a lot of logic, that can make the >>>> de-duplicated code difficult to read and reason about, and it can >>>> make it brittle, just as it does in this case. Modifications on >>>> behalf of NFSv4 in this common piece of code is possibly hazardous >>>> to NFSv3, and navigating around the exception logic makes it >>>> difficult to understand and review. >>> >>> Are we looking at the same code? >>> The sum total of extra code needed for v4 is: >>> - two extra parameters: >>> void (*post_create)(struct svc_fh *fh, void *data), >>> void *data) >>> - two lines of code: >>> if (!err && post_create) >>> post_create(resfhp, data); >>> >>> does that really make anything hard to follow? >> >> The synopsis of nfsd_create() is already pretty cluttered. > > Would it help to collect related details (name, type, attributes, acl, > label) into a struct and pass a reference to that around? > One awkwardness in my latest patch is that the acl and label are not set > in nfsd_create_setattr(). If they were passed around together with the > attributes, that would be a lot easier to achieve. That kind of makes my point: using nfsd_create() for both classic NFSv2/3 and the new NFSv4 hotness overloads the API. But OK, you seem very set on this. So, let's construct a set of parameters for a create operation and give the set a sensible name. The "_args" suffix already has some meaning and precedence in this code. How about "struct nfsd_create_info" ? >> You're adding that extra code in nfsd_symlink() too IIRC? And, >> this change adds a virtual function call (which has significant >> overhead these days) for reasons of convenience rather than >> necessity. > > "significant"? In context of creation a file, I don't think one more > virtual function call is all that significant? If you consider how fast underlying storage has become, and how fast it will become soon (eg, NVRAM-backed filesystems) then the cost of a virtual function call becomes significant. And note there is also control flow integrity. A virtual function can be used to branch into malicious code unless we are careful to lock it down correctly. But these are details, and kind of miss my main point. > I started writing code to avoid the function pointer, but the function > args list exploded. If you could be happy with e.g. a struct > nfs_create_args which contains attrs, acl, label, and was passed into > nfsd_create_setattr(), then I think that would cleanly avoid the desire > for a function pointer. Note that my complaint was also about adding more logic in these functions. It isn't much now, but having an "info struct" means we can pass all kinds of junk into nfsd_create() and add lots more exceptions to the code path. Do you see why I really don't like this approach? -- Chuck Lever