Message ID: 20220612072253.66354-1-wangyugui@e16-tech.com (mailing list archive)
State: New, archived
Series: [RPC] nfsd: NFSv4 close a file completely
> On Jun 12, 2022, at 3:22 AM, Wang Yugui <wangyugui@e16-tech.com> wrote:
>
> NFSv4 needs to close a file completely (no lingering open) when it does
> a CLOSE or DELEGRETURN.
>
> When there are multiple NFSv4 OPENs from different clients, we need to
> check the reference count. The following reference-count check changes
> the behavior of NFSv3 nfsd_rename()/nfsd_unlink() too.
>
> Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=387
> Signed-off-by: Wang Yugui <wangyugui@e16-tech.com>
> ---
> TO-CHECK:
> 1) Is the NFSv3 nfsd_rename()/nfsd_unlink() behavior change OK?
> 2) Can we do better, performance-wise, than nfsd_file_close_inode_sync()?
> 3) nfsd_file_close_inode_sync()->nfsd_file_close_inode() in nfsd4_delegreturn()
>    => 'Text file busy' for about 4s
> 4) reference-count check: refcount_read(&nf->nf_ref) <= 1 or == 0?
>    nfsd_file_alloc() does refcount_set(&nf->nf_ref, 1);
>
> fs/nfsd/filecache.c | 2 +-
> fs/nfsd/nfs4state.c | 4 ++++
> 2 files changed, 5 insertions(+), 1 deletion(-)

I suppose I owe you (and Frank) a progress report on #386. I've fixed
the LRU algorithm and added some observability features to measure
how the fix impacts the cache's efficiency for NFSv3 workloads.

These new features show that the hit rate and average age of cache
items go down after the fix is applied. I'm trying to understand
if I've done something wrong or if the fix is supposed to do that.

To handle the case of hundreds of thousands of open files more
efficiently, I'd like to convert the filecache to use rhashtable.

> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index 9cb2d590c036..8890a8fa7402 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -512,7 +512,7 @@ __nfsd_file_close_inode(struct inode *inode, unsigned int hashval,
>
>  	spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock);
>  	hlist_for_each_entry_safe(nf, tmp, &nfsd_file_hashtbl[hashval].nfb_head, nf_node) {
> -		if (inode == nf->nf_inode)
> +		if (inode == nf->nf_inode && refcount_read(&nf->nf_ref) <= 1)
>  			nfsd_file_unhash_and_release_locked(nf, dispose);
>  	}
>  	spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 9409a0dc1b76..be4b480f5914 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -6673,6 +6673,8 @@ nfsd4_close(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>
>  	/* put reference from nfs4_preprocess_seqid_op */
>  	nfs4_put_stid(&stp->st_stid);
> +
> +	nfsd_file_close_inode_sync(d_inode(cstate->current_fh.fh_dentry));

IIUC this closes /all/ nfsd_file objects that refer to the same inode.
CLOSE is supposed to release only the state held by a particular user
on a particular client. This is like closing a file descriptor: you
release the resources associated with that file descriptor, and other
users of the inode are not supposed to be affected. Thus using an
inode-based API in nfsd4_close/delegreturn seems like the wrong
approach.

>  out:
>  	return status;
>  }
> @@ -6702,6 +6704,8 @@ nfsd4_delegreturn(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  	destroy_delegation(dp);
>  put_stateid:
>  	nfs4_put_stid(&dp->dl_stid);
> +
> +	nfsd_file_close_inode_sync(d_inode(cstate->current_fh.fh_dentry));
>  out:
>  	return status;
>  }
> --
> 2.36.1

--
Chuck Lever
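Chuck's objection suggests a per-file rather than per-inode release. As a
rough sketch only (the helper name nfsd_file_close() and the
nfsd_file_free() teardown call are assumptions for illustration, not the
in-tree API), the shape of such a release would be:

/* Hypothetical, for illustration only: drop just the reference that
 * the CLOSEd or DELEGRETURNed state held on its own nfsd_file.  Other
 * clients' opens of the same inode keep their references untouched. */
static void nfsd_file_close(struct nfsd_file *nf)
{
	if (refcount_dec_and_test(&nf->nf_ref))
		nfsd_file_free(nf);	/* assumed teardown helper */
}

The callers in nfsd4_close() and nfsd4_delegreturn() would then operate on
the nfsd_file attached to the stateid being released, rather than sweeping
the inode's whole hash chain.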
Hi,

>> On Jun 12, 2022, at 3:22 AM, Wang Yugui <wangyugui@e16-tech.com> wrote:
>>
>> [ patch description and diffstat trimmed ]
>
> I suppose I owe you (and Frank) a progress report on #386. I've fixed
> the LRU algorithm and added some observability features to measure
> how the fix impacts the cache's efficiency for NFSv3 workloads.
>
> These new features show that the hit rate and average age of cache
> items go down after the fix is applied. I'm trying to understand
> if I've done something wrong or if the fix is supposed to do that.
>
> To handle the case of hundreds of thousands of open files more
> efficiently, I'd like to convert the filecache to use rhashtable.

A question about the coming rhashtable.

Now multiple nfsd exports share a cache pool.

In the coming rhashtable, could an nfsd export use a private cache pool
to improve scale-out?

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2022/06/15
> On Jun 15, 2022, at 11:28 AM, Wang Yugui <wangyugui@e16-tech.com> wrote:
>
> Hi,
>
>>> On Jun 12, 2022, at 3:22 AM, Wang Yugui <wangyugui@e16-tech.com> wrote:
>>>
>>> [ patch description and diffstat trimmed ]
>>
>> I suppose I owe you (and Frank) a progress report on #386. I've fixed
>> the LRU algorithm and added some observability features to measure
>> how the fix impacts the cache's efficiency for NFSv3 workloads.
>>
>> These new features show that the hit rate and average age of cache
>> items go down after the fix is applied. I'm trying to understand
>> if I've done something wrong or if the fix is supposed to do that.
>>
>> To handle the case of hundreds of thousands of open files more
>> efficiently, I'd like to convert the filecache to use rhashtable.
>
> A question about the coming rhashtable.
>
> Now multiple nfsd exports share a cache pool.
>
> In the coming rhashtable, could an nfsd export use a private cache pool
> to improve scale-out?

That seems like a premature optimization. We don't know that the
hashtable, under normal (ie, non-generic/531) workloads, is a scaling
problem.

However, I am considering (in the future) creating separate filecaches
for each nfsd_net.

--
Chuck Lever
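For context on the data structure being proposed, here is a minimal sketch
of what an rhashtable-based filecache lookup could look like. The nf_rlist
member, nfsd_file_find(), and the params shown are illustrative assumptions,
not the eventual in-tree layout; the duplicate-tolerant rhltable variant is
used because several nfsd_file objects can share one inode:

#include <linux/rhashtable.h>

/* Illustrative only.  An rhltable (hash list table) tolerates
 * duplicate keys, which matters because several nfsd_file objects --
 * different clients, different open modes -- can refer to the same
 * inode. */
static struct rhltable nfsd_file_rhltable;

static const struct rhashtable_params nfsd_file_rhash_params = {
	.key_len		= sizeof(struct inode *),
	.key_offset		= offsetof(struct nfsd_file, nf_inode),
	.head_offset		= offsetof(struct nfsd_file, nf_rlist),
	.automatic_shrinking	= true,
};

/* Hypothetical lookup helper: find a cached nfsd_file for @inode.
 * Caller holds rcu_read_lock(); reference counting and open-mode
 * matching are elided. */
static struct nfsd_file *nfsd_file_find(struct inode *inode)
{
	struct rhlist_head *tmp, *list;
	struct nfsd_file *nf;

	list = rhltable_lookup(&nfsd_file_rhltable, &inode,
			       nfsd_file_rhash_params);
	rhl_for_each_entry_rcu(nf, tmp, list, nf_rlist)
		return nf;	/* real code would check more than the inode */
	return NULL;
}

Unlike the current fixed-size hash, an rhashtable resizes itself as the
number of cached files grows, which is the scaling property Chuck refers
to above.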
> On Jun 15, 2022, at 11:39 AM, Chuck Lever III <chuck.lever@oracle.com> wrote:
>
>> On Jun 15, 2022, at 11:28 AM, Wang Yugui <wangyugui@e16-tech.com> wrote:
>>
>> A question about the coming rhashtable.
>>
>> Now multiple nfsd exports share a cache pool.
>>
>> In the coming rhashtable, could an nfsd export use a private cache pool
>> to improve scale-out?
>
> That seems like a premature optimization. We don't know that the
> hashtable, under normal (ie, non-generic/531) workloads, is a scaling
> problem.
>
> However, I am considering (in the future) creating separate filecaches
> for each nfsd_net.

So I'm not rejecting your idea outright. To expand on this a little:

Just a single rhashtable will enable better scaling, and so will fixing
the LRU issues; both are in plan for my current set of fixes. It's not
clear to me that pathological workloads like generic/531 on NFSv4 are
common, so it's quite possible that just these two changes will be
enough for realistic workloads for the time being.

My near-term goal for generic/531 is to prevent it from crashing NFSD.
Hopefully we can look at ways to enable that test to pass more often,
and to fail gracefully when it doesn't pass. The issue is how the server
behaves when it can't open more files, which is somewhat separate from
the data structure efficiency issues you and Frank pointed out.

I'd like to get the above fixes ready for merge by the next window, so
I'm trying to narrow the changes in this set of fixes to make sure they
will be solid in a couple of months.

It will be a heavier lift to go from just one filecache per server to
two. After that, it will likely be easier to go from two filecaches to
multiple filecaches, but I feel that's down the road.

In the medium term, supporting separate filecaches for NFSv3 and NFSv4
files is worth considering. NFSv3 nfsd_file items need to be managed
automatically and can be subject to a shrinker, since there is no client
impact when a cached filecache item is released. NFSv4 nfsd_file items
manage themselves (via OPEN/CLOSE), so an LRU isn't really needed there
(and isn't terribly effective anyway). A shrinker can't easily release
NFSv4 nfsd_file items without the server losing state, and clients have
to recover in that case.

And, it turns out that the current filecache capacity-limiting mechanism
forces NFSv3 items out of the filecache in favor of NFSv4 items when the
cache holds more than NFSD_FILE_LRU_LIMIT items. IMO that's clearly
undesirable behavior for common mixed-version workloads.

--
Chuck Lever
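A rough sketch of the "shrinker for NFSv3 items only" idea described
above. Every name here (nfsd_file_v3_lru, the callback, the shrinker) is
hypothetical, and the callback body is elided; the point is only that a
split cache lets memory pressure reclaim NFSv3 entries without ever
touching NFSv4 open state:

#include <linux/list_lru.h>
#include <linux/shrinker.h>

/* Hypothetical: only NFSv3-opened files live on this LRU. */
static struct list_lru nfsd_file_v3_lru;

static enum lru_status
nfsd_file_v3_cb(struct list_head *item, struct list_lru_one *lru,
		spinlock_t *lock, void *arg)
{
	/* A real callback would unhash the nfsd_file and queue it for
	 * disposal, returning LRU_REMOVED for entries it frees. */
	return LRU_SKIP;
}

static unsigned long
nfsd_file_v3_count(struct shrinker *shrink, struct shrink_control *sc)
{
	return list_lru_shrink_count(&nfsd_file_v3_lru, sc);
}

static unsigned long
nfsd_file_v3_scan(struct shrinker *shrink, struct shrink_control *sc)
{
	return list_lru_shrink_walk(&nfsd_file_v3_lru, sc,
				    nfsd_file_v3_cb, NULL);
}

static struct shrinker nfsd_file_v3_shrinker = {
	.count_objects	= nfsd_file_v3_count,
	.scan_objects	= nfsd_file_v3_scan,
	.seeks		= 1,
};

Because NFSv4 items never appear on this LRU, reclaim can be aggressive
without forcing clients into state recovery.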
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 9cb2d590c036..8890a8fa7402 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -512,7 +512,7 @@ __nfsd_file_close_inode(struct inode *inode, unsigned int hashval,
 
 	spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock);
 	hlist_for_each_entry_safe(nf, tmp, &nfsd_file_hashtbl[hashval].nfb_head, nf_node) {
-		if (inode == nf->nf_inode)
+		if (inode == nf->nf_inode && refcount_read(&nf->nf_ref) <= 1)
 			nfsd_file_unhash_and_release_locked(nf, dispose);
 	}
 	spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 9409a0dc1b76..be4b480f5914 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -6673,6 +6673,8 @@ nfsd4_close(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 
 	/* put reference from nfs4_preprocess_seqid_op */
 	nfs4_put_stid(&stp->st_stid);
+
+	nfsd_file_close_inode_sync(d_inode(cstate->current_fh.fh_dentry));
 out:
 	return status;
 }
@@ -6702,6 +6704,8 @@ nfsd4_delegreturn(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	destroy_delegation(dp);
 put_stateid:
 	nfs4_put_stid(&dp->dl_stid);
+
+	nfsd_file_close_inode_sync(d_inode(cstate->current_fh.fh_dentry));
 out:
 	return status;
 }
NFSv4 needs to close a file completely (no lingering open) when it does
a CLOSE or DELEGRETURN.

When there are multiple NFSv4 OPENs from different clients, we need to
check the reference count. The following reference-count check changes
the behavior of NFSv3 nfsd_rename()/nfsd_unlink() too.

Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=387
Signed-off-by: Wang Yugui <wangyugui@e16-tech.com>
---
TO-CHECK:
1) Is the NFSv3 nfsd_rename()/nfsd_unlink() behavior change OK?
2) Can we do better, performance-wise, than nfsd_file_close_inode_sync()?
3) nfsd_file_close_inode_sync()->nfsd_file_close_inode() in nfsd4_delegreturn()
   => 'Text file busy' for about 4s
4) reference-count check: refcount_read(&nf->nf_ref) <= 1 or == 0?
   nfsd_file_alloc() does refcount_set(&nf->nf_ref, 1);

fs/nfsd/filecache.c | 2 +-
fs/nfsd/nfs4state.c | 4 ++++
2 files changed, 5 insertions(+), 1 deletion(-)
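On TO-CHECK item 4, the relevant detail is the initial reference taken in
nfsd_file_alloc(). The following reading of why the patch tests "<= 1" is
an inference from the code shown above, not a statement from the author:

/* nfsd_file_alloc() establishes the cache's own reference: */
refcount_set(&nf->nf_ref, 1);

/* So, for a hashed entry:
 *   nf_ref == 1  ->  only the cache holds it; no other opener remains
 *   nf_ref  > 1  ->  another user (e.g. another client's OPEN) is active
 * A test for == 0 could never match a hashed entry, which is why the
 * patch checks <= 1 before unhash-and-release. */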