Message ID | 172972144286.81717.3023946721770566532@noble.neil.brown.name (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v2] nfsd: Don't fail OP_SETCLIENTID when there are too many clients. | expand |
On Thu, Oct 24, 2024 at 09:10:42AM +1100, NeilBrown wrote: > > Failing OP_SETCLIENTID or OP_EXCHANGE_ID should only happen if there is > memory allocation failure. Putting a hard limit on the number of > clients is really helpful as it will either happen too early and prevent Do you mean "is not really helpful" ? I recall that Dai had suggested something similar in this area, while he was working on the courteous server patches. Dai, do you have any comments about this change? If not, can you send a R-b ? > clients that the server can easily handle, or too late and allow clients > when the server is swamped. > > The calculated limit is still useful for expiring courtesy clients where > there are "too many" clients, but it shouldn't prevent the creation of > active clients. > > Testing of lots of clients against small-mem servers reports repeated > NFS4ERR_DELAY responses which doesn't seem helpful. There may have been > reports of similar problems in production use. > > Also remove an outdated comment - we do use a slab cache. > > Signed-off-by: NeilBrown <neilb@suse.de> > --- > fs/nfsd/nfs4state.c | 11 +++-------- > 1 file changed, 3 insertions(+), 8 deletions(-) > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c > index d585c267731b..0791a43b19e6 100644 > --- a/fs/nfsd/nfs4state.c > +++ b/fs/nfsd/nfs4state.c > @@ -2218,21 +2218,16 @@ STALE_CLIENTID(clientid_t *clid, struct nfsd_net *nn) > return 1; > } > > -/* > - * XXX Should we use a slab cache ? > - * This type of memory management is somewhat inefficient, but we use it > - * anyway since SETCLIENTID is not a common operation. > - */ > static struct nfs4_client *alloc_client(struct xdr_netobj name, > struct nfsd_net *nn) > { > struct nfs4_client *clp; > int i; > > - if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients) { > + if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients && > + atomic_read(&nn->nfsd_courtesy_clients) > 0) > mod_delayed_work(laundry_wq, &nn->laundromat_work, 0); > - return NULL; > - } > + > clp = kmem_cache_zalloc(client_slab, GFP_KERNEL); > if (clp == NULL) > return NULL; > -- > 2.46.0 >
On Thu, 2024-10-24 at 09:10 +1100, NeilBrown wrote: > Failing OP_SETCLIENTID or OP_EXCHANGE_ID should only happen if there is > memory allocation failure. Putting a hard limit on the number of > clients is really helpful as it will either happen too early and prevent "unhelpful" ? > clients that the server can easily handle, or too late and allow clients > when the server is swamped. > > The calculated limit is still useful for expiring courtesy clients where > there are "too many" clients, but it shouldn't prevent the creation of > active clients. > > Testing of lots of clients against small-mem servers reports repeated > NFS4ERR_DELAY responses which doesn't seem helpful. There may have been > reports of similar problems in production use. > > Also remove an outdated comment - we do use a slab cache. > > Signed-off-by: NeilBrown <neilb@suse.de> > --- > fs/nfsd/nfs4state.c | 11 +++-------- > 1 file changed, 3 insertions(+), 8 deletions(-) > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c > index d585c267731b..0791a43b19e6 100644 > --- a/fs/nfsd/nfs4state.c > +++ b/fs/nfsd/nfs4state.c > @@ -2218,21 +2218,16 @@ STALE_CLIENTID(clientid_t *clid, struct nfsd_net *nn) > return 1; > } > > -/* > - * XXX Should we use a slab cache ? > - * This type of memory management is somewhat inefficient, but we use it > - * anyway since SETCLIENTID is not a common operation. > - */ > static struct nfs4_client *alloc_client(struct xdr_netobj name, > struct nfsd_net *nn) > { > struct nfs4_client *clp; > int i; > > - if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients) { > + if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients && > + atomic_read(&nn->nfsd_courtesy_clients) > 0) > mod_delayed_work(laundry_wq, &nn->laundromat_work, 0); > - return NULL; > - } > + > clp = kmem_cache_zalloc(client_slab, GFP_KERNEL); > if (clp == NULL) > return NULL; Do we even need to check nn->nfs4_max_clients at all? Maybe we should just kick the laundromat whenever nfsd_courtesy_clients > 0. I would suggest just removing nfs4_max_clients altogether, but it looks like nfs4_get_client_reaplist() uses it and it's not clear to me what would better replace it.
From: Chuck Lever <chuck.lever@oracle.com> On Thu, 24 Oct 2024 09:10:42 +1100, NeilBrown wrote: > Failing OP_SETCLIENTID or OP_EXCHANGE_ID should only happen if there is > memory allocation failure. Putting a hard limit on the number of > clients is really helpful as it will either happen too early and prevent > clients that the server can easily handle, or too late and allow clients > when the server is swamped. > > The calculated limit is still useful for expiring courtesy clients where > there are "too many" clients, but it shouldn't prevent the creation of > active clients. > > [...] Applied to nfsd-next for v6.13, thanks! [1/1] nfsd: Don't fail OP_SETCLIENTID when there are too many clients. commit: c7b8826b41906db1c930cbb10abb94eb24247f20 -- Chuck Lever
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index d585c267731b..0791a43b19e6 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -2218,21 +2218,16 @@ STALE_CLIENTID(clientid_t *clid, struct nfsd_net *nn) return 1; } -/* - * XXX Should we use a slab cache ? - * This type of memory management is somewhat inefficient, but we use it - * anyway since SETCLIENTID is not a common operation. - */ static struct nfs4_client *alloc_client(struct xdr_netobj name, struct nfsd_net *nn) { struct nfs4_client *clp; int i; - if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients) { + if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients && + atomic_read(&nn->nfsd_courtesy_clients) > 0) mod_delayed_work(laundry_wq, &nn->laundromat_work, 0); - return NULL; - } + clp = kmem_cache_zalloc(client_slab, GFP_KERNEL); if (clp == NULL) return NULL;
Failing OP_SETCLIENTID or OP_EXCHANGE_ID should only happen if there is memory allocation failure. Putting a hard limit on the number of clients is really helpful as it will either happen too early and prevent clients that the server can easily handle, or too late and allow clients when the server is swamped. The calculated limit is still useful for expiring courtesy clients where there are "too many" clients, but it shouldn't prevent the creation of active clients. Testing of lots of clients against small-mem servers reports repeated NFS4ERR_DELAY responses which doesn't seem helpful. There may have been reports of similar problems in production use. Also remove an outdated comment - we do use a slab cache. Signed-off-by: NeilBrown <neilb@suse.de> --- fs/nfsd/nfs4state.c | 11 +++-------- 1 file changed, 3 insertions(+), 8 deletions(-)