Message ID | 1657656660-16647-3-git-send-email-dai.ngo@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | limit the number of v4 clients to 4096 per 4GB of system memory | expand |
Hello Dai, lovely to see this! > On Jul 12, 2022, at 4:11 PM, Dai Ngo <dai.ngo@oracle.com> wrote: > > Currently there is no limit on how many v4 clients are supported > by the system. This can be a problem in systems with small memory > configuration to function properly when a very large number of > clients exist that creates memory shortage conditions. > > This patch enforces a limit of 4096 NFSv4 clients, including courtesy > clients, per 4GB of system memory. When the number of the clients > reaches the limit, requests that create new clients are returned > with NFS4ERR_DELAY. The laundromat detects this condition and removes > older courtesy clients. Due to the overhead of the upcall to remove > the client record, the maximun number of clients the laundromat > removes on each run is limited to 128. This is done to ensure the > laundromat can still process other tasks in a timely manner. > > Since there is now a limit of the number of clients, the 24-hr > idle time limit of courtesy client is no longer needed and was > removed. > > Signed-off-by: Dai Ngo <dai.ngo@oracle.com> > --- > fs/nfsd/netns.h | 1 + > fs/nfsd/nfs4state.c | 17 +++++++++++++---- > fs/nfsd/nfsctl.c | 8 ++++++++ > 3 files changed, 22 insertions(+), 4 deletions(-) > > diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h > index ce864f001a3e..8d72b302a49c 100644 > --- a/fs/nfsd/netns.h > +++ b/fs/nfsd/netns.h > @@ -191,6 +191,7 @@ struct nfsd_net { > siphash_key_t siphash_key; > > atomic_t nfs4_client_count; > + unsigned int nfs4_max_clients; > }; > > /* Simple check to find out if a given net was properly initialized */ > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c > index 30e16d9e8657..e54db346dc00 100644 > --- a/fs/nfsd/nfs4state.c > +++ b/fs/nfsd/nfs4state.c > @@ -126,6 +126,7 @@ static const struct nfsd4_callback_ops nfsd4_cb_recall_ops; > static const struct nfsd4_callback_ops nfsd4_cb_notify_lock_ops; > > static struct workqueue_struct *laundry_wq; > +#define NFSD_CLIENT_MAX_TRIM_PER_RUN 128 Let's move these #defines to a header file instead of scattering them in the source code. How about fs/nfsd/nfsd.h ? > int nfsd4_create_laundry_wq(void) > { > @@ -2059,6 +2060,8 @@ static struct nfs4_client *alloc_client(struct xdr_netobj name, > struct nfs4_client *clp; > int i; > > + if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients) > + return NULL; So, NFSD will return NFS4ERR_DELAY if it is asked to establish a new client and we've hit this limit. The next laundromat run should knock a few lingering COURTESY clients out of the LRU to make room for a new client. Maybe you want to kick the laundromat here to get that process moving sooner? > clp = kmem_cache_zalloc(client_slab, GFP_KERNEL); > if (clp == NULL) > return NULL; > @@ -5796,9 +5799,12 @@ static void > nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist, > struct laundry_time *lt) > { > + unsigned int maxreap = 0, reapcnt = 0; > struct list_head *pos, *next; > struct nfs4_client *clp; > > + if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients) > + maxreap = NFSD_CLIENT_MAX_TRIM_PER_RUN; The idea I guess is "don't reap anything until we exceed the maximum number of clients". It took me a bit to figure that out. > INIT_LIST_HEAD(reaplist); > spin_lock(&nn->client_lock); > list_for_each_safe(pos, next, &nn->client_lru) { > @@ -5809,14 +5815,17 @@ nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist, > break; > if (!atomic_read(&clp->cl_rpc_users)) > clp->cl_state = NFSD4_COURTESY; > - if (!client_has_state(clp) || > - ktime_get_boottime_seconds() >= > - (clp->cl_time + NFSD_COURTESY_CLIENT_TIMEOUT)) > + if (!client_has_state(clp)) > goto exp_client; > if (nfs4_anylock_blockers(clp)) { > exp_client: > - if (!mark_client_expired_locked(clp)) > + if (!mark_client_expired_locked(clp)) { > list_add(&clp->cl_lru, reaplist); > + reapcnt++; > + } > + } else { > + if (reapcnt < maxreap) > + goto exp_client; > } > } Would something like this be more straightforward? I probably didn't get the logic exactly right. if (!nfs4_anylock_blockers(clp)) if (reapcnt > maxreap) continue; exp_client: if (!mark_client_expired_locked(clp)) { list_add(&clp->cl_lru, reaplist); reapcnt++; } } The idea is: once maxreap has been reached, continue walking the LRU looking for clients to convert from ACTIVE to COURTESY, but do not reap any more COURTESY clients that might be found. > spin_unlock(&nn->client_lock); > diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c > index 547f4c4b9668..223659e15af3 100644 > --- a/fs/nfsd/nfsctl.c > +++ b/fs/nfsd/nfsctl.c > @@ -96,6 +96,8 @@ static ssize_t (*const write_op[])(struct file *, char *, size_t) = { > #endif > }; > > +#define NFS4_MAX_CLIENTS_PER_4GB 4096 No need for "MAX" in this name. And, ditto the above comment: move this to a header file. > + > static ssize_t nfsctl_transaction_write(struct file *file, const char __user *buf, size_t size, loff_t *pos) > { > ino_t ino = file_inode(file)->i_ino; > @@ -1462,6 +1464,8 @@ unsigned int nfsd_net_id; > static __net_init int nfsd_init_net(struct net *net) > { > int retval; > + unsigned long lowmem; > + struct sysinfo si; > struct nfsd_net *nn = net_generic(net, nfsd_net_id); Nit: I prefer the reverse christmas tree style. Can you add the new stack variables after "struct nfsd_net *nn ..." ? > > retval = nfsd_export_init(net); > @@ -1488,6 +1492,10 @@ static __net_init int nfsd_init_net(struct net *net) > seqlock_init(&nn->writeverf_lock); > > atomic_set(&nn->nfs4_client_count, 0); > + si_meminfo(&si); > + lowmem = (si.totalram - si.totalhigh) * si.mem_unit; There's no reason to restrict this to lowmem, since we're not using a struct nfs4_client as the target of I/O. > + nn->nfs4_max_clients = (((lowmem * 100) >> 32) * > + NFS4_MAX_CLIENTS_PER_4GB) / 100; On a platform where "unsigned long" is a 32-bit type, will the shift-right-by-32 continue to work as you expect? Let's try to simplify this computation, because it isn't especially clear what is going on. The math might work a little better if it were "1024 clients per GB" for example. > return 0; > > -- > 2.9.5 > -- Chuck Lever
On 7/12/22 2:54 PM, Chuck Lever III wrote: > Hello Dai, lovely to see this! > > >> On Jul 12, 2022, at 4:11 PM, Dai Ngo <dai.ngo@oracle.com> wrote: >> >> Currently there is no limit on how many v4 clients are supported >> by the system. This can be a problem in systems with small memory >> configuration to function properly when a very large number of >> clients exist that creates memory shortage conditions. >> >> This patch enforces a limit of 4096 NFSv4 clients, including courtesy >> clients, per 4GB of system memory. When the number of the clients >> reaches the limit, requests that create new clients are returned >> with NFS4ERR_DELAY. The laundromat detects this condition and removes >> older courtesy clients. Due to the overhead of the upcall to remove >> the client record, the maximun number of clients the laundromat >> removes on each run is limited to 128. This is done to ensure the >> laundromat can still process other tasks in a timely manner. >> >> Since there is now a limit of the number of clients, the 24-hr >> idle time limit of courtesy client is no longer needed and was >> removed. >> >> Signed-off-by: Dai Ngo <dai.ngo@oracle.com> >> --- >> fs/nfsd/netns.h | 1 + >> fs/nfsd/nfs4state.c | 17 +++++++++++++---- >> fs/nfsd/nfsctl.c | 8 ++++++++ >> 3 files changed, 22 insertions(+), 4 deletions(-) >> >> diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h >> index ce864f001a3e..8d72b302a49c 100644 >> --- a/fs/nfsd/netns.h >> +++ b/fs/nfsd/netns.h >> @@ -191,6 +191,7 @@ struct nfsd_net { >> siphash_key_t siphash_key; >> >> atomic_t nfs4_client_count; >> + unsigned int nfs4_max_clients; >> }; >> >> /* Simple check to find out if a given net was properly initialized */ >> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c >> index 30e16d9e8657..e54db346dc00 100644 >> --- a/fs/nfsd/nfs4state.c >> +++ b/fs/nfsd/nfs4state.c >> @@ -126,6 +126,7 @@ static const struct nfsd4_callback_ops nfsd4_cb_recall_ops; >> static const struct nfsd4_callback_ops nfsd4_cb_notify_lock_ops; >> >> static struct workqueue_struct *laundry_wq; >> +#define NFSD_CLIENT_MAX_TRIM_PER_RUN 128 > Let's move these #defines to a header file instead of scattering > them in the source code. How about fs/nfsd/nfsd.h ? fix in v2. > > >> int nfsd4_create_laundry_wq(void) >> { >> @@ -2059,6 +2060,8 @@ static struct nfs4_client *alloc_client(struct xdr_netobj name, >> struct nfs4_client *clp; >> int i; >> >> + if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients) >> + return NULL; > So, NFSD will return NFS4ERR_DELAY if it is asked to establish > a new client and we've hit this limit. The next laundromat run > should knock a few lingering COURTESY clients out of the LRU > to make room for a new client. > > Maybe you want to kick the laundromat here to get that process > moving sooner? Yes, good idea. Fix in v2. > > >> clp = kmem_cache_zalloc(client_slab, GFP_KERNEL); >> if (clp == NULL) >> return NULL; >> @@ -5796,9 +5799,12 @@ static void >> nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist, >> struct laundry_time *lt) >> { >> + unsigned int maxreap = 0, reapcnt = 0; >> struct list_head *pos, *next; >> struct nfs4_client *clp; >> >> + if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients) >> + maxreap = NFSD_CLIENT_MAX_TRIM_PER_RUN; > The idea I guess is "don't reap anything until we exceed the > maximum number of clients". It took me a bit to figure that > out. Not sure how to make it more clear, should I add a comment? > > >> INIT_LIST_HEAD(reaplist); >> spin_lock(&nn->client_lock); >> list_for_each_safe(pos, next, &nn->client_lru) { >> @@ -5809,14 +5815,17 @@ nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist, >> break; >> if (!atomic_read(&clp->cl_rpc_users)) >> clp->cl_state = NFSD4_COURTESY; >> - if (!client_has_state(clp) || >> - ktime_get_boottime_seconds() >= >> - (clp->cl_time + NFSD_COURTESY_CLIENT_TIMEOUT)) >> + if (!client_has_state(clp)) >> goto exp_client; >> if (nfs4_anylock_blockers(clp)) { >> exp_client: >> - if (!mark_client_expired_locked(clp)) >> + if (!mark_client_expired_locked(clp)) { >> list_add(&clp->cl_lru, reaplist); >> + reapcnt++; >> + } >> + } else { >> + if (reapcnt < maxreap) >> + goto exp_client; >> } >> } > Would something like this be more straightforward? I probably > didn't get the logic exactly right. > > if (!nfs4_anylock_blockers(clp)) > if (reapcnt > maxreap) > continue; This would not work. If there is no blocker, the client should become courtesy client if reaping client is not needed. With this logic, when reapcnt == maxreap == 0 (no reap needed) we still reap the client. If we change from (reapcnt > maxreap) to (reapcnt >= maxreap) then it may work. I have to test it out. > exp_client: > if (!mark_client_expired_locked(clp)) { > list_add(&clp->cl_lru, reaplist); > reapcnt++; > } > } > > The idea is: once maxreap has been reached, continue walking the > LRU looking for clients to convert from ACTIVE to COURTESY, but > do not reap any more COURTESY clients that might be found. Right. I'm ok with either logic as long as it works. The clarity of the logic does not seem much different to me. > > >> spin_unlock(&nn->client_lock); >> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c >> index 547f4c4b9668..223659e15af3 100644 >> --- a/fs/nfsd/nfsctl.c >> +++ b/fs/nfsd/nfsctl.c >> @@ -96,6 +96,8 @@ static ssize_t (*const write_op[])(struct file *, char *, size_t) = { >> #endif >> }; >> >> +#define NFS4_MAX_CLIENTS_PER_4GB 4096 > No need for "MAX" in this name. Fix in v2. > > And, ditto the above comment: move this to a header file. Fix in v2. > > >> + >> static ssize_t nfsctl_transaction_write(struct file *file, const char __user *buf, size_t size, loff_t *pos) >> { >> ino_t ino = file_inode(file)->i_ino; >> @@ -1462,6 +1464,8 @@ unsigned int nfsd_net_id; >> static __net_init int nfsd_init_net(struct net *net) >> { >> int retval; >> + unsigned long lowmem; >> + struct sysinfo si; >> struct nfsd_net *nn = net_generic(net, nfsd_net_id); > Nit: I prefer the reverse christmas tree style. Can you add > the new stack variables after "struct nfsd_net *nn ..." ? FIx in v2. > > >> retval = nfsd_export_init(net); >> @@ -1488,6 +1492,10 @@ static __net_init int nfsd_init_net(struct net *net) >> seqlock_init(&nn->writeverf_lock); >> >> atomic_set(&nn->nfs4_client_count, 0); >> + si_meminfo(&si); >> + lowmem = (si.totalram - si.totalhigh) * si.mem_unit; > There's no reason to restrict this to lowmem, since we're not > using a struct nfs4_client as the target of I/O. From reading the code, my impression is himem is reserved for some specific usages and the actual available memory does not account for himem area. Few examples, eventpoll_init, fanotify_user_setup, etc. These objects are not used for I/O. > > >> + nn->nfs4_max_clients = (((lowmem * 100) >> 32) * >> + NFS4_MAX_CLIENTS_PER_4GB) / 100; > On a platform where "unsigned long" is a 32-bit type, will > the shift-right-by-32 continue to work as you expect? I will try unsigned long long, this would work on 32-bit platform. > > Let's try to simplify this computation, because it isn't > especially clear what is going on. The math might work a > little better if it were "1024 clients per GB" for example. I'm not sure how to make it simpler, open for suggestions. Thanks for your quick review! -Dai > > >> return 0; >> >> -- >> 2.9.5 >> > -- > Chuck Lever > > >
> On Jul 12, 2022, at 6:30 PM, Dai Ngo <dai.ngo@oracle.com> wrote: > > > On 7/12/22 2:54 PM, Chuck Lever III wrote: >> Hello Dai, lovely to see this! >> >> >>> On Jul 12, 2022, at 4:11 PM, Dai Ngo <dai.ngo@oracle.com> wrote: >>> >>> Currently there is no limit on how many v4 clients are supported >>> by the system. This can be a problem in systems with small memory >>> configuration to function properly when a very large number of >>> clients exist that creates memory shortage conditions. >>> >>> This patch enforces a limit of 4096 NFSv4 clients, including courtesy >>> clients, per 4GB of system memory. When the number of the clients >>> reaches the limit, requests that create new clients are returned >>> with NFS4ERR_DELAY. The laundromat detects this condition and removes >>> older courtesy clients. Due to the overhead of the upcall to remove >>> the client record, the maximun number of clients the laundromat >>> removes on each run is limited to 128. This is done to ensure the >>> laundromat can still process other tasks in a timely manner. >>> >>> Since there is now a limit of the number of clients, the 24-hr >>> idle time limit of courtesy client is no longer needed and was >>> removed. >>> >>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com> >>> --- >>> fs/nfsd/netns.h | 1 + >>> fs/nfsd/nfs4state.c | 17 +++++++++++++---- >>> fs/nfsd/nfsctl.c | 8 ++++++++ >>> 3 files changed, 22 insertions(+), 4 deletions(-) >>> >>> diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h >>> index ce864f001a3e..8d72b302a49c 100644 >>> --- a/fs/nfsd/netns.h >>> +++ b/fs/nfsd/netns.h >>> @@ -191,6 +191,7 @@ struct nfsd_net { >>> siphash_key_t siphash_key; >>> >>> atomic_t nfs4_client_count; >>> + unsigned int nfs4_max_clients; I missed this before. atomic_read(&nn->nfs4_client_count) returns an int. So nfs4_max_clients will need to be declared as an int as well to prevent mixed sign comparisons between the two. >>> }; >>> >>> /* Simple check to find out if a given net was properly initialized */ >>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c >>> index 30e16d9e8657..e54db346dc00 100644 >>> --- a/fs/nfsd/nfs4state.c >>> +++ b/fs/nfsd/nfs4state.c >>> @@ -126,6 +126,7 @@ static const struct nfsd4_callback_ops nfsd4_cb_recall_ops; >>> static const struct nfsd4_callback_ops nfsd4_cb_notify_lock_ops; >>> >>> static struct workqueue_struct *laundry_wq; >>> +#define NFSD_CLIENT_MAX_TRIM_PER_RUN 128 >> Let's move these #defines to a header file instead of scattering >> them in the source code. How about fs/nfsd/nfsd.h ? > > fix in v2. > >> >> >>> int nfsd4_create_laundry_wq(void) >>> { >>> @@ -2059,6 +2060,8 @@ static struct nfs4_client *alloc_client(struct xdr_netobj name, >>> struct nfs4_client *clp; >>> int i; >>> >>> + if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients) >>> + return NULL; >> So, NFSD will return NFS4ERR_DELAY if it is asked to establish >> a new client and we've hit this limit. The next laundromat run >> should knock a few lingering COURTESY clients out of the LRU >> to make room for a new client. >> >> Maybe you want to kick the laundromat here to get that process >> moving sooner? > > Yes, good idea. Fix in v2. > >> >> >>> clp = kmem_cache_zalloc(client_slab, GFP_KERNEL); >>> if (clp == NULL) >>> return NULL; >>> @@ -5796,9 +5799,12 @@ static void >>> nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist, >>> struct laundry_time *lt) >>> { >>> + unsigned int maxreap = 0, reapcnt = 0; >>> struct list_head *pos, *next; >>> struct nfs4_client *clp; >>> >>> + if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients) >>> + maxreap = NFSD_CLIENT_MAX_TRIM_PER_RUN; >> The idea I guess is "don't reap anything until we exceed the >> maximum number of clients". It took me a bit to figure that >> out. > > Not sure how to make it more clear, should I add a comment? I think I would prefer that you don't use a variable initializer here. That splits the initial value between two separate code paragraphs, which takes more time to read and understand. The following is only a suggestion: maxreap = (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients) ? NFSD_CLIENT_MAX_TRIM_PER_RUN ? 0; It's just a nit, though. Optional to fix. >>> INIT_LIST_HEAD(reaplist); >>> spin_lock(&nn->client_lock); >>> list_for_each_safe(pos, next, &nn->client_lru) { >>> @@ -5809,14 +5815,17 @@ nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist, >>> break; >>> if (!atomic_read(&clp->cl_rpc_users)) >>> clp->cl_state = NFSD4_COURTESY; >>> - if (!client_has_state(clp) || >>> - ktime_get_boottime_seconds() >= >>> - (clp->cl_time + NFSD_COURTESY_CLIENT_TIMEOUT)) >>> + if (!client_has_state(clp)) >>> goto exp_client; >>> if (nfs4_anylock_blockers(clp)) { >>> exp_client: >>> - if (!mark_client_expired_locked(clp)) >>> + if (!mark_client_expired_locked(clp)) { >>> list_add(&clp->cl_lru, reaplist); >>> + reapcnt++; >>> + } >>> + } else { >>> + if (reapcnt < maxreap) >>> + goto exp_client; >>> } >>> } >> Would something like this be more straightforward? I probably >> didn't get the logic exactly right. >> >> if (!nfs4_anylock_blockers(clp)) >> if (reapcnt > maxreap) >> continue; > > This would not work. If there is no blocker, the client should become > courtesy client if reaping client is not needed. With this logic, when > reapcnt == maxreap == 0 (no reap needed) we still reap the client. If > we change from (reapcnt > maxreap) to (reapcnt >= maxreap) then it > may work. I have to test it out. > >> exp_client: >> if (!mark_client_expired_locked(clp)) { >> list_add(&clp->cl_lru, reaplist); >> reapcnt++; >> } >> } >> >> The idea is: once maxreap has been reached, continue walking the >> LRU looking for clients to convert from ACTIVE to COURTESY, but >> do not reap any more COURTESY clients that might be found. > > Right. I'm ok with either logic as long as it works. The clarity of > the logic does not seem much different to me. The style I proposed above is more readable to me. Please go with that. >>> spin_unlock(&nn->client_lock); >>> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c >>> index 547f4c4b9668..223659e15af3 100644 >>> --- a/fs/nfsd/nfsctl.c >>> +++ b/fs/nfsd/nfsctl.c >>> @@ -96,6 +96,8 @@ static ssize_t (*const write_op[])(struct file *, char *, size_t) = { >>> #endif >>> }; >>> >>> +#define NFS4_MAX_CLIENTS_PER_4GB 4096 >> No need for "MAX" in this name. > > Fix in v2. > >> >> And, ditto the above comment: move this to a header file. > > Fix in v2. > >> >> >>> + >>> static ssize_t nfsctl_transaction_write(struct file *file, const char __user *buf, size_t size, loff_t *pos) >>> { >>> ino_t ino = file_inode(file)->i_ino; >>> @@ -1462,6 +1464,8 @@ unsigned int nfsd_net_id; >>> static __net_init int nfsd_init_net(struct net *net) >>> { >>> int retval; >>> + unsigned long lowmem; >>> + struct sysinfo si; >>> struct nfsd_net *nn = net_generic(net, nfsd_net_id); >> Nit: I prefer the reverse christmas tree style. Can you add >> the new stack variables after "struct nfsd_net *nn ..." ? > > FIx in v2. > >> >> >>> retval = nfsd_export_init(net); >>> @@ -1488,6 +1492,10 @@ static __net_init int nfsd_init_net(struct net *net) >>> seqlock_init(&nn->writeverf_lock); >>> >>> atomic_set(&nn->nfs4_client_count, 0); >>> + si_meminfo(&si); >>> + lowmem = (si.totalram - si.totalhigh) * si.mem_unit; >> There's no reason to restrict this to lowmem, since we're not >> using a struct nfs4_client as the target of I/O. > > From reading the code, my impression is himem is reserved for some > specific usages and the actual available memory does not account > for himem area. Few examples, eventpoll_init, fanotify_user_setup, > etc. These objects are not used for I/O. There are only a few users of si.totalram that subtract totalhigh, and those that do need to strictly avoid the use of high memory, including the ones you mention here. I'm looking at ttm_global_init(). In there, both the total amount of memory is computed, and so is DMA32 memory. The comments explain the difference. Unless I've misunderstood, it isn't necessary to subtract si.totalhigh for our purpose, so please remove that. >>> + nn->nfs4_max_clients = (((lowmem * 100) >> 32) * >>> + NFS4_MAX_CLIENTS_PER_4GB) / 100; >> On a platform where "unsigned long" is a 32-bit type, will >> the shift-right-by-32 continue to work as you expect? > > I will try unsigned long long, this would work on 32-bit platform. Actually, using a type that explicitly specifies its bit-width is even better. >> Let's try to simplify this computation, because it isn't >> especially clear what is going on. The math might work a >> little better if it were "1024 clients per GB" for example. > > I'm not sure how to make it simpler, open for suggestions. How about: u64 max_clients; max_clients = (u64)si.totalram * si.mem_unit / (1024 * 1024 * 1024); max_clients *= NFS4_CLIENTS_PER_GB; nn->nfs4_max_clients = max_t(int, max_clients, NFS4_CLIENTS_PER_GB); > Thanks for your quick review! > > -Dai > >> >> >>> return 0; >>> >>> -- >>> 2.9.5 >>> >> -- >> Chuck Lever -- Chuck Lever
Hi Dai, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on linus/master] [also build test WARNING on v5.19-rc6 next-20220714] [cannot apply to cel-2.6/for-next] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Dai-Ngo/limit-the-number-of-v4-clients-to-4096-per-4GB-of-system-memory/20220713-041137 base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 72a8e05d4f66b5af7854df4490e3135168694b6b config: i386-randconfig-a015 (https://download.01.org/0day-ci/archive/20220715/202207150938.TXWZZpEX-lkp@intel.com/config) compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 5e61b9c556267086ef9b743a0b57df302eef831b) reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/intel-lab-lkp/linux/commit/f4722e6591821876532c91087a3ac0e103e8d5d7 git remote add linux-review https://github.com/intel-lab-lkp/linux git fetch --no-tags linux-review Dai-Ngo/limit-the-number-of-v4-clients-to-4096-per-4GB-of-system-memory/20220713-041137 git checkout f4722e6591821876532c91087a3ac0e103e8d5d7 # save the config file mkdir build_dir && cp config build_dir/.config COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash fs/nfsd/ If you fix the issue, kindly add following tag where applicable Reported-by: kernel test robot <lkp@intel.com> All warnings (new ones prefixed by >>): >> fs/nfsd/nfsctl.c:1497:42: warning: shift count >= width of type [-Wshift-count-overflow] nn->nfs4_max_clients = (((lowmem * 100) >> 32) * ^ ~~ 1 warning generated. vim +1497 fs/nfsd/nfsctl.c 1463 1464 static __net_init int nfsd_init_net(struct net *net) 1465 { 1466 int retval; 1467 unsigned long lowmem; 1468 struct sysinfo si; 1469 struct nfsd_net *nn = net_generic(net, nfsd_net_id); 1470 1471 retval = nfsd_export_init(net); 1472 if (retval) 1473 goto out_export_error; 1474 retval = nfsd_idmap_init(net); 1475 if (retval) 1476 goto out_idmap_error; 1477 nn->nfsd_versions = NULL; 1478 nn->nfsd4_minorversions = NULL; 1479 retval = nfsd_reply_cache_init(nn); 1480 if (retval) 1481 goto out_drc_error; 1482 nn->nfsd4_lease = 90; /* default lease time */ 1483 nn->nfsd4_grace = 90; 1484 nn->somebody_reclaimed = false; 1485 nn->track_reclaim_completes = false; 1486 nn->clverifier_counter = prandom_u32(); 1487 nn->clientid_base = prandom_u32(); 1488 nn->clientid_counter = nn->clientid_base + 1; 1489 nn->s2s_cp_cl_id = nn->clientid_counter++; 1490 1491 get_random_bytes(&nn->siphash_key, sizeof(nn->siphash_key)); 1492 seqlock_init(&nn->writeverf_lock); 1493 1494 atomic_set(&nn->nfs4_client_count, 0); 1495 si_meminfo(&si); 1496 lowmem = (si.totalram - si.totalhigh) * si.mem_unit; > 1497 nn->nfs4_max_clients = (((lowmem * 100) >> 32) * 1498 NFS4_MAX_CLIENTS_PER_4GB) / 100; 1499 1500 return 0; 1501 1502 out_drc_error: 1503 nfsd_idmap_shutdown(net); 1504 out_idmap_error: 1505 nfsd_export_shutdown(net); 1506 out_export_error: 1507 return retval; 1508 } 1509
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h index ce864f001a3e..8d72b302a49c 100644 --- a/fs/nfsd/netns.h +++ b/fs/nfsd/netns.h @@ -191,6 +191,7 @@ struct nfsd_net { siphash_key_t siphash_key; atomic_t nfs4_client_count; + unsigned int nfs4_max_clients; }; /* Simple check to find out if a given net was properly initialized */ diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 30e16d9e8657..e54db346dc00 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -126,6 +126,7 @@ static const struct nfsd4_callback_ops nfsd4_cb_recall_ops; static const struct nfsd4_callback_ops nfsd4_cb_notify_lock_ops; static struct workqueue_struct *laundry_wq; +#define NFSD_CLIENT_MAX_TRIM_PER_RUN 128 int nfsd4_create_laundry_wq(void) { @@ -2059,6 +2060,8 @@ static struct nfs4_client *alloc_client(struct xdr_netobj name, struct nfs4_client *clp; int i; + if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients) + return NULL; clp = kmem_cache_zalloc(client_slab, GFP_KERNEL); if (clp == NULL) return NULL; @@ -5796,9 +5799,12 @@ static void nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist, struct laundry_time *lt) { + unsigned int maxreap = 0, reapcnt = 0; struct list_head *pos, *next; struct nfs4_client *clp; + if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients) + maxreap = NFSD_CLIENT_MAX_TRIM_PER_RUN; INIT_LIST_HEAD(reaplist); spin_lock(&nn->client_lock); list_for_each_safe(pos, next, &nn->client_lru) { @@ -5809,14 +5815,17 @@ nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist, break; if (!atomic_read(&clp->cl_rpc_users)) clp->cl_state = NFSD4_COURTESY; - if (!client_has_state(clp) || - ktime_get_boottime_seconds() >= - (clp->cl_time + NFSD_COURTESY_CLIENT_TIMEOUT)) + if (!client_has_state(clp)) goto exp_client; if (nfs4_anylock_blockers(clp)) { exp_client: - if (!mark_client_expired_locked(clp)) + if (!mark_client_expired_locked(clp)) { list_add(&clp->cl_lru, reaplist); + reapcnt++; + } + } else { + if (reapcnt < maxreap) + goto exp_client; } } spin_unlock(&nn->client_lock); diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 547f4c4b9668..223659e15af3 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -96,6 +96,8 @@ static ssize_t (*const write_op[])(struct file *, char *, size_t) = { #endif }; +#define NFS4_MAX_CLIENTS_PER_4GB 4096 + static ssize_t nfsctl_transaction_write(struct file *file, const char __user *buf, size_t size, loff_t *pos) { ino_t ino = file_inode(file)->i_ino; @@ -1462,6 +1464,8 @@ unsigned int nfsd_net_id; static __net_init int nfsd_init_net(struct net *net) { int retval; + unsigned long lowmem; + struct sysinfo si; struct nfsd_net *nn = net_generic(net, nfsd_net_id); retval = nfsd_export_init(net); @@ -1488,6 +1492,10 @@ static __net_init int nfsd_init_net(struct net *net) seqlock_init(&nn->writeverf_lock); atomic_set(&nn->nfs4_client_count, 0); + si_meminfo(&si); + lowmem = (si.totalram - si.totalhigh) * si.mem_unit; + nn->nfs4_max_clients = (((lowmem * 100) >> 32) * + NFS4_MAX_CLIENTS_PER_4GB) / 100; return 0;
Currently there is no limit on how many v4 clients are supported by the system. This can be a problem in systems with small memory configuration to function properly when a very large number of clients exist that creates memory shortage conditions. This patch enforces a limit of 4096 NFSv4 clients, including courtesy clients, per 4GB of system memory. When the number of the clients reaches the limit, requests that create new clients are returned with NFS4ERR_DELAY. The laundromat detects this condition and removes older courtesy clients. Due to the overhead of the upcall to remove the client record, the maximun number of clients the laundromat removes on each run is limited to 128. This is done to ensure the laundromat can still process other tasks in a timely manner. Since there is now a limit of the number of clients, the 24-hr idle time limit of courtesy client is no longer needed and was removed. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> --- fs/nfsd/netns.h | 1 + fs/nfsd/nfs4state.c | 17 +++++++++++++---- fs/nfsd/nfsctl.c | 8 ++++++++ 3 files changed, 22 insertions(+), 4 deletions(-)