Message ID | 20220531103427.47769-1-wangyugui@e16-tech.com (mailing list archive) |
---|---|
State | New, archived |
Series | [v2] nfsd: serialize filecache garbage collector |
> On May 31, 2022, at 6:34 AM, Wang Yugui <wangyugui@e16-tech.com> wrote:
>
> When many(>NFSD_FILE_LRU_THRESHOLD) files are kept as OPEN, such as
> xfstests generic/531, nfsd proceses are in CPU high-load state,
> and nfsd_file_gc(nfsd filecache garbage collector) waste many CPU times.

Over the past few days, I've been able to reproduce a lot of bad
behavior with generic/531. My test client has 12 physical CPU
cores, and my lab network is 56Gb InfiniBand.

Unfortunately this patch doesn't really begin to address it. For
example, with this patch applied, CPU idle is in single digits
on the NFS server that exports the test's scratch device, and
that server can still get into a soft lock-up. IMO that is
because this change works around the underlying problem but
makes no attempt to root-cause or address that issue.

I agree that the NFS server's behavior needs attention, but I'm
not inclined to apply this particular patch as it is.

> concurrency nfsd_file_gc() is almost meaningless, so serialize it.
>
> Signed-off-by: Wang Yugui <wangyugui@e16-tech.com>
> ---
> Changes since v1:
> - add static to 'atomic_t nfsd_file_gc_running'.
>   thanks for kernel test robot <lkp@intel.com>
>
>  fs/nfsd/filecache.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index f172412447f5..28a8f8d6d235 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -471,10 +471,15 @@ nfsd_file_lru_walk_list(struct shrink_control *sc)
>  	return ret;
>  }
>
> +/* concurrency nfsd_file_gc() is almost meaningless, so serialize it. */
> +static atomic_t nfsd_file_gc_running = ATOMIC_INIT(0);
>  static void
>  nfsd_file_gc(void)
>  {
> -	nfsd_file_lru_walk_list(NULL);
> +	if(atomic_cmpxchg(&nfsd_file_gc_running, 0, 1) == 0) {
> +		nfsd_file_lru_walk_list(NULL);
> +		atomic_set(&nfsd_file_gc_running, 0);
> +	}
>  }
>
>  static void
> --
> 2.36.1
>

--
Chuck Lever
Hi,

> > On May 31, 2022, at 6:34 AM, Wang Yugui <wangyugui@e16-tech.com> wrote:
> >
> > When many(>NFSD_FILE_LRU_THRESHOLD) files are kept as OPEN, such as
> > xfstests generic/531, nfsd proceses are in CPU high-load state,
> > and nfsd_file_gc(nfsd filecache garbage collector) waste many CPU times.
>
> Over the past few days, I've been able to reproduce a lot of bad
> behavior with generic/531. My test client has 12 physical CPU
> cores, and my lab network is 56Gb InfiniBand.
>
> Unfortunately this patch doesn't really begin to address it. For
> example, with this patch applied, CPU idle is in single digits
> on the NFS server that exports the test's scratch device, and
> that server can still get into a soft lock-up. IMO that is
> because this change works around the underlying problem but
> makes no attempt to root-cause or address that issue.
>
> I agree that the NFS server's behavior needs attention, but I'm
> not inclined to apply this particular patch as it is.

Yes, this patch only targets xfstests generic/531.

In xfstests generic/531, when many (>500K) files are kept open,
deleting a file also triggers an LRU walk (and a CPU soft lock-up).

Is it that a big LRU is still fast to add to, but very slow when
removing an arbitrary entry?

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2022/05/31
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index f172412447f5..28a8f8d6d235 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -471,10 +471,15 @@ nfsd_file_lru_walk_list(struct shrink_control *sc)
 	return ret;
 }
 
+/* concurrency nfsd_file_gc() is almost meaningless, so serialize it. */
+static atomic_t nfsd_file_gc_running = ATOMIC_INIT(0);
 static void
 nfsd_file_gc(void)
 {
-	nfsd_file_lru_walk_list(NULL);
+	if(atomic_cmpxchg(&nfsd_file_gc_running, 0, 1) == 0) {
+		nfsd_file_lru_walk_list(NULL);
+		atomic_set(&nfsd_file_gc_running, 0);
+	}
 }
 
 static void
When many (>NFSD_FILE_LRU_THRESHOLD) files are kept open, as in
xfstests generic/531, the nfsd processes run under high CPU load,
and nfsd_file_gc() (the nfsd filecache garbage collector) wastes a
lot of CPU time.

Running nfsd_file_gc() concurrently is almost meaningless, so serialize it.

Signed-off-by: Wang Yugui <wangyugui@e16-tech.com>
---
Changes since v1:
- Add static to 'atomic_t nfsd_file_gc_running'.
  Thanks to the kernel test robot <lkp@intel.com>.

 fs/nfsd/filecache.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)
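
For readers less familiar with the pattern, the guard this patch adds behaves like a non-blocking try-lock: the caller that flips the flag from 0 to 1 runs the walk, and every other caller returns immediately. Below is a minimal userspace sketch of that idea using C11 atomics rather than the kernel's atomic_t API. The names walk_lru(), walks_done, gc_running and gc_try_run() are hypothetical stand-ins for illustration, not kernel symbols, and the memory ordering here is the C11 default (sequentially consistent), which may differ from what the kernel primitives provide.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* 0 = idle, 1 = a collector pass is in progress (hypothetical flag). */
static atomic_int gc_running;

/* Counts how many walks actually ran, for illustration only. */
static int walks_done;

/* Hypothetical stand-in for the expensive LRU walk. */
static void walk_lru(void)
{
	/* The walk must only ever run while the guard is held. */
	assert(atomic_load(&gc_running) == 1);
	walks_done++;
}

/*
 * Try to run one collector pass. Returns true if this caller did the
 * walk, false if a pass was already in progress and the call was skipped.
 */
static bool gc_try_run(void)
{
	int expected = 0;

	/* Only the caller that flips 0 -> 1 proceeds; the rest back off. */
	if (!atomic_compare_exchange_strong(&gc_running, &expected, 1))
		return false;

	walk_lru();

	/* Drop the guard so a later caller can start the next pass. */
	atomic_store(&gc_running, 0);
	return true;
}
```

Unlike a mutex, a skipped caller does not wait and its pass is simply dropped. In this sketch the guard caps the number of concurrent walkers at one but does nothing to make an individual walk cheaper, which appears to be the concern raised in the review above.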