Message ID | 20190630135240.7490-1-trond.myklebust@hammerspace.com (mailing list archive) |
---|---|
Headers | show |
Series | Cache open file descriptors in knfsd | expand |
Interesting work! Kudos to you and Jeff. > On Jun 30, 2019, at 9:52 AM, Trond Myklebust <trondmy@gmail.com> wrote: > > When a NFSv3 READ or WRITE request comes in, the first thing knfsd has > to do is open a new file descriptor. While this is often a relatively > inexpensive thing to do for most local filesystems, it is usually less > so for FUSE, clustered or networked filesystems that are being exported > by knfsd. True, I haven't measured much effect if any of open and close on local file systems. It would be valuable if the cover letter provided a more quantified assessment of the cost for these other use cases. It sounds plausible to me that they would be more expensive, but I'm wondering if the additional complexity of an open file cache is warranted and effective. Do you have any benchmark results to share? Are there particular workloads where you believe open caching will be especially beneficial? > This set of patches attempts to reduce some of that cost by caching > open file descriptors so that they may be reused by other incoming > READ/WRITE requests for the same file. Is the open file cache a single cache per server? Wondering if there can be significant interference (eg lock contention or cache sloshing) between separate workloads on different exports, for example. Do you have any benchmark results that show that removing the raparms cache is harmless? > One danger when doing this, is that knfsd may end up caching file > descriptors for files that have been unlinked. In order to deal with > this issue, we use fsnotify to monitor the files, and have hooks to > evict those descriptors from the file cache if the i_nlink value > goes to 0. > > Jeff Layton (12): > sunrpc: add a new cache_detail operation for when a cache is flushed > locks: create a new notifier chain for lease attempts > nfsd: add a new struct file caching facility to nfsd > nfsd: hook up nfsd_write to the new nfsd_file cache > nfsd: hook up nfsd_read to the nfsd_file cache > nfsd: hook nfsd_commit up to the nfsd_file cache > nfsd: convert nfs4_file->fi_fds array to use nfsd_files > nfsd: convert fi_deleg_file and ls_file fields to nfsd_file > nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache > nfsd: have nfsd_test_lock use the nfsd_file cache > nfsd: rip out the raparms cache > nfsd: close cached files prior to a REMOVE or RENAME that would > replace target > > Trond Myklebust (4): > notify: export symbols for use by the knfsd file cache > vfs: Export flush_delayed_fput for use by knfsd. > nfsd: Fix up some unused variable warnings > nfsd: Fix the documentation for svcxdr_tmpalloc() > > fs/file_table.c | 1 + > fs/locks.c | 62 +++ > fs/nfsd/Kconfig | 1 + > fs/nfsd/Makefile | 3 +- > fs/nfsd/blocklayout.c | 3 +- > fs/nfsd/export.c | 13 + > fs/nfsd/filecache.c | 885 +++++++++++++++++++++++++++++++ > fs/nfsd/filecache.h | 60 +++ > fs/nfsd/nfs4layouts.c | 12 +- > fs/nfsd/nfs4proc.c | 83 +-- > fs/nfsd/nfs4state.c | 183 ++++--- > fs/nfsd/nfs4xdr.c | 31 +- > fs/nfsd/nfssvc.c | 16 +- > fs/nfsd/state.h | 10 +- > fs/nfsd/trace.h | 140 +++++ > fs/nfsd/vfs.c | 295 ++++------- > fs/nfsd/vfs.h | 9 +- > fs/nfsd/xdr4.h | 19 +- > fs/notify/fsnotify.h | 2 - > fs/notify/group.c | 2 + > fs/notify/mark.c | 6 + > include/linux/fs.h | 5 + > include/linux/fsnotify_backend.h | 2 + > include/linux/sunrpc/cache.h | 1 + > net/sunrpc/cache.c | 3 + > 25 files changed, 1465 insertions(+), 382 deletions(-) > create mode 100644 fs/nfsd/filecache.c > create mode 100644 fs/nfsd/filecache.h > > -- > 2.21.0 > -- Chuck Lever
On Mon, 2019-07-01 at 11:02 -0400, Chuck Lever wrote: > Interesting work! Kudos to you and Jeff. > > > > On Jun 30, 2019, at 9:52 AM, Trond Myklebust <trondmy@gmail.com> > > wrote: > > > > When a NFSv3 READ or WRITE request comes in, the first thing knfsd > > has > > to do is open a new file descriptor. While this is often a > > relatively > > inexpensive thing to do for most local filesystems, it is usually > > less > > so for FUSE, clustered or networked filesystems that are being > > exported > > by knfsd. > > True, I haven't measured much effect if any of open and close on > local > file systems. It would be valuable if the cover letter provided a > more > quantified assessment of the cost for these other use cases. It > sounds > plausible to me that they would be more expensive, but I'm wondering > if > the additional complexity of an open file cache is warranted and > effective. Do you have any benchmark results to share? > > Are there particular workloads where you believe open caching will be > especially beneficial? I'd expect pretty much anything with a nontrivial open() method. i.e.: FUSE, GFS2, OCFS2, CEPH, etc. to benefit. I've seen no slowdowns so far with traditional filesystems: i.e. ext4 and xfs. Note that the removal of the raparms cache does in many way compensate for the new need to lookup the struct file. > > This set of patches attempts to reduce some of that cost by caching > > open file descriptors so that they may be reused by other incoming > > READ/WRITE requests for the same file. > > Is the open file cache a single cache per server? Wondering if there > can be significant interference (eg lock contention or cache > sloshing) > between separate workloads on different exports, for example. The file cache is global. Cache lookups are lockless (i.e. RCU protected), so there is little contention for the case where there is already an entry. For the case where we have to add an entry, there is a mutex that might get contended in the case of workloads with lots of small file open+closes. > Do you have any benchmark results that show that removing the raparms > cache is harmless? The same information is carried in struct file. The whole raparms cache was just a hack in order to allow us to port the readahead information across struct file instances. Now that we are caching the struct file itself, the raparms hack is unnecessary. IOW: I haven't seen any slowdowns so far, however I don't have access to a bleeding edge networking setup that would push this further. > > One danger when doing this, is that knfsd may end up caching file > > descriptors for files that have been unlinked. In order to deal > > with > > this issue, we use fsnotify to monitor the files, and have hooks to > > evict those descriptors from the file cache if the i_nlink value > > goes to 0. > > > > Jeff Layton (12): > > sunrpc: add a new cache_detail operation for when a cache is > > flushed > > locks: create a new notifier chain for lease attempts > > nfsd: add a new struct file caching facility to nfsd > > nfsd: hook up nfsd_write to the new nfsd_file cache > > nfsd: hook up nfsd_read to the nfsd_file cache > > nfsd: hook nfsd_commit up to the nfsd_file cache > > nfsd: convert nfs4_file->fi_fds array to use nfsd_files > > nfsd: convert fi_deleg_file and ls_file fields to nfsd_file > > nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache > > nfsd: have nfsd_test_lock use the nfsd_file cache > > nfsd: rip out the raparms cache > > nfsd: close cached files prior to a REMOVE or RENAME that would > > replace target > > > > Trond Myklebust (4): > > notify: export symbols for use by the knfsd file cache > > vfs: Export flush_delayed_fput for use by knfsd. > > nfsd: Fix up some unused variable warnings > > nfsd: Fix the documentation for svcxdr_tmpalloc() > > > > fs/file_table.c | 1 + > > fs/locks.c | 62 +++ > > fs/nfsd/Kconfig | 1 + > > fs/nfsd/Makefile | 3 +- > > fs/nfsd/blocklayout.c | 3 +- > > fs/nfsd/export.c | 13 + > > fs/nfsd/filecache.c | 885 > > +++++++++++++++++++++++++++++++ > > fs/nfsd/filecache.h | 60 +++ > > fs/nfsd/nfs4layouts.c | 12 +- > > fs/nfsd/nfs4proc.c | 83 +-- > > fs/nfsd/nfs4state.c | 183 ++++--- > > fs/nfsd/nfs4xdr.c | 31 +- > > fs/nfsd/nfssvc.c | 16 +- > > fs/nfsd/state.h | 10 +- > > fs/nfsd/trace.h | 140 +++++ > > fs/nfsd/vfs.c | 295 ++++------- > > fs/nfsd/vfs.h | 9 +- > > fs/nfsd/xdr4.h | 19 +- > > fs/notify/fsnotify.h | 2 - > > fs/notify/group.c | 2 + > > fs/notify/mark.c | 6 + > > include/linux/fs.h | 5 + > > include/linux/fsnotify_backend.h | 2 + > > include/linux/sunrpc/cache.h | 1 + > > net/sunrpc/cache.c | 3 + > > 25 files changed, 1465 insertions(+), 382 deletions(-) > > create mode 100644 fs/nfsd/filecache.c > > create mode 100644 fs/nfsd/filecache.h > > > > -- > > 2.21.0 > > > > -- > Chuck Lever > > >
> On Jul 1, 2019, at 11:17 AM, Trond Myklebust <trondmy@hammerspace.com> wrote: > > On Mon, 2019-07-01 at 11:02 -0400, Chuck Lever wrote: >> Interesting work! Kudos to you and Jeff. >> >> >>> On Jun 30, 2019, at 9:52 AM, Trond Myklebust <trondmy@gmail.com> >>> wrote: >>> >>> When a NFSv3 READ or WRITE request comes in, the first thing knfsd >>> has >>> to do is open a new file descriptor. While this is often a >>> relatively >>> inexpensive thing to do for most local filesystems, it is usually >>> less >>> so for FUSE, clustered or networked filesystems that are being >>> exported >>> by knfsd. >> >> True, I haven't measured much effect if any of open and close on >> local >> file systems. It would be valuable if the cover letter provided a >> more >> quantified assessment of the cost for these other use cases. It >> sounds >> plausible to me that they would be more expensive, but I'm wondering >> if >> the additional complexity of an open file cache is warranted and >> effective. Do you have any benchmark results to share? >> >> Are there particular workloads where you believe open caching will be >> especially beneficial? > > I'd expect pretty much anything with a nontrivial open() method. i.e.: > FUSE, GFS2, OCFS2, CEPH, etc. to benefit. > > I've seen no slowdowns so far with traditional filesystems: i.e. ext4 > and xfs. > > Note that the removal of the raparms cache does in many way compensate > for the new need to lookup the struct file. > >>> This set of patches attempts to reduce some of that cost by caching >>> open file descriptors so that they may be reused by other incoming >>> READ/WRITE requests for the same file. >> >> Is the open file cache a single cache per server? Wondering if there >> can be significant interference (eg lock contention or cache >> sloshing) >> between separate workloads on different exports, for example. > > The file cache is global. Cache lookups are lockless (i.e. RCU > protected), so there is little contention for the case where there is > already an entry. For the case where we have to add an entry, there is > a mutex that might get contended in the case of workloads with lots of > small file open+closes. >> Do you have any benchmark results that show that removing the raparms >> cache is harmless? > > The same information is carried in struct file. The whole raparms cache > was just a hack in order to allow us to port the readahead information > across struct file instances. Now that we are caching the struct file > itself, the raparms hack is unnecessary. OK. I see the patch description of 11/16 mentions something about "stop fiddling with raparms" but IMO the patch description for 13/16 should be changed to make the above clear. Thanks! > IOW: I haven't seen any slowdowns so far, however I don't have access > to a bleeding edge networking setup that would push this further. > >>> One danger when doing this, is that knfsd may end up caching file >>> descriptors for files that have been unlinked. In order to deal >>> with >>> this issue, we use fsnotify to monitor the files, and have hooks to >>> evict those descriptors from the file cache if the i_nlink value >>> goes to 0. >>> >>> Jeff Layton (12): >>> sunrpc: add a new cache_detail operation for when a cache is >>> flushed >>> locks: create a new notifier chain for lease attempts >>> nfsd: add a new struct file caching facility to nfsd >>> nfsd: hook up nfsd_write to the new nfsd_file cache >>> nfsd: hook up nfsd_read to the nfsd_file cache >>> nfsd: hook nfsd_commit up to the nfsd_file cache >>> nfsd: convert nfs4_file->fi_fds array to use nfsd_files >>> nfsd: convert fi_deleg_file and ls_file fields to nfsd_file >>> nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache >>> nfsd: have nfsd_test_lock use the nfsd_file cache >>> nfsd: rip out the raparms cache >>> nfsd: close cached files prior to a REMOVE or RENAME that would >>> replace target >>> >>> Trond Myklebust (4): >>> notify: export symbols for use by the knfsd file cache >>> vfs: Export flush_delayed_fput for use by knfsd. >>> nfsd: Fix up some unused variable warnings >>> nfsd: Fix the documentation for svcxdr_tmpalloc() >>> >>> fs/file_table.c | 1 + >>> fs/locks.c | 62 +++ >>> fs/nfsd/Kconfig | 1 + >>> fs/nfsd/Makefile | 3 +- >>> fs/nfsd/blocklayout.c | 3 +- >>> fs/nfsd/export.c | 13 + >>> fs/nfsd/filecache.c | 885 >>> +++++++++++++++++++++++++++++++ >>> fs/nfsd/filecache.h | 60 +++ >>> fs/nfsd/nfs4layouts.c | 12 +- >>> fs/nfsd/nfs4proc.c | 83 +-- >>> fs/nfsd/nfs4state.c | 183 ++++--- >>> fs/nfsd/nfs4xdr.c | 31 +- >>> fs/nfsd/nfssvc.c | 16 +- >>> fs/nfsd/state.h | 10 +- >>> fs/nfsd/trace.h | 140 +++++ >>> fs/nfsd/vfs.c | 295 ++++------- >>> fs/nfsd/vfs.h | 9 +- >>> fs/nfsd/xdr4.h | 19 +- >>> fs/notify/fsnotify.h | 2 - >>> fs/notify/group.c | 2 + >>> fs/notify/mark.c | 6 + >>> include/linux/fs.h | 5 + >>> include/linux/fsnotify_backend.h | 2 + >>> include/linux/sunrpc/cache.h | 1 + >>> net/sunrpc/cache.c | 3 + >>> 25 files changed, 1465 insertions(+), 382 deletions(-) >>> create mode 100644 fs/nfsd/filecache.c >>> create mode 100644 fs/nfsd/filecache.h >>> >>> -- >>> 2.21.0 >>> >> >> -- >> Chuck Lever >> >> >> > -- > Trond Myklebust > Linux NFS client maintainer, Hammerspace > trond.myklebust@hammerspace.com -- Chuck Lever
Sorry for the delay responding.... On Sun, Jun 30, 2019 at 09:52:24AM -0400, Trond Myklebust wrote: > When a NFSv3 READ or WRITE request comes in, the first thing knfsd has > to do is open a new file descriptor. I assume it shouldn't make a significant difference for NFSv4? > While this is often a relatively > inexpensive thing to do for most local filesystems, it is usually less > so for FUSE, clustered or networked filesystems that are being exported > by knfsd. > > This set of patches attempts to reduce some of that cost by caching > open file descriptors so that they may be reused by other incoming > READ/WRITE requests for the same file. > One danger when doing this, is that knfsd may end up caching file > descriptors for files that have been unlinked. In order to deal with > this issue, we use fsnotify to monitor the files, and have hooks to > evict those descriptors from the file cache if the i_nlink value > goes to 0. That was one of the objections to previous attempts at a file cache so it's good to have a simple solution. This attempt seems pretty well thought-out. I'll tentatively target this for 5.4 pending some more review and testing. --b. > > Jeff Layton (12): > sunrpc: add a new cache_detail operation for when a cache is flushed > locks: create a new notifier chain for lease attempts > nfsd: add a new struct file caching facility to nfsd > nfsd: hook up nfsd_write to the new nfsd_file cache > nfsd: hook up nfsd_read to the nfsd_file cache > nfsd: hook nfsd_commit up to the nfsd_file cache > nfsd: convert nfs4_file->fi_fds array to use nfsd_files > nfsd: convert fi_deleg_file and ls_file fields to nfsd_file > nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache > nfsd: have nfsd_test_lock use the nfsd_file cache > nfsd: rip out the raparms cache > nfsd: close cached files prior to a REMOVE or RENAME that would > replace target > > Trond Myklebust (4): > notify: export symbols for use by the knfsd file cache > vfs: Export flush_delayed_fput for use by knfsd. > nfsd: Fix up some unused variable warnings > nfsd: Fix the documentation for svcxdr_tmpalloc() > > fs/file_table.c | 1 + > fs/locks.c | 62 +++ > fs/nfsd/Kconfig | 1 + > fs/nfsd/Makefile | 3 +- > fs/nfsd/blocklayout.c | 3 +- > fs/nfsd/export.c | 13 + > fs/nfsd/filecache.c | 885 +++++++++++++++++++++++++++++++ > fs/nfsd/filecache.h | 60 +++ > fs/nfsd/nfs4layouts.c | 12 +- > fs/nfsd/nfs4proc.c | 83 +-- > fs/nfsd/nfs4state.c | 183 ++++--- > fs/nfsd/nfs4xdr.c | 31 +- > fs/nfsd/nfssvc.c | 16 +- > fs/nfsd/state.h | 10 +- > fs/nfsd/trace.h | 140 +++++ > fs/nfsd/vfs.c | 295 ++++------- > fs/nfsd/vfs.h | 9 +- > fs/nfsd/xdr4.h | 19 +- > fs/notify/fsnotify.h | 2 - > fs/notify/group.c | 2 + > fs/notify/mark.c | 6 + > include/linux/fs.h | 5 + > include/linux/fsnotify_backend.h | 2 + > include/linux/sunrpc/cache.h | 1 + > net/sunrpc/cache.c | 3 + > 25 files changed, 1465 insertions(+), 382 deletions(-) > create mode 100644 fs/nfsd/filecache.c > create mode 100644 fs/nfsd/filecache.h > > -- > 2.21.0