| Message ID | 161118128472.1232039.11746799833066425131.stgit@warthog.procyon.org.uk |
|---|---|
| Series | Network fs helper library & fscache kiocb API |
On Wed, Jan 20, 2021 at 10:21:24PM +0000, David Howells wrote:
> Note that this uses SEEK_HOLE/SEEK_DATA to locate the data available
> to be read from the cache. Whilst this is an improvement from the
> bmap interface, it still has a problem with regard to a modern
> extent-based filesystem inserting or removing bridging blocks of
> zeros.

What are the consequences from the point of view of a user?

--b.

> This is a step towards overhauling the fscache API. The change is opt-in
> on the part of the network filesystem. A netfs should not try to mix the
> old and the new API because of conflicting ways of handling pages and the
> PG_fscache page flag and because it would be mixing DIO with buffered I/O.
> Further, the helper library can't be used with the old API.
>
> This does not change any of the fscache cookie handling APIs or the way
> invalidation is done.
>
> In the near term, I intend to deprecate and remove the old I/O API
> (fscache_allocate_page{,s}(), fscache_read_or_alloc_page{,s}(),
> fscache_write_page() and fscache_uncache_page()) and eventually replace
> most of fscache/cachefiles with something simpler and easier to follow.
>
> The patchset contains four parts:
>
>  (1) Some helper patches, including provision of an ITER_XARRAY iov
>      iterator and a function to do readahead expansion.
>
>  (2) Patches to add the netfs helper library.
>
>  (3) A patch to add the fscache/cachefiles kiocb API.
>
>  (4) Patches to add support in AFS for this.
>
> With this, AFS without a cache passes all expected xfstests; with a
> cache, there's an extra failure, but that's also there before these
> patches. Fixing that probably requires a greater overhaul.
>
> These patches can also be found on:
>
>     https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-netfs-lib
>
> David
> ---
> David Howells (24):
>       iov_iter: Add ITER_XARRAY
>       vm: Add wait/unlock functions for PG_fscache
>       mm: Implement readahead_control pageset expansion
>       vfs: Export rw_verify_area() for use by cachefiles
>       netfs: Make a netfs helper module
>       netfs: Provide readahead and readpage netfs helpers
>       netfs: Add tracepoints
>       netfs: Gather stats
>       netfs: Add write_begin helper
>       netfs: Define an interface to talk to a cache
>       fscache, cachefiles: Add alternate API to use kiocb for read/write to cache
>       afs: Disable use of the fscache I/O routines
>       afs: Pass page into dirty region helpers to provide THP size
>       afs: Print the operation debug_id when logging an unexpected data version
>       afs: Move key to afs_read struct
>       afs: Don't truncate iter during data fetch
>       afs: Log remote unmarshalling errors
>       afs: Set up the iov_iter before calling afs_extract_data()
>       afs: Use ITER_XARRAY for writing
>       afs: Wait on PG_fscache before modifying/releasing a page
>       afs: Extract writeback extension into its own function
>       afs: Prepare for use of THPs
>       afs: Use the fs operation ops to handle FetchData completion
>       afs: Use new fscache read helper API
>
> Takashi Iwai (1):
>       cachefiles: Drop superfluous readpages aops NULL check
>
>
>  fs/Kconfig                    |    1 +
>  fs/Makefile                   |    1 +
>  fs/afs/Kconfig                |    1 +
>  fs/afs/dir.c                  |  225 ++++---
>  fs/afs/file.c                 |  472 ++++----
>  fs/afs/fs_operation.c         |    4 +-
>  fs/afs/fsclient.c             |  108 ++--
>  fs/afs/inode.c                |    7 +-
>  fs/afs/internal.h             |   57 +-
>  fs/afs/rxrpc.c                |  150 ++---
>  fs/afs/write.c                |  610 ++++++++++--------
>  fs/afs/yfsclient.c            |   82 +--
>  fs/cachefiles/Makefile        |    1 +
>  fs/cachefiles/interface.c     |    5 +-
>  fs/cachefiles/internal.h      |    9 +
>  fs/cachefiles/rdwr.c          |    2 -
>  fs/cachefiles/rdwr2.c         |  406 ++++++
>  fs/fscache/Makefile           |    3 +-
>  fs/fscache/internal.h         |    3 +
>  fs/fscache/page.c             |    2 +-
>  fs/fscache/page2.c            |  116 ++++
>  fs/fscache/stats.c            |    1 +
>  fs/internal.h                 |    5 -
>  fs/netfs/Kconfig              |   23 +
>  fs/netfs/Makefile             |    5 +
>  fs/netfs/internal.h           |   97 +++
>  fs/netfs/read_helper.c        | 1142 +++++++++++++++++++++++++++++++++
>  fs/netfs/stats.c              |   57 ++
>  fs/read_write.c               |    1 +
>  include/linux/fs.h            |    1 +
>  include/linux/fscache-cache.h |    4 +
>  include/linux/fscache.h       |   28 +-
>  include/linux/netfs.h         |  167 +++++
>  include/linux/pagemap.h       |   16 +
>  include/net/af_rxrpc.h        |    2 +-
>  include/trace/events/afs.h    |   74 +--
>  include/trace/events/netfs.h  |  201 ++++++
>  mm/filemap.c                  |   18 +
>  mm/readahead.c                |   70 ++
>  net/rxrpc/recvmsg.c           |    9 +-
>  40 files changed, 3171 insertions(+), 1015 deletions(-)
>  create mode 100644 fs/cachefiles/rdwr2.c
>  create mode 100644 fs/fscache/page2.c
>  create mode 100644 fs/netfs/Kconfig
>  create mode 100644 fs/netfs/Makefile
>  create mode 100644 fs/netfs/internal.h
>  create mode 100644 fs/netfs/read_helper.c
>  create mode 100644 fs/netfs/stats.c
>  create mode 100644 include/linux/netfs.h
>  create mode 100644 include/trace/events/netfs.h
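For readers who want the mechanics spelled out: the SEEK_HOLE/SEEK_DATA check mentioned in the cover letter can be sketched in a few lines of userspace C. This is purely illustrative and not code from the patchset (in the kernel, a cache backend would go through vfs_llseek() instead); the helper name range_is_data() is made up. It also inherits the weakness under discussion: a bridging block of zeros materialised by the filesystem reads as data here.

```c
#define _GNU_SOURCE		/* for SEEK_DATA/SEEK_HOLE */
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Return true only if [start, start + len) is entirely backed by data. */
static bool range_is_data(int fd, off_t start, off_t len)
{
	/* First byte of data at or after start; -1/ENXIO means none. */
	off_t data = lseek(fd, start, SEEK_DATA);
	if (data == (off_t)-1 || data > start)
		return false;	/* the range begins inside a hole */

	/* First hole at or after start (there is always one at EOF). */
	off_t hole = lseek(fd, start, SEEK_HOLE);
	if (hole == (off_t)-1)
		return false;

	return hole >= start + len;	/* no hole intrudes on the range */
}

int main(int argc, char **argv)
{
	if (argc != 4) {
		fprintf(stderr, "usage: %s <file> <start> <len>\n", argv[0]);
		return 1;
	}
	int fd = open(argv[1], O_RDONLY);
	if (fd == -1) {
		perror("open");
		return 1;
	}
	bool ok = range_is_data(fd, atoll(argv[2]), atoll(argv[3]));
	printf("%s\n", ok ? "data" : "hole (or partly hole)");
	close(fd);
	return 0;
}
```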
J. Bruce Fields <bfields@fieldses.org> wrote:

> On Wed, Jan 20, 2021 at 10:21:24PM +0000, David Howells wrote:
> > Note that this uses SEEK_HOLE/SEEK_DATA to locate the data available
> > to be read from the cache. Whilst this is an improvement from the
> > bmap interface, it still has a problem with regard to a modern
> > extent-based filesystem inserting or removing bridging blocks of
> > zeros.
>
> What are the consequences from the point of view of a user?

The cache can get both false-positive and false-negative results on
checks for the presence of data because an extent-based filesystem can,
at will, insert or remove blocks of contiguous zeros to make the extents
easier to encode (ie. bridge them or split them).

A false positive means that you get a block of zeros in the middle of
your file that very probably shouldn't be there (ie. file corruption); a
false negative means that we go and reload the missing chunk from the
server.

The problem exists in cachefiles whether we use bmap or
SEEK_HOLE/SEEK_DATA. The only way round it is to keep track of what data
is present independently of the backing filesystem's metadata.

That said, this patchset shouldn't (mis)behave differently from the code
already there - except that it handles better the case in which the
backing filesystem's blocksize != PAGE_SIZE (which may not be relevant on
an extent-based filesystem anyway, if it packs parts of different files
together in a single block), because the current implementation only
bmaps the first block in a page and doesn't probe for the rest.

Fixing this requires a much bigger overhaul of cachefiles than this
patchset performs.

Also, the patchset works towards getting rid of this use of bmap, but
that's not user-visible.

David
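As an aside, the "only bmaps the first block in a page" behaviour can be illustrated from userspace with FIBMAP, the ioctl behind bmap (it needs CAP_SYS_RAWIO, and FIGETBSZ supplies the block size). The sketch below probes every block backing a page, which is what the current code does not do; all names are hypothetical and this is not the cachefiles implementation.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/fs.h>		/* FIBMAP, FIGETBSZ */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Probe every backing-filesystem block of one page.  A first-block-only
 * check would stop after i == 0 and could call a partly-written page
 * "present", serving zeros for its unwritten tail blocks.
 */
static bool page_fully_mapped(int fd, long page_index)
{
	int blksize;

	if (ioctl(fd, FIGETBSZ, &blksize) == -1)
		return false;

	long blocks_per_page = sysconf(_SC_PAGESIZE) / blksize;
	if (blocks_per_page < 1)
		blocks_per_page = 1;	/* blocksize >= page size */

	for (long i = 0; i < blocks_per_page; i++) {
		/* FIBMAP takes the logical block in, physical block out. */
		int blk = page_index * blocks_per_page + i;
		if (ioctl(fd, FIBMAP, &blk) == -1)
			return false;
		if (blk == 0)		/* unmapped: a hole within the page */
			return false;
	}
	return true;
}

int main(int argc, char **argv)
{
	if (argc != 3) {
		fprintf(stderr, "usage: %s <file> <page-index>\n", argv[0]);
		return 1;
	}
	int fd = open(argv[1], O_RDONLY);	/* FIBMAP is root-only */
	if (fd == -1) {
		perror("open");
		return 1;
	}
	printf("%s\n", page_fully_mapped(fd, atol(argv[2])) ?
	       "fully mapped" : "partly or wholly unmapped");
	close(fd);
	return 0;
}
```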
On Thu, Jan 21, 2021 at 05:02:57PM +0000, David Howells wrote:
> The cache can get both false-positive and false-negative results on
> checks for the presence of data because an extent-based filesystem can,
> at will, insert or remove blocks of contiguous zeros to make the
> extents easier to encode (ie. bridge them or split them).
>
> A false positive means that you get a block of zeros in the middle of
> your file that very probably shouldn't be there (ie. file corruption);
> a false negative means that we go and reload the missing chunk from the
> server.
>
> [...]
>
> Fixing this requires a much bigger overhaul of cachefiles than this
> patchset performs.

That sounds like "sometimes you may get file corruption and there's
nothing you can do about it". But I know people actually use fscache, so
it must be reliable at least for some use cases.

Is it that those "bridging" blocks only show up in certain corner cases
that users can arrange to avoid? Or that it's OK as long as you use
certain specific file systems whose behavior goes beyond what's
technically required by the bmap or seek interfaces?

--b.
J. Bruce Fields <bfields@fieldses.org> wrote:

> > Fixing this requires a much bigger overhaul of cachefiles than this
> > patchset performs.
>
> That sounds like "sometimes you may get file corruption and there's
> nothing you can do about it". But I know people actually use fscache,
> so it must be reliable at least for some use cases.

Yes. That's true for the upstream code because that uses bmap. I'm
switching to use SEEK_HOLE/SEEK_DATA to get rid of the bmap usage, but it
doesn't change the issue.

> Is it that those "bridging" blocks only show up in certain corner cases
> that users can arrange to avoid? Or that it's OK as long as you use
> certain specific file systems whose behavior goes beyond what's
> technically required by the bmap or seek interfaces?

That's a question for the xfs, ext4 and btrfs maintainers, and the answer
may vary between kernel versions and fsck or filesystem packing utility
versions.

David
On Thu, Jan 21, 2021 at 06:55:13PM +0000, David Howells wrote:
> J. Bruce Fields <bfields@fieldses.org> wrote:
>
> > That sounds like "sometimes you may get file corruption and there's
> > nothing you can do about it". But I know people actually use fscache,
> > so it must be reliable at least for some use cases.
>
> Yes. That's true for the upstream code because that uses bmap.

Sorry, when you say "that's true", what part are you referring to?

> I'm switching to use SEEK_HOLE/SEEK_DATA to get rid of the bmap usage,
> but it doesn't change the issue.
>
> > Is it that those "bridging" blocks only show up in certain corner
> > cases that users can arrange to avoid? Or that it's OK as long as
> > you use certain specific file systems whose behavior goes beyond
> > what's technically required by the bmap or seek interfaces?
>
> That's a question for the xfs, ext4 and btrfs maintainers, and the
> answer may vary between kernel versions and fsck or filesystem packing
> utility versions.

So, I'm still confused: there must be some case where we know fscache
actually works reliably and doesn't corrupt your data, right?

--b.
J. Bruce Fields <bfields@fieldses.org> wrote:

> > Yes. That's true for the upstream code because that uses bmap.
>
> Sorry, when you say "that's true", what part are you referring to?

Sometimes, theoretically, you may get file corruption due to this.

> So, I'm still confused: there must be some case where we know fscache
> actually works reliably and doesn't corrupt your data, right?

Using ext2/3, for example. I don't know under what circumstances xfs,
ext4 and btrfs might insert/remove blocks of zeros, but I'm told it can
happen.

David
On Thu, Jan 21, 2021 at 06:55:13PM +0000, David Howells wrote:
> > Is it that those "bridging" blocks only show up in certain corner
> > cases that users can arrange to avoid? Or that it's OK as long as
> > you use certain specific file systems whose behavior goes beyond
> > what's technically required by the bmap or seek interfaces?
>
> That's a question for the xfs, ext4 and btrfs maintainers, and the
> answer may vary between kernel versions and fsck or filesystem packing
> utility versions.

For XFS, if you do not use reflinks, extent size hints or the RT
subvolume, there are no new allocations before i_size that will magically
show up. But relying on such undocumented assumptions is very dangerous.
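As an illustration only (nothing in the patchset does this), a cache setup tool wanting to lean on that XFS behaviour could at least reject backing files that carry an extent size hint or live on the realtime subvolume, both visible through the FS_IOC_FSGETXATTR ioctl. Reflink sharing is not visible in struct fsxattr and would need a FIEMAP scan for FIEMAP_EXTENT_SHARED, which this sketch omits; the helper name is hypothetical.

```c
#include <fcntl.h>
#include <linux/fs.h>	/* FS_IOC_FSGETXATTR, struct fsxattr, FS_XFLAG_* */
#include <stdbool.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Reject backing files with an extent size hint or on the RT subvolume. */
static bool backing_file_flags_ok(int fd)
{
	struct fsxattr fsx;

	if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) == -1)
		return false;			/* can't tell; assume unsafe */
	if (fsx.fsx_xflags & FS_XFLAG_REALTIME)
		return false;			/* realtime subvolume file */
	if ((fsx.fsx_xflags & FS_XFLAG_EXTSIZE) || fsx.fsx_extsize)
		return false;			/* extent size hint in force */
	return true;
}

int main(int argc, char **argv)
{
	if (argc != 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}
	int fd = open(argv[1], O_RDONLY);
	if (fd == -1) {
		perror("open");
		return 1;
	}
	printf("%s\n", backing_file_flags_ok(fd) ? "ok" : "unsuitable");
	close(fd);
	return 0;
}
```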
On Thu, Jan 21, 2021 at 08:08:24PM +0000, David Howells wrote:
> J. Bruce Fields <bfields@fieldses.org> wrote:
> > So, I'm still confused: there must be some case where we know fscache
> > actually works reliably and doesn't corrupt your data, right?
>
> Using ext2/3, for example. I don't know under what circumstances xfs,
> ext4 and btrfs might insert/remove blocks of zeros, but I'm told it can
> happen.

Do ext2/3 work well for fscache in other ways?

--b.
J. Bruce Fields <bfields@fieldses.org> wrote:

> > Using ext2/3, for example. I don't know under what circumstances
> > xfs, ext4 and btrfs might insert/remove blocks of zeros, but I'm told
> > it can happen.
>
> Do ext2/3 work well for fscache in other ways?

Ext3 shouldn't be a problem; that's what I used when developing it. I'm
not sure whether ext2 supports xattrs, though.

David
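For context on why xattrs matter here: cachefiles tags its cache files with extended attributes, so the backing filesystem must support them. A hedged userspace probe of that capability (the helper and attribute names are illustrative, not from cachefiles):

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/xattr.h>

/*
 * ENOTSUP from getxattr() means the filesystem has no xattr support;
 * ENODATA merely means this particular attribute is absent, which is
 * fine for our purposes.
 */
static bool fs_supports_xattrs(const char *path)
{
	return !(getxattr(path, "user.cache-probe", NULL, 0) == -1 &&
		 errno == ENOTSUP);
}

int main(int argc, char **argv)
{
	if (argc != 2) {
		fprintf(stderr, "usage: %s <path>\n", argv[0]);
		return 1;
	}
	printf("%s\n", fs_supports_xattrs(argv[1]) ? "xattrs ok" : "no xattrs");
	return 0;
}
```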