Message ID | 20201216233149.39025-2-vgoyal@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | vfs, overlayfs: Fix syncfs() to return error | expand |
[Christoph added to Cc...] On Wed, Dec 16, 2020 at 06:31:47PM -0500, Vivek Goyal wrote: > Current implementation of __sync_filesystem() ignores the return code > from ->sync_fs(). I am not sure why that's the case. There must have > been some historical reason for this. > > Ignoring ->sync_fs() return code is problematic for overlayfs where > it can return error if sync_filesystem() on upper super block failed. > That error will simply be lost and sycnfs(overlay_fd), will get > success (despite the fact it failed). > > If we modify existing implementation, there is a concern that it will > lead to user space visible behavior changes and break things. So > instead implement a new file_operations->syncfs() call which will > be called in syncfs() syscall path. Return code from this new > call will be captured. And all the writeback error detection > logic can go in there as well. Only filesystems which implement > this call get affected by this change. Others continue to fallback > to existing mechanism. That smells like a massive source of confusion down the road. I'd just looked through the existing instances; many always return 0, but quite a few sometimes try to return an error: fs/btrfs/super.c:2412: .sync_fs = btrfs_sync_fs, fs/exfat/super.c:204: .sync_fs = exfat_sync_fs, fs/ext4/super.c:1674: .sync_fs = ext4_sync_fs, fs/f2fs/super.c:2480: .sync_fs = f2fs_sync_fs, fs/gfs2/super.c:1600: .sync_fs = gfs2_sync_fs, fs/hfsplus/super.c:368: .sync_fs = hfsplus_sync_fs, fs/nilfs2/super.c:689: .sync_fs = nilfs_sync_fs, fs/ocfs2/super.c:139: .sync_fs = ocfs2_sync_fs, fs/overlayfs/super.c:399: .sync_fs = ovl_sync_fs, fs/ubifs/super.c:2052: .sync_fs = ubifs_sync_fs, is the list of such. There are 4 method callers: dquot_quota_sync(), dquot_disable(), __sync_filesystem() and sync_fs_one_sb(). For sync_fs_one_sb() we want to ignore the return value; for __sync_filesystem() we almost certainly do *not* - it ends with return __sync_blockdev(sb->s_bdev, wait), after all. The question for that one is whether we want __sync_blockdev() called even in case of ->sync_fs() reporting a failure, and I suspect that it's safer to call it anyway and return the first error value we'd got. No idea about quota situation.
On Thu 17-12-20 00:49:35, Al Viro wrote: > [Christoph added to Cc...] > On Wed, Dec 16, 2020 at 06:31:47PM -0500, Vivek Goyal wrote: > > Current implementation of __sync_filesystem() ignores the return code > > from ->sync_fs(). I am not sure why that's the case. There must have > > been some historical reason for this. > > > > Ignoring ->sync_fs() return code is problematic for overlayfs where > > it can return error if sync_filesystem() on upper super block failed. > > That error will simply be lost and sycnfs(overlay_fd), will get > > success (despite the fact it failed). > > > > If we modify existing implementation, there is a concern that it will > > lead to user space visible behavior changes and break things. So > > instead implement a new file_operations->syncfs() call which will > > be called in syncfs() syscall path. Return code from this new > > call will be captured. And all the writeback error detection > > logic can go in there as well. Only filesystems which implement > > this call get affected by this change. Others continue to fallback > > to existing mechanism. > > That smells like a massive source of confusion down the road. I'd just > looked through the existing instances; many always return 0, but quite > a few sometimes try to return an error: > fs/btrfs/super.c:2412: .sync_fs = btrfs_sync_fs, > fs/exfat/super.c:204: .sync_fs = exfat_sync_fs, > fs/ext4/super.c:1674: .sync_fs = ext4_sync_fs, > fs/f2fs/super.c:2480: .sync_fs = f2fs_sync_fs, > fs/gfs2/super.c:1600: .sync_fs = gfs2_sync_fs, > fs/hfsplus/super.c:368: .sync_fs = hfsplus_sync_fs, > fs/nilfs2/super.c:689: .sync_fs = nilfs_sync_fs, > fs/ocfs2/super.c:139: .sync_fs = ocfs2_sync_fs, > fs/overlayfs/super.c:399: .sync_fs = ovl_sync_fs, > fs/ubifs/super.c:2052: .sync_fs = ubifs_sync_fs, > is the list of such. There are 4 method callers: > dquot_quota_sync(), dquot_disable(), __sync_filesystem() and > sync_fs_one_sb(). For sync_fs_one_sb() we want to ignore the > return value; for __sync_filesystem() we almost certainly > do *not* - it ends with return __sync_blockdev(sb->s_bdev, wait), > after all. The question for that one is whether we want > __sync_blockdev() called even in case of ->sync_fs() reporting > a failure, and I suspect that it's safer to call it anyway and > return the first error value we'd got. No idea about quota > situation. WRT quota situation: All the ->sync_fs() calls there are due to cache coherency reasons (we need to get quota changes to disk, then prune quota files's page cache, and then userspace can read current quota structures from the disk). We don't want to fail dquot_disable() just because caches might be incoherent so ignoring ->sync_fs() return value there is fine. With dquot_quota_sync() it might make some sense to return the error - that's just a backend for Q_SYNC quotactl(2). OTOH I'm not sure anybody really cares - Q_SYNC is rarely used. Honza
On Thu, Dec 17, 2020 at 12:49:35AM +0000, Al Viro wrote: > [Christoph added to Cc...] > On Wed, Dec 16, 2020 at 06:31:47PM -0500, Vivek Goyal wrote: > > Current implementation of __sync_filesystem() ignores the return code > > from ->sync_fs(). I am not sure why that's the case. There must have > > been some historical reason for this. > > > > Ignoring ->sync_fs() return code is problematic for overlayfs where > > it can return error if sync_filesystem() on upper super block failed. > > That error will simply be lost and sycnfs(overlay_fd), will get > > success (despite the fact it failed). > > > > If we modify existing implementation, there is a concern that it will > > lead to user space visible behavior changes and break things. So > > instead implement a new file_operations->syncfs() call which will > > be called in syncfs() syscall path. Return code from this new > > call will be captured. And all the writeback error detection > > logic can go in there as well. Only filesystems which implement > > this call get affected by this change. Others continue to fallback > > to existing mechanism. > > That smells like a massive source of confusion down the road. I'd just > looked through the existing instances; many always return 0, but quite > a few sometimes try to return an error: > fs/btrfs/super.c:2412: .sync_fs = btrfs_sync_fs, > fs/exfat/super.c:204: .sync_fs = exfat_sync_fs, > fs/ext4/super.c:1674: .sync_fs = ext4_sync_fs, > fs/f2fs/super.c:2480: .sync_fs = f2fs_sync_fs, > fs/gfs2/super.c:1600: .sync_fs = gfs2_sync_fs, > fs/hfsplus/super.c:368: .sync_fs = hfsplus_sync_fs, > fs/nilfs2/super.c:689: .sync_fs = nilfs_sync_fs, > fs/ocfs2/super.c:139: .sync_fs = ocfs2_sync_fs, > fs/overlayfs/super.c:399: .sync_fs = ovl_sync_fs, > fs/ubifs/super.c:2052: .sync_fs = ubifs_sync_fs, > is the list of such. There are 4 method callers: > dquot_quota_sync(), dquot_disable(), __sync_filesystem() and > sync_fs_one_sb(). For sync_fs_one_sb() we want to ignore the > return value; for __sync_filesystem() we almost certainly > do *not* - it ends with return __sync_blockdev(sb->s_bdev, wait), > after all. The question for that one is whether we want > __sync_blockdev() called even in case of ->sync_fs() reporting > a failure, and I suspect that it's safer to call it anyway and > return the first error value we'd got. I posted V1 patch to do exactly above. In __sync_filesystem(), capture return code from ->sync_fs() but continue to call __sync_blockdev() and and return error code from ->sync_fs() if there is one otherwise return error code from __sync_blockdev(). https://lore.kernel.org/linux-fsdevel/20201216143802.GA10550@redhat.com/ Thanks Vivek > No idea about quota situation. >
On Thu, Dec 17, 2020 at 10:57:28AM +0100, Jan Kara wrote: > On Thu 17-12-20 00:49:35, Al Viro wrote: > > [Christoph added to Cc...] > > On Wed, Dec 16, 2020 at 06:31:47PM -0500, Vivek Goyal wrote: > > > Current implementation of __sync_filesystem() ignores the return code > > > from ->sync_fs(). I am not sure why that's the case. There must have > > > been some historical reason for this. > > > > > > Ignoring ->sync_fs() return code is problematic for overlayfs where > > > it can return error if sync_filesystem() on upper super block failed. > > > That error will simply be lost and sycnfs(overlay_fd), will get > > > success (despite the fact it failed). > > > > > > If we modify existing implementation, there is a concern that it will > > > lead to user space visible behavior changes and break things. So > > > instead implement a new file_operations->syncfs() call which will > > > be called in syncfs() syscall path. Return code from this new > > > call will be captured. And all the writeback error detection > > > logic can go in there as well. Only filesystems which implement > > > this call get affected by this change. Others continue to fallback > > > to existing mechanism. > > > > That smells like a massive source of confusion down the road. I'd just > > looked through the existing instances; many always return 0, but quite > > a few sometimes try to return an error: > > fs/btrfs/super.c:2412: .sync_fs = btrfs_sync_fs, > > fs/exfat/super.c:204: .sync_fs = exfat_sync_fs, > > fs/ext4/super.c:1674: .sync_fs = ext4_sync_fs, > > fs/f2fs/super.c:2480: .sync_fs = f2fs_sync_fs, > > fs/gfs2/super.c:1600: .sync_fs = gfs2_sync_fs, > > fs/hfsplus/super.c:368: .sync_fs = hfsplus_sync_fs, > > fs/nilfs2/super.c:689: .sync_fs = nilfs_sync_fs, > > fs/ocfs2/super.c:139: .sync_fs = ocfs2_sync_fs, > > fs/overlayfs/super.c:399: .sync_fs = ovl_sync_fs, > > fs/ubifs/super.c:2052: .sync_fs = ubifs_sync_fs, > > is the list of such. There are 4 method callers: > > dquot_quota_sync(), dquot_disable(), __sync_filesystem() and > > sync_fs_one_sb(). For sync_fs_one_sb() we want to ignore the > > return value; for __sync_filesystem() we almost certainly > > do *not* - it ends with return __sync_blockdev(sb->s_bdev, wait), > > after all. The question for that one is whether we want > > __sync_blockdev() called even in case of ->sync_fs() reporting > > a failure, and I suspect that it's safer to call it anyway and > > return the first error value we'd got. No idea about quota > > situation. > > WRT quota situation: All the ->sync_fs() calls there are due to cache > coherency reasons (we need to get quota changes to disk, then prune quota > files's page cache, and then userspace can read current quota structures > from the disk). We don't want to fail dquot_disable() just because caches > might be incoherent so ignoring ->sync_fs() return value there is fine. > With dquot_quota_sync() it might make some sense to return the error - > that's just a backend for Q_SYNC quotactl(2). OTOH I'm not sure anybody > really cares - Q_SYNC is rarely used. Thanks Jan. May be I will leave dquot_quota_sync() untouched for now. When somebody has a need to capture return code from ->sync_fs() there, it can be easily added. Vivek
On Thu, 2020-12-17 at 00:49 +0000, Al Viro wrote: > [Christoph added to Cc...] > On Wed, Dec 16, 2020 at 06:31:47PM -0500, Vivek Goyal wrote: > > Current implementation of __sync_filesystem() ignores the return code > > from ->sync_fs(). I am not sure why that's the case. There must have > > been some historical reason for this. > > > > Ignoring ->sync_fs() return code is problematic for overlayfs where > > it can return error if sync_filesystem() on upper super block failed. > > That error will simply be lost and sycnfs(overlay_fd), will get > > success (despite the fact it failed). > > > > If we modify existing implementation, there is a concern that it will > > lead to user space visible behavior changes and break things. So > > instead implement a new file_operations->syncfs() call which will > > be called in syncfs() syscall path. Return code from this new > > call will be captured. And all the writeback error detection > > logic can go in there as well. Only filesystems which implement > > this call get affected by this change. Others continue to fallback > > to existing mechanism. > > That smells like a massive source of confusion down the road. I'd just > looked through the existing instances; many always return 0, but quite > a few sometimes try to return an error: > fs/btrfs/super.c:2412: .sync_fs = btrfs_sync_fs, > fs/exfat/super.c:204: .sync_fs = exfat_sync_fs, > fs/ext4/super.c:1674: .sync_fs = ext4_sync_fs, > fs/f2fs/super.c:2480: .sync_fs = f2fs_sync_fs, > fs/gfs2/super.c:1600: .sync_fs = gfs2_sync_fs, > fs/hfsplus/super.c:368: .sync_fs = hfsplus_sync_fs, > fs/nilfs2/super.c:689: .sync_fs = nilfs_sync_fs, > fs/ocfs2/super.c:139: .sync_fs = ocfs2_sync_fs, > fs/overlayfs/super.c:399: .sync_fs = ovl_sync_fs, > fs/ubifs/super.c:2052: .sync_fs = ubifs_sync_fs, > is the list of such. There are 4 method callers: > dquot_quota_sync(), dquot_disable(), __sync_filesystem() and > sync_fs_one_sb(). For sync_fs_one_sb() we want to ignore the > return value; for __sync_filesystem() we almost certainly > do *not* - it ends with return __sync_blockdev(sb->s_bdev, wait), > after all. The question for that one is whether we want > __sync_blockdev() called even in case of ->sync_fs() reporting > a failure, and I suspect that it's safer to call it anyway and > return the first error value we'd got. No idea about quota > situation. > The only problem there is that makes it a bit difficult to override the error return to syncfs, which is really what overlayfs wants to be able to do. Their syncfs syncs out the upper layer, so it makes sense to just have their file->f_sb_err track the upper layer's sb->s_wb_err. You can plumb the errors from sync_fs all the way through to the syncfs syscall, but we can't currently tell whether we're doing the sync_fs op on behalf of sync(), syncfs() or something else entirely. We need to ensure that if it does return an error, that it doesn't get dropped on the floor. I think it'd be simpler to just add f_op->syncfs and change s_op->sync_fs to a different name, to lessen the confusion. s_op->sync_fs sort of makes it look like you're implementing syncfs(2), but there's a bit more to it than that. Maybe s_op->sync_filesystem? There are only about 113 instances "sync_fs" in the tree. Changing the name might also help highlight the fact that the return code won't be ignored like it used to be.
diff --git a/fs/sync.c b/fs/sync.c index 1373a610dc78..06caa9758d93 100644 --- a/fs/sync.c +++ b/fs/sync.c @@ -155,27 +155,38 @@ void emergency_sync(void) } } +static int generic_syncfs(struct file *file) +{ + int ret, ret2; + struct super_block *sb = file->f_path.dentry->d_sb; + + down_read(&sb->s_umount); + ret = sync_filesystem(sb); + up_read(&sb->s_umount); + + ret2 = errseq_check_and_advance(&sb->s_wb_err, &file->f_sb_err); + + return ret ? ret : ret2; +} + /* * sync a single super */ SYSCALL_DEFINE1(syncfs, int, fd) { struct fd f = fdget(fd); - struct super_block *sb; - int ret, ret2; + int ret; if (!f.file) return -EBADF; - sb = f.file->f_path.dentry->d_sb; - - down_read(&sb->s_umount); - ret = sync_filesystem(sb); - up_read(&sb->s_umount); - ret2 = errseq_check_and_advance(&sb->s_wb_err, &f.file->f_sb_err); + if (f.file->f_op->syncfs) + ret = f.file->f_op->syncfs(f.file); + else + ret = generic_syncfs(f.file); fdput(f); - return ret ? ret : ret2; + return ret; } /** diff --git a/include/linux/fs.h b/include/linux/fs.h index 8667d0cdc71e..6710469b7e33 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1859,6 +1859,7 @@ struct file_operations { struct file *file_out, loff_t pos_out, loff_t len, unsigned int remap_flags); int (*fadvise)(struct file *, loff_t, loff_t, int); + int (*syncfs)(struct file *); } __randomize_layout; struct inode_operations {
Current implementation of __sync_filesystem() ignores the return code from ->sync_fs(). I am not sure why that's the case. There must have been some historical reason for this. Ignoring ->sync_fs() return code is problematic for overlayfs where it can return error if sync_filesystem() on upper super block failed. That error will simply be lost and sycnfs(overlay_fd), will get success (despite the fact it failed). If we modify existing implementation, there is a concern that it will lead to user space visible behavior changes and break things. So instead implement a new file_operations->syncfs() call which will be called in syncfs() syscall path. Return code from this new call will be captured. And all the writeback error detection logic can go in there as well. Only filesystems which implement this call get affected by this change. Others continue to fallback to existing mechanism. To be clear, I mean something like this (draft, untested) patch. You'd also need to add a new ->syncfs op for overlayfs, and that could just do a check_and_advance against the upper layer sb's errseq_t after calling sync_filesystem. Vivek, fixed couple of minor compile errors in original patch. Signed-off-by: Jeff Layton <jlayton@kernel.org> --- fs/sync.c | 29 ++++++++++++++++++++--------- include/linux/fs.h | 1 + 2 files changed, 21 insertions(+), 9 deletions(-)