Message ID | 20170616193427.13955-17-jlayton@redhat.com (mailing list archive) |
---|---|
State | New, archived |
>  	error = filemap_write_and_wait_range(filp->f_mapping, start, end);
>  	if (error)
> -		return error;
> +		goto out;
>
>  	/*
>  	 * There is no need to serialise calls to blkdev_issue_flush with
> @@ -640,6 +640,10 @@ int blkdev_fsync(struct file *filp, loff_t start, loff_t end, int datasync)
>  	if (error == -EOPNOTSUPP)
>  		error = 0;
>
> +out:
> +	wberr = filemap_report_wb_err(filp);
> +	if (!error)
> +		error = wberr;

Just curious: what's the reason filemap_write_and_wait_range couldn't
query for the error using filemap_report_wb_err internally?
On Tue, 2017-06-20 at 05:35 -0700, Christoph Hellwig wrote:
> >  	error = filemap_write_and_wait_range(filp->f_mapping, start, end);
> >  	if (error)
> > -		return error;
> > +		goto out;
> >
> >  	/*
> >  	 * There is no need to serialise calls to blkdev_issue_flush with
> > @@ -640,6 +640,10 @@ int blkdev_fsync(struct file *filp, loff_t start, loff_t end, int datasync)
> >  	if (error == -EOPNOTSUPP)
> >  		error = 0;
> >
> > +out:
> > +	wberr = filemap_report_wb_err(filp);
> > +	if (!error)
> > +		error = wberr;
>
> Just curious: what's the reason filemap_write_and_wait_range couldn't
> query for the error using filemap_report_wb_err internally?

In order to query for errors with errseq_t, you need a previously-sampled
point from which to check. When you call filemap_write_and_wait_range,
though, you don't have a struct file and so no previously-sampled value.
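To make the "previously-sampled point" idea concrete, here is a minimal userspace sketch in C. It is not the kernel's errseq_t code: the real type is more involved (it is updated atomically and tracks whether an error has been seen), and every `toy_*` name below is hypothetical, invented only for this illustration. The point it tries to show is that a check is always relative to a per-opener cursor, which is why a helper that only receives the mapping has nothing to compare against.

```c
#include <assert.h>
#include <errno.h>

/* Simplified stand-in for an error sequence: an error code plus a
 * change counter. Assumption: no atomics, no "seen" flag. */
struct toy_errseq {
	unsigned int seq;	/* bumped each time an error is recorded */
	int err;		/* most recent negative errno, or 0 */
};

/* Writeback side: record a new error in the shared stream. */
static void toy_errseq_set(struct toy_errseq *e, int err)
{
	e->err = err;
	e->seq++;
}

/* Open side: remember where the stream stood when the file was opened. */
static unsigned int toy_errseq_sample(const struct toy_errseq *e)
{
	return e->seq;
}

/* fsync side: report an error recorded after *since, then advance the
 * cursor so the same error is not reported to this opener again. */
static int toy_errseq_check_and_advance(const struct toy_errseq *e,
					unsigned int *since)
{
	if (*since == e->seq)
		return 0;
	*since = e->seq;
	return e->err;
}
```

In this model, two openers that sampled before a failure each see the error exactly once, because each one carries its own cursor. Without a cursor (that is, without something playing the role of struct file) there is no way to tell "new" errors from ones already reported.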
On Tue, Jun 20, 2017 at 01:44:44PM -0400, Jeff Layton wrote:
> In order to query for errors with errseq_t, you need a previously-
> sampled point from which to check. When you call
> filemap_write_and_wait_range though you don't have a struct file and so
> no previously-sampled value.

So can we simply introduce variants of them that take a struct file?
That would be:

 a) less churn
 b) less code
 c) less chance to get data integrity wrong
On Sat, 2017-06-24 at 04:59 -0700, Christoph Hellwig wrote:
> On Tue, Jun 20, 2017 at 01:44:44PM -0400, Jeff Layton wrote:
> > In order to query for errors with errseq_t, you need a previously-
> > sampled point from which to check. When you call
> > filemap_write_and_wait_range though you don't have a struct file and so
> > no previously-sampled value.
>
> So can we simply introduce variants of them that take a struct file?
> That would be:
>
> a) less churn
> b) less code
> c) less chance to get data integrity wrong

Yeah, I had that thought after I sent the reply to you earlier.

The main reason I didn't do that before was that I had myself convinced
that we needed to do the check_and_advance as late as possible in the
fsync process, after the metadata had been written.

Now that I think about it more, I think you're probably correct. As long
as we do the check and advance at some point after doing the
write_and_wait, we're fine here and shouldn't violate exactly once
semantics on the fsync return.
On Sat, 2017-06-24 at 09:16 -0400, Jeff Layton wrote:
> On Sat, 2017-06-24 at 04:59 -0700, Christoph Hellwig wrote:
> > On Tue, Jun 20, 2017 at 01:44:44PM -0400, Jeff Layton wrote:
> > > In order to query for errors with errseq_t, you need a previously-
> > > sampled point from which to check. When you call
> > > filemap_write_and_wait_range though you don't have a struct file and so
> > > no previously-sampled value.
> >
> > So can we simply introduce variants of them that take a struct file?
> > That would be:
> >
> > a) less churn
> > b) less code
> > c) less chance to get data integrity wrong
>
> Yeah, I had that thought after I sent the reply to you earlier.
>
> The main reason I didn't do that before was that I had myself convinced
> that we needed to do the check_and_advance as late as possible in the
> fsync process, after the metadata had been written.
>
> Now that I think about it more, I think you're probably correct. As long
> as we do the check and advance at some point after doing the
> write_and_wait, we're fine here and shouldn't violate exactly once
> semantics on the fsync return.

So I have a file_write_and_wait_range now that should DTRT for this
patch. The bigger question is -- what about more complex filesystems like
ext4? There are a couple of cases where we can return -EIO or -EROFS on
fsync before filemap_write_and_wait_range is ever called. Like this one
for instance:

	if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
		return -EIO;

...and the EXT4_MF_FS_ABORTED case.

Are those conditions ever recoverable, such that a later fsync could
succeed? IOW, could I do a remount or something such that the existing
fds are left open and become usable again?

If so, then we really ought to advance the errseq_t in the file when we
catch those cases as well. If we have to do that, then it probably makes
sense to leave the ext4 patch as-is.
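The shape of a file-taking helper, as discussed above, can be sketched in a userspace model. This is not the file_write_and_wait_range that the message refers to (its code is not shown in this thread); the `toy_*` types and functions are hypothetical, and the `pending_err` field is an assumption used to simulate a writeback failure. The only property the sketch tries to capture is the ordering: flush first, then check and advance the per-file cursor.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for an address_space with an error stream. */
struct toy_mapping {
	unsigned int wb_seq;	/* bumped when writeback records an error */
	int wb_err;		/* last recorded negative errno, or 0 */
	int pending_err;	/* assumption: error the next flush will hit */
};

/* Hypothetical stand-in for struct file: carries the sampled cursor. */
struct toy_file {
	struct toy_mapping *mapping;
	unsigned int f_wb_seq;
};

/* Simulated write-and-wait on the mapping: a failure is recorded in
 * the mapping's error stream rather than returned to the caller. */
static void toy_write_and_wait(struct toy_mapping *m)
{
	if (m->pending_err) {
		m->wb_err = m->pending_err;
		m->wb_seq++;
		m->pending_err = 0;
	}
}

/* File-taking variant: flush, then check-and-advance, so that errors
 * raised by this very flush are visible to the caller. */
static int toy_file_write_and_wait(struct toy_file *f)
{
	toy_write_and_wait(f->mapping);

	if (f->f_wb_seq == f->mapping->wb_seq)
		return 0;
	f->f_wb_seq = f->mapping->wb_seq;
	return f->mapping->wb_err;
}
```

If the check ran before the flush instead, an error raised by that flush would only surface on the next fsync, which is the ordering concern raised earlier in the thread.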
On Mon, Jun 26, 2017 at 10:34:18AM -0400, Jeff Layton wrote:
> The bigger question is -- what about more complex filesystems like
> ext4? There are a couple of cases where we can return -EIO or -EROFS on
> fsync before filemap_write_and_wait_range is ever called. Like this one
> for instance:
>
> 	if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
> 		return -EIO;
>
> ...and the EXT4_MF_FS_ABORTED case.
>
> Are those conditions ever recoverable, such that a later fsync could
> succeed? IOW, could I do a remount or something such that the existing
> fds are left open and become usable again?

This looks copied from the xfs forced shutdown code, and in that case
it's final and permanent - you'll need an unmount to clear it.

> If so, then we really ought to advance the errseq_t in the file when we
> catch those cases as well. If we have to do that, then it probably makes
> sense to leave the ext4 patch as-is.

I think it can switch to the new file helper.
diff --git a/fs/block_dev.c b/fs/block_dev.c
index dc839f8f0ba5..9e8e13b097ef 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -625,11 +625,11 @@ int blkdev_fsync(struct file *filp, loff_t start, loff_t end, int datasync)
 {
 	struct inode *bd_inode = bdev_file_inode(filp);
 	struct block_device *bdev = I_BDEV(bd_inode);
-	int error;
+	int error, wberr;

 	error = filemap_write_and_wait_range(filp->f_mapping, start, end);
 	if (error)
-		return error;
+		goto out;

 	/*
 	 * There is no need to serialise calls to blkdev_issue_flush with
@@ -640,6 +640,10 @@ int blkdev_fsync(struct file *filp, loff_t start, loff_t end, int datasync)
 	if (error == -EOPNOTSUPP)
 		error = 0;

+out:
+	wberr = filemap_report_wb_err(filp);
+	if (!error)
+		error = wberr;
 	return error;
 }
 EXPORT_SYMBOL(blkdev_fsync);
This is a very minimal conversion to errseq_t based error tracking
for raw block device access. The only real change that is strictly
required is that we must unconditionally call filemap_report_wb_err in
blkdev_fsync. That ensures that the file's errseq_t is always advanced
to the latest value in the mapping.

Note that there are internal callers that call sync_blockdev and the
like that are not affected by this. For now, they'll continue to use
the AS_EIO/AS_ENOSPC flags for error reporting as they always have.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/block_dev.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)
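The control flow the patch introduces (always run the report step, but let an earlier error win) can be modelled in userspace as follows. This is a sketch under stated assumptions, not kernel code: `toy_mapping`, `toy_file`, `toy_report_wb_err` and `toy_fsync` are hypothetical names, and the `write_err` parameter stands in for whatever the write-and-wait and flush steps returned.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the mapping's error stream. */
struct toy_mapping {
	unsigned int wb_seq;	/* bumped when an error is recorded */
	int wb_err;		/* last recorded negative errno, or 0 */
};

/* Hypothetical stand-in for struct file with its sampled cursor. */
struct toy_file {
	struct toy_mapping *mapping;
	unsigned int f_wb_seq;
};

/* Model of the report step: return an unseen error and advance. */
static int toy_report_wb_err(struct toy_file *f)
{
	if (f->f_wb_seq == f->mapping->wb_seq)
		return 0;
	f->f_wb_seq = f->mapping->wb_seq;
	return f->mapping->wb_err;
}

/* Model of the patched fsync path. write_err is an assumption that
 * represents the result of the earlier write/flush steps. */
static int toy_fsync(struct toy_file *f, int write_err)
{
	int error = write_err, wberr;

	/* Mirrors the "out:" label: the report step runs even on failure,
	 * so the file's cursor always catches up with the mapping. */
	wberr = toy_report_wb_err(f);
	if (!error)
		error = wberr;
	return error;
}
```

In this model, a call that already failed with some other error still advances the cursor, so a later call on the same file does not report the stale writeback error a second time, while a different opener of the same mapping still sees it once.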