Message ID | 1440522583-32120-1-git-send-email-jbacik@fb.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, Aug 25, 2015 at 01:09:43PM -0400, Josef Bacik wrote: > If we have an fsync at the same time in two seperate subvolumes we could end up > with the tree log pointing at invalid blocks. We need to notice if our writeout Mind to share more details of the problem? Thanks, -liubo > failed in anyway, if it did then we need to do a full transaction commit and > return an error on the subvolume that gave us the io error. Thanks, > > Signed-off-by: Josef Bacik <jbacik@fb.com> > --- > fs/btrfs/file.c | 4 ++++ > fs/btrfs/tree-log.c | 4 ++++ > 2 files changed, 8 insertions(+) > > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c > index b823fac..c8f49f5 100644 > --- a/fs/btrfs/file.c > +++ b/fs/btrfs/file.c > @@ -2054,6 +2054,10 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync) > if (!ret) { > ret = btrfs_end_transaction(trans, root); > goto out; > + } else if (ctx.io_err) { > + btrfs_end_transaction(trans, root); > + ret = ctx.io_err; > + goto out; > } > } > if (!full_sync) { > diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c > index 1bbaace..b4f15f5 100644 > --- a/fs/btrfs/tree-log.c > +++ b/fs/btrfs/tree-log.c > @@ -2857,6 +2857,10 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans, > ret = btrfs_wait_marked_extents(log, &log->dirty_log_pages, > mark); > btrfs_wait_logged_extents(trans, log, log_transid); > + if (ret) { > + btrfs_set_log_full_commit(root->fs_info, trans); > + ctx->io_err = ret; > + } > wait_log_commit(log_root_tree, > root_log_ctx.log_transid); > mutex_unlock(&log_root_tree->log_mutex); > -- > 2.1.0 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 08/25/2015 10:06 PM, Liu Bo wrote: > On Tue, Aug 25, 2015 at 01:09:43PM -0400, Josef Bacik wrote: >> If we have an fsync at the same time in two seperate subvolumes we could end up >> with the tree log pointing at invalid blocks. We need to notice if our writeout > > Mind to share more details of the problem? > It's the problem Filipe was trying to solve. Say we fsync() on two different subvols, one of them will race in and be the one who commits the log root tree. So process A waits on process B to add its log to the log root tree and write out its log. If we get an IO error while writing out the log the log root tree will be pointing to invalid crap, and we also won't return an error back to userspace. We need to notice if there was an error, turn on the transaction commit stuff since we've already updated the the log root tree with our subvol log so we don't get garbage on the disk, and we need to return an error to process B. Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Aug 25, 2015 at 01:09:43PM -0400, Josef Bacik wrote: > If we have an fsync at the same time in two seperate subvolumes we could end up > with the tree log pointing at invalid blocks. We need to notice if our writeout > failed in anyway, if it did then we need to do a full transaction commit and > return an error on the subvolume that gave us the io error. Thanks, Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Thanks, -liubo > > Signed-off-by: Josef Bacik <jbacik@fb.com> > --- > fs/btrfs/file.c | 4 ++++ > fs/btrfs/tree-log.c | 4 ++++ > 2 files changed, 8 insertions(+) > > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c > index b823fac..c8f49f5 100644 > --- a/fs/btrfs/file.c > +++ b/fs/btrfs/file.c > @@ -2054,6 +2054,10 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync) > if (!ret) { > ret = btrfs_end_transaction(trans, root); > goto out; > + } else if (ctx.io_err) { > + btrfs_end_transaction(trans, root); > + ret = ctx.io_err; > + goto out; > } > } > if (!full_sync) { > diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c > index 1bbaace..b4f15f5 100644 > --- a/fs/btrfs/tree-log.c > +++ b/fs/btrfs/tree-log.c > @@ -2857,6 +2857,10 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans, > ret = btrfs_wait_marked_extents(log, &log->dirty_log_pages, > mark); > btrfs_wait_logged_extents(trans, log, log_transid); > + if (ret) { > + btrfs_set_log_full_commit(root->fs_info, trans); > + ctx->io_err = ret; > + } > wait_log_commit(log_root_tree, > root_log_ctx.log_transid); > mutex_unlock(&log_root_tree->log_mutex); > -- > 2.1.0 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Aug 26, 2015 at 8:06 PM, Josef Bacik <jbacik@fb.com> wrote: > On 08/25/2015 10:06 PM, Liu Bo wrote: >> >> On Tue, Aug 25, 2015 at 01:09:43PM -0400, Josef Bacik wrote: >>> >>> If we have an fsync at the same time in two seperate subvolumes we could >>> end up >>> with the tree log pointing at invalid blocks. We need to notice if our >>> writeout >> >> >> Mind to share more details of the problem? >> > > It's the problem Filipe was trying to solve. Say we fsync() on two > different subvols, one of them will race in and be the one who commits the > log root tree. So process A waits on process B to add its log to the log > root tree and write out its log. If we get an IO error while writing out > the log the log root tree will be pointing to invalid crap, and we also > won't return an error back to userspace. We need to notice if there was an > error, turn on the transaction commit stuff since we've already updated the > the log root tree with our subvol log so we don't get garbage on the disk, > and we need to return an error to process B. Thanks, Josef, So the problem was that without forcing A to trigger a transaction commit if B gets an error when writing one or more log tree nodes/leafs, A could write a superblock pointing to a log root tree for which not all nodes/leafs were persisted. Then B would fall back to a transaction commit and everything would be ok - i.e. we would only have a "small" time window where a superblock points to an invalid log root tree. So the fix here shouldn't be only to force A do a transaction commit (call btrfs_set_log_full_commit())? Why do we need to make B return an error to userspace and not fallback to a transaction commit too (as it was before this change)? After all this is the kind of failure for which we can fallback to a transaction commit without losing any inode metadata (links, owner, group, xattrs, etc) nor file data. The change looks good, just puzzled why we need to return the error to userspace. thanks > > Josef > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
On 08/28/2015 09:23 AM, Filipe David Manana wrote: > On Wed, Aug 26, 2015 at 8:06 PM, Josef Bacik <jbacik@fb.com> wrote: >> On 08/25/2015 10:06 PM, Liu Bo wrote: >>> >>> On Tue, Aug 25, 2015 at 01:09:43PM -0400, Josef Bacik wrote: >>>> >>>> If we have an fsync at the same time in two seperate subvolumes we could >>>> end up >>>> with the tree log pointing at invalid blocks. We need to notice if our >>>> writeout >>> >>> >>> Mind to share more details of the problem? >>> >> >> It's the problem Filipe was trying to solve. Say we fsync() on two >> different subvols, one of them will race in and be the one who commits the >> log root tree. So process A waits on process B to add its log to the log >> root tree and write out its log. If we get an IO error while writing out >> the log the log root tree will be pointing to invalid crap, and we also >> won't return an error back to userspace. We need to notice if there was an >> error, turn on the transaction commit stuff since we've already updated the >> the log root tree with our subvol log so we don't get garbage on the disk, >> and we need to return an error to process B. Thanks, > > Josef, > > So the problem was that without forcing A to trigger a transaction > commit if B gets an error when writing one or more log tree > nodes/leafs, A could write a superblock pointing to a log root tree > for which not all nodes/leafs were persisted. Then B would fall back > to a transaction commit and everything would be ok - i.e. we would > only have a "small" time window where a superblock points to an > invalid log root tree. > > So the fix here shouldn't be only to force A do a transaction commit > (call btrfs_set_log_full_commit())? Why do we need to make B return an > error to userspace and not fallback to a transaction commit too (as it > was before this change)? After all this is the kind of failure for > which we can fallback to a transaction commit without losing any inode > metadata (links, owner, group, xattrs, etc) nor file data. > > The change looks good, just puzzled why we need to return the error to > userspace. > Ah well there's a reason in upcoming patches! Basically I'm moving the wait on ordered extents back into the sync log stuff so I can have my go fast stripes back, and when I do that we'll want to return an error since it could be from wait_ordered_extents. But you are right, if we only error out writing the metadata we can just commit the transaction and not return an error. I can change this bit to not return an error if we want, or wait for my other patches which will fix it up. Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index b823fac..c8f49f5 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -2054,6 +2054,10 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync) if (!ret) { ret = btrfs_end_transaction(trans, root); goto out; + } else if (ctx.io_err) { + btrfs_end_transaction(trans, root); + ret = ctx.io_err; + goto out; } } if (!full_sync) { diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 1bbaace..b4f15f5 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -2857,6 +2857,10 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans, ret = btrfs_wait_marked_extents(log, &log->dirty_log_pages, mark); btrfs_wait_logged_extents(trans, log, log_transid); + if (ret) { + btrfs_set_log_full_commit(root->fs_info, trans); + ctx->io_err = ret; + } wait_log_commit(log_root_tree, root_log_ctx.log_transid); mutex_unlock(&log_root_tree->log_mutex);
If we have an fsync at the same time in two seperate subvolumes we could end up with the tree log pointing at invalid blocks. We need to notice if our writeout failed in anyway, if it did then we need to do a full transaction commit and return an error on the subvolume that gave us the io error. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> --- fs/btrfs/file.c | 4 ++++ fs/btrfs/tree-log.c | 4 ++++ 2 files changed, 8 insertions(+)