Message ID | 20240111073655.2095423-1-hch@lst.de (mailing list archive)
---|---
State | New, archived
Series | bcachefs: fix incorrect usage of REQ_OP_FLUSH
On Thu, Jan 11, 2024 at 08:36:55AM +0100, Christoph Hellwig wrote:
> REQ_OP_FLUSH is only for internal use in the blk-mq and request based
> drivers. File systems and other block layer consumers must use
> REQ_OP_WRITE | REQ_PREFLUSH as documented in
> Documentation/block/writeback_cache_control.rst.
>
> While REQ_OP_FLUSH appears to work for blk-mq drivers it does not
> get the proper flush state machine handling, and completely fails
> for any bio based drivers, including all the stacking drivers. The
> block layer will also get a check in 6.8 to reject this use case
> entirely.
>
> [Note: completely untested, but as this never got fixed since the
> original bug report in November:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=218184
>
> and the discussion in December:
>
> https://lore.kernel.org/all/20231221053016.72cqcfg46vxwohcj@moria.home.lan/T/
>
> this seems to be the best way to force it]
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Hey Christoph, thanks - applied.
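For reference, the documented pattern from writeback_cache_control.rst amounts to an empty write bio with REQ_PREFLUSH set. A minimal sketch follows, assuming the modern bio_alloc() signature; the helper name is illustrative, and in-kernel callers can simply use blkdev_issue_flush(), which wraps the same idea:

```c
#include <linux/bio.h>
#include <linux/blkdev.h>

/*
 * Minimal sketch: flush a device's volatile write cache the documented
 * way, with an empty REQ_OP_WRITE | REQ_PREFLUSH bio.  The function
 * name is hypothetical; blkdev_issue_flush() provides this in-tree.
 */
static int example_issue_flush(struct block_device *bdev)
{
	struct bio *bio;
	int ret;

	bio = bio_alloc(bdev, 0, REQ_OP_WRITE | REQ_PREFLUSH, GFP_KERNEL);
	ret = submit_bio_wait(bio);	/* returns once the flush completes */
	bio_put(bio);
	return ret;
}
```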
On Thu, Jan 11, 2024 at 08:36:55AM +0100, Christoph Hellwig wrote:
> REQ_OP_FLUSH is only for internal use in the blk-mq and request based
> drivers. File systems and other block layer consumers must use
> REQ_OP_WRITE | REQ_PREFLUSH as documented in
> Documentation/block/writeback_cache_control.rst.
>
> While REQ_OP_FLUSH appears to work for blk-mq drivers it does not
> get the proper flush state machine handling, and completely fails
> for any bio based drivers, including all the stacking drivers. The
> block layer will also get a check in 6.8 to reject this use case
> entirely.
>
> [Note: completely untested, but as this never got fixed since the
> original bug report in November:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=218184
>
> and the discussion in December:
>
> https://lore.kernel.org/all/20231221053016.72cqcfg46vxwohcj@moria.home.lan/T/
>
> this seems to be the best way to force it]
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

I've got a new bug report with this patch:
https://www.reddit.com/r/bcachefs/comments/19a2u3c/error_writing_journal_entry/

We seem to be getting -EOPNOTSUPP back from REQ_OP_FLUSH?
On Fri, Jan 19, 2024 at 04:32:44PM -0500, Kent Overstreet wrote:
>
> I've got a new bug report with this patch:
> https://www.reddit.com/r/bcachefs/comments/19a2u3c/error_writing_journal_entry/
>
> We seem to be getting -EOPNOTSUPP back from REQ_OP_FLUSH?

With the patch you are replying to you will not get -EOPNOTSUPP for
REQ_OP_FLUSH, because bcachefs won't send it, as it's supposed to.

Without this patch, as in current mainline, you will get -EOPNOTSUPP
because bcachefs is sending REQ_OP_FLUSH and the block layer now finally
checks for that, to catch bugs like the one fixed with this patch.
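(As a rough sketch of the kind of submit-time check being described, not the exact mainline code, rejecting a stray REQ_OP_FLUSH bio looks something like this; the helper name is illustrative:)

```c
#include <linux/bio.h>

/*
 * Rough sketch of the kind of submit-time sanity check described above
 * (not the exact mainline code).  REQ_OP_FLUSH requests are only
 * synthesized internally by the blk-mq flush state machine, so a bio
 * arriving with that opcode is a consumer bug; failing it with
 * BLK_STS_NOTSUPP is what callers observe as -EOPNOTSUPP.
 */
static bool example_submit_checks(struct bio *bio)
{
	if (WARN_ON_ONCE(bio_op(bio) == REQ_OP_FLUSH)) {
		bio->bi_status = BLK_STS_NOTSUPP;
		bio_endio(bio);
		return false;	/* caller must not submit this bio */
	}
	return true;
}
```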
On Mon, Jan 22, 2024 at 07:30:07AM +0100, Christoph Hellwig wrote:
> On Fri, Jan 19, 2024 at 04:32:44PM -0500, Kent Overstreet wrote:
> >
> > I've got a new bug report with this patch:
> > https://www.reddit.com/r/bcachefs/comments/19a2u3c/error_writing_journal_entry/
> >
> > We seem to be getting -EOPNOTSUPP back from REQ_OP_FLUSH?
>
> With the patch you are replying to you will not get -EOPNOTSUPP for
> REQ_OP_FLUSH, because bcachefs won't send it, as it's supposed to.
>
> Without this patch, as in current mainline, you will get -EOPNOTSUPP
> because bcachefs is sending REQ_OP_FLUSH and the block layer now finally
> checks for that, to catch bugs like the one fixed with this patch.

Then why did the user report -EOPNOTSUPP with the patch, which went away
when reverted?
On Mon, Jan 22, 2024 at 01:37:45AM -0500, Kent Overstreet wrote:
> > Without this patch, as in current mainline, you will get -EOPNOTSUPP
> > because bcachefs is sending REQ_OP_FLUSH and the block layer now finally
> > checks for that, to catch bugs like the one fixed with this patch.
>
> Then why did the user report -EOPNOTSUPP with the patch, which went away
> when reverted?

I have no idea; as we never return -EOPNOTSUPP for
REQ_OP_WRITE | REQ_PREFLUSH, this would be odd. According to your
report the user runs a Fedora kernel, and this commit never went
upstream or even into linux-next, which doesn't quite add up to me
either.

But in the end just try it yourself. On current mainline with the buggy
REQ_OP_FLUSH you should be seeing -EOPNOTSUPP every single time you try
a data integrity operation; without it you should not. Even the most
basic test should be catching this.
On Mon, Jan 22, 2024 at 07:50:38AM +0100, Christoph Hellwig wrote:
> On Mon, Jan 22, 2024 at 01:37:45AM -0500, Kent Overstreet wrote:
> > > Without this patch, as in current mainline, you will get -EOPNOTSUPP
> > > because bcachefs is sending REQ_OP_FLUSH and the block layer now finally
> > > checks for that, to catch bugs like the one fixed with this patch.
> >
> > Then why did the user report -EOPNOTSUPP with the patch, which went away
> > when reverted?
>
> I have no idea; as we never return -EOPNOTSUPP for
> REQ_OP_WRITE | REQ_PREFLUSH, this would be odd. According to your
> report the user runs a Fedora kernel, and this commit never went
> upstream or even into linux-next, which doesn't quite add up to me
> either.

Ahh - I misread the bug report (Fedora puts out kernels before rc1!?).
Thanks, your patch is back in :)
On Mon, Jan 22, 2024 at 12:37:10PM -0500, Kent Overstreet wrote:
> Ahh - I misread the bug report (Fedora puts out kernels before rc1!?).
> Thanks, your patch is back in :)

Please throw it in your test setup and watch the results carefully.
I'm pretty sure it is correct, but I do not actually have a way to
verify it right now.
On Mon, Jan 22, 2024 at 06:38:09PM +0100, Christoph Hellwig wrote:
> On Mon, Jan 22, 2024 at 12:37:10PM -0500, Kent Overstreet wrote:
> > Ahh - I misread the bug report (Fedora puts out kernels before rc1!?).
> > Thanks, your patch is back in :)
>
> Please throw it in your test setup and watch the results carefully.
> I'm pretty sure it is correct, but I do not actually have a way to
> verify it right now.

Updating my tests for the MD_FAULTY removal, then will do. Is there
anything you want me to look for?

Considering most tests won't break, or won't clearly break, if
flush/fua is being dropped (even generic/388 was passing reliably, of
course, since virtual block devices aren't going to reorder writes...),
maybe we could do some print statement sanity checking...
On Mon, Jan 22, 2024 at 12:42:24PM -0500, Kent Overstreet wrote:
> Updating my tests for the MD_FAULTY removal, then will do. Is there
> anything you want me to look for?

Nothing fancy. Just that you do data integrity operations and do not
see an error.

> Considering most tests won't break, or won't clearly break, if
> flush/fua is being dropped (even generic/388 was passing reliably, of
> course, since virtual block devices aren't going to reorder writes...),
> maybe we could do some print statement sanity checking...

Well, xfstests is not very good about power fail testing, because it
can't inject a power failure. Which is a problem in its own right, but
it would need hardware power failing, or at least some pretty good VM
emulation of the same, outside the scope of the actual xfstests runner.

But basically what this patch needs is just seeing that you don't get
I/O errors on anything that flushes, which on current mainline without
the patch you should be seeing.
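(One hypothetical shape for the "print statement sanity checking" floated above; the helper name and placement are illustrative, not from any posted patch:)

```c
#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/printk.h>

/*
 * Hypothetical completion-side check in the spirit of the "print
 * statement sanity checking" suggested above: report loudly whenever a
 * bio that carried a preflush completes with an error, since that is
 * where a rejected or dropped flush would surface.
 */
static void example_flush_endio(struct bio *bio)
{
	if (bio->bi_status && (bio->bi_opf & REQ_PREFLUSH))
		pr_err("flush failed on %pg: %d\n", bio->bi_bdev,
		       blk_status_to_errno(bio->bi_status));
	bio_put(bio);
}
```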
On Mon, Jan 22, 2024 at 07:41:47PM +0100, Christoph Hellwig wrote:
> On Mon, Jan 22, 2024 at 12:42:24PM -0500, Kent Overstreet wrote:
> > Updating my tests for the MD_FAULTY removal, then will do. Is there
> > anything you want me to look for?
>
> Nothing fancy. Just that you do data integrity operations and do not
> see an error.
>
> > Considering most tests won't break, or won't clearly break, if
> > flush/fua is being dropped (even generic/388 was passing reliably, of
> > course, since virtual block devices aren't going to reorder writes...),
> > maybe we could do some print statement sanity checking...
>
> Well, xfstests is not very good about power fail testing, because it
> can't inject a power failure. Which is a problem in its own right, but
> it would need hardware power failing, or at least some pretty good VM
> emulation of the same, outside the scope of the actual xfstests runner.

We do actually have stuff in fstests that checks write vs flush ordering
as issued by the filesystem. We use dm-logwrites for tracking the writes
and flushes and to be able to replay arbitrary groups of writes between
flushes. generic/482 is one of those tests.

These are the ordering problems that power failures expose, so we do
actually have tests that cover the same conditions that pulling the
power from a device would exercise.

I even wrote a debugging script (tools/dm-logwrite-replay) to iterate
write+fua groups in the test corpus to isolate the write groups where
the consistency failure occurs, when doing work to optimise the flushes
issued by XFS journal checkpoint writes.

Cheers,

Dave.
diff --git a/fs/bcachefs/fs-io.c b/fs/bcachefs/fs-io.c
index b0e8144ec5500c..a8500af6c7438f 100644
--- a/fs/bcachefs/fs-io.c
+++ b/fs/bcachefs/fs-io.c
@@ -79,7 +79,7 @@ void bch2_inode_flush_nocow_writes_async(struct bch_fs *c,
 			continue;
 
 		bio = container_of(bio_alloc_bioset(ca->disk_sb.bdev, 0,
-						    REQ_OP_FLUSH,
+						    REQ_OP_WRITE | REQ_PREFLUSH,
 						    GFP_KERNEL,
 						    &c->nocow_flush_bioset),
 				   struct nocow_flush, bio);
diff --git a/fs/bcachefs/journal_io.c b/fs/bcachefs/journal_io.c
index 3eb6c3f62a811b..43c76c9ad9a316 100644
--- a/fs/bcachefs/journal_io.c
+++ b/fs/bcachefs/journal_io.c
@@ -1948,7 +1948,8 @@ CLOSURE_CALLBACK(bch2_journal_write)
 			percpu_ref_get(&ca->io_ref);
 
 			bio = ca->journal.bio;
-			bio_reset(bio, ca->disk_sb.bdev, REQ_OP_FLUSH);
+			bio_reset(bio, ca->disk_sb.bdev,
+				  REQ_OP_WRITE | REQ_PREFLUSH);
 			bio->bi_end_io		= journal_write_endio;
 			bio->bi_private		= ca;
 			closure_bio_submit(bio, cl);
REQ_OP_FLUSH is only for internal use in the blk-mq and request based
drivers. File systems and other block layer consumers must use
REQ_OP_WRITE | REQ_PREFLUSH as documented in
Documentation/block/writeback_cache_control.rst.

While REQ_OP_FLUSH appears to work for blk-mq drivers it does not
get the proper flush state machine handling, and completely fails
for any bio based drivers, including all the stacking drivers. The
block layer will also get a check in 6.8 to reject this use case
entirely.

[Note: completely untested, but as this never got fixed since the
original bug report in November:

https://bugzilla.kernel.org/show_bug.cgi?id=218184

and the discussion in December:

https://lore.kernel.org/all/20231221053016.72cqcfg46vxwohcj@moria.home.lan/T/

this seems to be the best way to force it]

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/bcachefs/fs-io.c      | 2 +-
 fs/bcachefs/journal_io.c | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)
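For comparison with the journal_io.c hunk in the patch: when the write itself must also reach stable media, writeback_cache_control.rst pairs REQ_PREFLUSH with REQ_FUA. A hedged sketch, with all names and parameters illustrative rather than taken from bcachefs:

```c
#include <linux/bio.h>
#include <linux/blkdev.h>

/*
 * Illustrative journal-style barrier write (all names hypothetical):
 * REQ_PREFLUSH flushes previously cached data before this write, and
 * REQ_FUA forces the write itself to stable media.
 */
static void example_journal_write(struct block_device *bdev,
				  struct page *page, sector_t sector,
				  bio_end_io_t *end_io, void *private)
{
	struct bio *bio = bio_alloc(bdev, 1,
			REQ_OP_WRITE | REQ_PREFLUSH | REQ_FUA, GFP_NOFS);

	bio->bi_iter.bi_sector = sector;
	__bio_add_page(bio, page, PAGE_SIZE, 0);	/* single-page payload */
	bio->bi_end_io = end_io;
	bio->bi_private = private;
	submit_bio(bio);
}
```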