Message ID | 20220209085243.3136536-1-lee.jones@linaro.org (mailing list archive) |
---|---|
State | Deferred, archived |
Headers | show |
Series | [1/1] Revert "iomap: fall back to buffered writes for invalidation failures" | expand |
On 2022/02/09 17:52, Lee Jones wrote: > This reverts commit 60263d5889e6dc5987dc51b801be4955ff2e4aa7. > > Reverting since this commit opens a potential avenue for abuse. > > The C-reproducer and more information can be found at the link below. > > With this patch applied, I can no longer get the repro to trigger. > > kernel BUG at fs/ext4/inode.c:2647! > invalid opcode: 0000 [#1] PREEMPT SMP KASAN > CPU: 0 PID: 459 Comm: syz-executor359 Tainted: G W 5.10.93-syzkaller-01028-g0347b1658399 #0 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 > RIP: 0010:mpage_prepare_extent_to_map+0xbe9/0xc00 fs/ext4/inode.c:2647 > Code: e8 fc bd c3 ff e9 80 f6 ff ff 44 89 e9 80 e1 07 38 c1 0f 8c a6 fe ff ff 4c 89 ef e8 e1 bd c3 ff e9 99 fe ff ff e8 87 c9 89 ff <0f> 0b e8 80 c9 89 ff 0f 0b e8 79 1e b8 02 66 0f 1f 84 00 00 00 00 > RSP: 0018:ffffc90000e2e9c0 EFLAGS: 00010293 > RAX: ffffffff81e321f9 RBX: 0000000000000000 RCX: ffff88810c12cf00 > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > RBP: ffffc90000e2eb90 R08: ffffffff81e31e71 R09: fffff940008d68b1 > R10: fffff940008d68b1 R11: 0000000000000000 R12: ffffea00046b4580 > R13: ffffc90000e2ea80 R14: 000000000000011e R15: dffffc0000000000 > FS: 00007fcfd0717700(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00007fcfd07d5520 CR3: 000000010a142000 CR4: 00000000003506b0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > Call Trace: > ext4_writepages+0xcbb/0x3950 fs/ext4/inode.c:2776 > do_writepages+0x13a/0x280 mm/page-writeback.c:2358 > __filemap_fdatawrite_range+0x356/0x420 mm/filemap.c:427 > filemap_write_and_wait_range+0x64/0xe0 mm/filemap.c:660 > __iomap_dio_rw+0x621/0x10c0 fs/iomap/direct-io.c:495 > iomap_dio_rw+0x35/0x80 fs/iomap/direct-io.c:611 > ext4_dio_write_iter fs/ext4/file.c:571 [inline] > ext4_file_write_iter+0xfc5/0x1b70 fs/ext4/file.c:681 > do_iter_readv_writev+0x548/0x740 include/linux/fs.h:1941 > do_iter_write+0x182/0x660 fs/read_write.c:866 > vfs_iter_write+0x7c/0xa0 fs/read_write.c:907 > iter_file_splice_write+0x7fb/0xf70 fs/splice.c:686 > do_splice_from fs/splice.c:764 [inline] > direct_splice_actor+0xfe/0x130 fs/splice.c:933 > splice_direct_to_actor+0x4f4/0xbc0 fs/splice.c:888 > do_splice_direct+0x28b/0x3e0 fs/splice.c:976 > do_sendfile+0x914/0x1390 fs/read_write.c:1257 > > Link: https://syzkaller.appspot.com/bug?extid=41c966bf0729a530bd8d > > From the patch: > Cc: Stable <stable@vger.kernel.org> > Cc: Christoph Hellwig <hch@lst.de> > Cc: Dave Chinner <dchinner@redhat.com> > Cc: Goldwyn Rodrigues <rgoldwyn@suse.com> > Cc: Darrick J. Wong <darrick.wong@oracle.com> > Cc: Bob Peterson <rpeterso@redhat.com> > Cc: Damien Le Moal <damien.lemoal@wdc.com> > Cc: Theodore Ts'o <tytso@mit.edu> # for ext4 > Cc: Andreas Gruenbacher <agruenba@redhat.com> > Cc: Ritesh Harjani <riteshh@linux.ibm.com> > > Others: > Cc: Johannes Thumshirn <jth@kernel.org> > Cc: linux-xfs@vger.kernel.org > Cc: linux-fsdevel@vger.kernel.org > Cc: linux-ext4@vger.kernel.org > Cc: cluster-devel@redhat.com > > Fixes: 60263d5889e6d ("iomap: fall back to buffered writes for invalidation failures") > Reported-by: syzbot+0ed9f769264276638893@syzkaller.appspotmail.com > Signed-off-by: Lee Jones <lee.jones@linaro.org> For zonefs: Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
On Wed, Feb 09, 2022 at 08:52:43AM +0000, Lee Jones wrote: > This reverts commit 60263d5889e6dc5987dc51b801be4955ff2e4aa7. > > Reverting since this commit opens a potential avenue for abuse. > > The C-reproducer and more information can be found at the link below. > > With this patch applied, I can no longer get the repro to trigger. Well, maybe you should actually debug and try to understand what is going on before blindly reverting random commits.
On Wed, Feb 9, 2022 at 4:09 PM Christoph Hellwig <hch@lst.de> wrote: > On Wed, Feb 09, 2022 at 08:52:43AM +0000, Lee Jones wrote: > > This reverts commit 60263d5889e6dc5987dc51b801be4955ff2e4aa7. > > > > Reverting since this commit opens a potential avenue for abuse. > > > > The C-reproducer and more information can be found at the link below. This reproducer seems to be working fine on gfs2, but the locking in gfs2 differs hugely from that of other filesystems. > > With this patch applied, I can no longer get the repro to trigger. > > Well, maybe you should actually debug and try to understand what is > going on before blindly reverting random commits. Andreas
On Wed, 09 Feb 2022, Christoph Hellwig wrote: > On Wed, Feb 09, 2022 at 08:52:43AM +0000, Lee Jones wrote: > > This reverts commit 60263d5889e6dc5987dc51b801be4955ff2e4aa7. > > > > Reverting since this commit opens a potential avenue for abuse. > > > > The C-reproducer and more information can be found at the link below. > > > > With this patch applied, I can no longer get the repro to trigger. > > Well, maybe you should actually debug and try to understand what is > going on before blindly reverting random commits. That is not a reasonable suggestion. Requesting that someone becomes an area expert on a huge and complex subject such as file systems (various) in order to fix your broken code is not rational. If you'd like to use the PoC provided as a basis to test your own solution, then go right ahead. However, as it stands this API should be considered to contain security risk and should be patched as quickly as can be mustered. Reversion of the offending commit seems to be the fastest method to achieve that currently.
On Wed, Feb 09, 2022 at 03:59:48PM +0000, Lee Jones wrote: > On Wed, 09 Feb 2022, Christoph Hellwig wrote: > > > On Wed, Feb 09, 2022 at 08:52:43AM +0000, Lee Jones wrote: > > > This reverts commit 60263d5889e6dc5987dc51b801be4955ff2e4aa7. > > > > > > Reverting since this commit opens a potential avenue for abuse. > > > > > > The C-reproducer and more information can be found at the link below. > > > > > > With this patch applied, I can no longer get the repro to trigger. > > > > Well, maybe you should actually debug and try to understand what is > > going on before blindly reverting random commits. > > That is not a reasonable suggestion. > > Requesting that someone becomes an area expert on a huge and complex > subject such as file systems (various) in order to fix your broken > code is not rational. Sending a patch to revert a change you don't understand is also not rational. If you've bisected it to a single change -- great! If reverting the patch still fixes the bug -- also great! But don't send a patch when you clearly don't understand what the patch did. > If you'd like to use the PoC provided as a basis to test your own > solution, then go right ahead. However, as it stands this API should > be considered to contain security risk and should be patched as > quickly as can be mustered. Reversion of the offending commit seems > to be the fastest method to achieve that currently. This is incoherent. There is no security risk.
On Wed, 09 Feb 2022, Matthew Wilcox wrote: > On Wed, Feb 09, 2022 at 03:59:48PM +0000, Lee Jones wrote: > > On Wed, 09 Feb 2022, Christoph Hellwig wrote: > > > > > On Wed, Feb 09, 2022 at 08:52:43AM +0000, Lee Jones wrote: > > > > This reverts commit 60263d5889e6dc5987dc51b801be4955ff2e4aa7. > > > > > > > > Reverting since this commit opens a potential avenue for abuse. > > > > > > > > The C-reproducer and more information can be found at the link below. > > > > > > > > With this patch applied, I can no longer get the repro to trigger. > > > > > > Well, maybe you should actually debug and try to understand what is > > > going on before blindly reverting random commits. > > > > That is not a reasonable suggestion. > > > > Requesting that someone becomes an area expert on a huge and complex > > subject such as file systems (various) in order to fix your broken > > code is not rational. > > Sending a patch to revert a change you don't understand is also > not rational. If you've bisected it to a single change -- great! > If reverting the patch still fixes the bug -- also great! But > don't send a patch when you clearly don't understand what the > patch did. If reverting isn't the correct thing to do here, please consider this as a bug report.
On Wed, Feb 09, 2022 at 08:52:43AM +0000, Lee Jones wrote: > This reverts commit 60263d5889e6dc5987dc51b801be4955ff2e4aa7. > > Reverting since this commit opens a potential avenue for abuse. What kind of abuse? Did you conclude there's an avenue solely because some combination of userspace rigging produced a BUG warning? Or is this a real problem that someone found? > The C-reproducer and more information can be found at the link below. No. Post the information and your analysis here. I'm not going to dig into some Google site to find out what happened, and I'm not going to assume that future developers will be able to access that URL to learn why this patch was created. I am not convinced that this revert is a proper fix for ... whatever is the problem, because you never explained what happened. > With this patch applied, I can no longer get the repro to trigger. With ext4 completely reverted, I can no longer get the repro to trigger either. > kernel BUG at fs/ext4/inode.c:2647! > invalid opcode: 0000 [#1] PREEMPT SMP KASAN > CPU: 0 PID: 459 Comm: syz-executor359 Tainted: G W 5.10.93-syzkaller-01028-g0347b1658399 #0 What BUG on fs/ext4/inode.c:2647? https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/fs/ext4/inode.c?h=v5.10.93#n2647 All I see is a call to pagevec_release()? There's a BUG_ON further up if we wait for page writeback but then it still has Writeback set. But I don't see anything in pagevec_release that would actually result in a BUG warning. Oh, right, this is one of those inscrutable syzkaller things, where a person can spend hours figuring out what the hell it set up. Yeah...no, you don't get to submit patches to core kernel code, claim it's not your responsibility to know anything about a subsystem that you want to patch, and then expect us to do the work for you. If you pick up a syzkaller report, you get to figure out what broke, why, and how to fix it in a reasonable manner. You're a maintainer, would you accept a patch like this? NAK. --D OH WAIT, you're running this on the Android 5.10 kernel, aren't you? The BUG report came from page_buffers failing to find any buffer heads attached to the page. https://android.googlesource.com/kernel/common/+/refs/heads/android12-5.10-2022-02/fs/ext4/inode.c#2647 Yeah, don't care. > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 > RIP: 0010:mpage_prepare_extent_to_map+0xbe9/0xc00 fs/ext4/inode.c:2647 > Code: e8 fc bd c3 ff e9 80 f6 ff ff 44 89 e9 80 e1 07 38 c1 0f 8c a6 fe ff ff 4c 89 ef e8 e1 bd c3 ff e9 99 fe ff ff e8 87 c9 89 ff <0f> 0b e8 80 c9 89 ff 0f 0b e8 79 1e b8 02 66 0f 1f 84 00 00 00 00 > RSP: 0018:ffffc90000e2e9c0 EFLAGS: 00010293 > RAX: ffffffff81e321f9 RBX: 0000000000000000 RCX: ffff88810c12cf00 > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > RBP: ffffc90000e2eb90 R08: ffffffff81e31e71 R09: fffff940008d68b1 > R10: fffff940008d68b1 R11: 0000000000000000 R12: ffffea00046b4580 > R13: ffffc90000e2ea80 R14: 000000000000011e R15: dffffc0000000000 > FS: 00007fcfd0717700(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00007fcfd07d5520 CR3: 000000010a142000 CR4: 00000000003506b0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > Call Trace: > ext4_writepages+0xcbb/0x3950 fs/ext4/inode.c:2776 > do_writepages+0x13a/0x280 mm/page-writeback.c:2358 > __filemap_fdatawrite_range+0x356/0x420 mm/filemap.c:427 > filemap_write_and_wait_range+0x64/0xe0 mm/filemap.c:660 > __iomap_dio_rw+0x621/0x10c0 fs/iomap/direct-io.c:495 > iomap_dio_rw+0x35/0x80 fs/iomap/direct-io.c:611 > ext4_dio_write_iter fs/ext4/file.c:571 [inline] > ext4_file_write_iter+0xfc5/0x1b70 fs/ext4/file.c:681 > do_iter_readv_writev+0x548/0x740 include/linux/fs.h:1941 > do_iter_write+0x182/0x660 fs/read_write.c:866 > vfs_iter_write+0x7c/0xa0 fs/read_write.c:907 > iter_file_splice_write+0x7fb/0xf70 fs/splice.c:686 > do_splice_from fs/splice.c:764 [inline] > direct_splice_actor+0xfe/0x130 fs/splice.c:933 > splice_direct_to_actor+0x4f4/0xbc0 fs/splice.c:888 > do_splice_direct+0x28b/0x3e0 fs/splice.c:976 > do_sendfile+0x914/0x1390 fs/read_write.c:1257 > > Link: https://syzkaller.appspot.com/bug?extid=41c966bf0729a530bd8d > > From the patch: > Cc: Stable <stable@vger.kernel.org> > Cc: Christoph Hellwig <hch@lst.de> > Cc: Dave Chinner <dchinner@redhat.com> > Cc: Goldwyn Rodrigues <rgoldwyn@suse.com> > Cc: Darrick J. Wong <darrick.wong@oracle.com> > Cc: Bob Peterson <rpeterso@redhat.com> > Cc: Damien Le Moal <damien.lemoal@wdc.com> > Cc: Theodore Ts'o <tytso@mit.edu> # for ext4 > Cc: Andreas Gruenbacher <agruenba@redhat.com> > Cc: Ritesh Harjani <riteshh@linux.ibm.com> > > Others: > Cc: Johannes Thumshirn <jth@kernel.org> > Cc: linux-xfs@vger.kernel.org > Cc: linux-fsdevel@vger.kernel.org > Cc: linux-ext4@vger.kernel.org > Cc: cluster-devel@redhat.com > > Fixes: 60263d5889e6d ("iomap: fall back to buffered writes for invalidation failures") > Reported-by: syzbot+0ed9f769264276638893@syzkaller.appspotmail.com > Signed-off-by: Lee Jones <lee.jones@linaro.org> > --- > fs/ext4/file.c | 2 -- > fs/gfs2/file.c | 3 +-- > fs/iomap/direct-io.c | 16 +++++----------- > fs/iomap/trace.h | 1 - > fs/xfs/xfs_file.c | 4 ++-- > fs/zonefs/super.c | 7 ++----- > 6 files changed, 10 insertions(+), 23 deletions(-) > > diff --git a/fs/ext4/file.c b/fs/ext4/file.c > index 3ed8c048fb12c..cb347c3b35535 100644 > --- a/fs/ext4/file.c > +++ b/fs/ext4/file.c > @@ -551,8 +551,6 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from) > iomap_ops = &ext4_iomap_overwrite_ops; > ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops, > is_sync_kiocb(iocb) || unaligned_io || extend); > - if (ret == -ENOTBLK) > - ret = 0; > > if (extend) > ret = ext4_handle_inode_extension(inode, offset, ret, count); > diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c > index b39b339feddc9..847adb97380d3 100644 > --- a/fs/gfs2/file.c > +++ b/fs/gfs2/file.c > @@ -835,8 +835,7 @@ static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from, > > ret = iomap_dio_rw(iocb, from, &gfs2_iomap_ops, NULL, > is_sync_kiocb(iocb)); > - if (ret == -ENOTBLK) > - ret = 0; > + > out: > gfs2_glock_dq(gh); > out_uninit: > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c > index 933f234d5becd..ddcd06c0c452d 100644 > --- a/fs/iomap/direct-io.c > +++ b/fs/iomap/direct-io.c > @@ -10,7 +10,6 @@ > #include <linux/backing-dev.h> > #include <linux/uio.h> > #include <linux/task_io_accounting_ops.h> > -#include "trace.h" > > #include "../internal.h" > > @@ -413,9 +412,6 @@ iomap_dio_actor(struct inode *inode, loff_t pos, loff_t length, > * can be mapped into multiple disjoint IOs and only a subset of the IOs issued > * may be pure data writes. In that case, we still need to do a full data sync > * completion. > - * > - * Returns -ENOTBLK In case of a page invalidation invalidation failure for > - * writes. The callers needs to fall back to buffered I/O in this case. > */ > struct iomap_dio * > __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, > @@ -493,15 +489,13 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, > if (iov_iter_rw(iter) == WRITE) { > /* > * Try to invalidate cache pages for the range we are writing. > - * If this invalidation fails, let the caller fall back to > - * buffered I/O. > + * If this invalidation fails, tough, the write will still work, > + * but racing two incompatible write paths is a pretty crazy > + * thing to do, so we don't support it 100%. > */ > if (invalidate_inode_pages2_range(mapping, pos >> PAGE_SHIFT, > - end >> PAGE_SHIFT)) { > - trace_iomap_dio_invalidate_fail(inode, pos, count); > - ret = -ENOTBLK; > - goto out_free_dio; > - } > + end >> PAGE_SHIFT)) > + dio_warn_stale_pagecache(iocb->ki_filp); > > if (!wait_for_completion && !inode->i_sb->s_dio_done_wq) { > ret = sb_init_dio_done_wq(inode->i_sb); > diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h > index fdc7ae388476f..5693a39d52fb6 100644 > --- a/fs/iomap/trace.h > +++ b/fs/iomap/trace.h > @@ -74,7 +74,6 @@ DEFINE_EVENT(iomap_range_class, name, \ > DEFINE_RANGE_EVENT(iomap_writepage); > DEFINE_RANGE_EVENT(iomap_releasepage); > DEFINE_RANGE_EVENT(iomap_invalidatepage); > -DEFINE_RANGE_EVENT(iomap_dio_invalidate_fail); > > #define IOMAP_TYPE_STRINGS \ > { IOMAP_HOLE, "HOLE" }, \ > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c > index 5b0f93f738372..43e2c057214d9 100644 > --- a/fs/xfs/xfs_file.c > +++ b/fs/xfs/xfs_file.c > @@ -589,8 +589,8 @@ xfs_file_dio_aio_write( > xfs_iunlock(ip, iolock); > > /* > - * No fallback to buffered IO after short writes for XFS, direct I/O > - * will either complete fully or return an error. > + * No fallback to buffered IO on errors for XFS, direct IO will either > + * complete fully or fail. > */ > ASSERT(ret < 0 || ret == count); > return ret; > diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c > index bec47f2d074be..d54fbef3db4df 100644 > --- a/fs/zonefs/super.c > +++ b/fs/zonefs/super.c > @@ -851,11 +851,8 @@ static ssize_t zonefs_file_write_iter(struct kiocb *iocb, struct iov_iter *from) > if (iocb->ki_pos >= ZONEFS_I(inode)->i_max_size) > return -EFBIG; > > - if (iocb->ki_flags & IOCB_DIRECT) { > - ssize_t ret = zonefs_file_dio_write(iocb, from); > - if (ret != -ENOTBLK) > - return ret; > - } > + if (iocb->ki_flags & IOCB_DIRECT) > + return zonefs_file_dio_write(iocb, from); > > return zonefs_file_buffered_write(iocb, from); > } > -- > 2.35.0.263.gb82422642f-goog >
On Wed, Feb 09, 2022 at 03:59:48PM +0000, Lee Jones wrote: > > Well, maybe you should actually debug and try to understand what is > > going on before blindly reverting random commits. > > That is not a reasonable suggestion. In that case we fudamentally disagree what "reasonable" means. Sending a revert for a 2 year old commit based on a BUG in consumer of that subsystem based on a really old kernel without any explanation of why that revert is even directly related to the problem and not just a casuality is what I would define as completely unreasable. Please send a proper bug report to the ext4 maintainer first, and we can work from there.
On Wed, 09 Feb 2022, Darrick J. Wong wrote: > On Wed, Feb 09, 2022 at 08:52:43AM +0000, Lee Jones wrote: > > This reverts commit 60263d5889e6dc5987dc51b801be4955ff2e4aa7. > > > > Reverting since this commit opens a potential avenue for abuse. > > What kind of abuse? Did you conclude there's an avenue solely because > some combination of userspace rigging produced a BUG warning? Or is > this a real problem that someone found? Genuine question: Is the ability for userspace to crash the kernel not enough to cause concern? I would have thought that we'd want to prevent this. If by 'real problem' you mean; privilege escalation, memory corruption or data leakage, then no, I haven't found any evidence of that. However, that's not to say these aren't possible as a result of this issue, just that I do not have the skills or knowledge to be able to turn this into a demonstrable attack vector. However, if you say there is no issue, I'm happy to take your word. > > The C-reproducer and more information can be found at the link below. > > No. Post the information and your analysis here. I'm not going to dig > into some Google site to find out what happened, and I'm not going to > assume that future developers will be able to access that URL to learn > why this patch was created. The link provided doesn't contain any further analysis. Only the reproducer and kernel configuration used, which are both too large to enter into a Git commit. > > kernel BUG at fs/ext4/inode.c:2647! > > invalid opcode: 0000 [#1] PREEMPT SMP KASAN > > CPU: 0 PID: 459 Comm: syz-executor359 Tainted: G W 5.10.93-syzkaller-01028-g0347b1658399 #0 > > What BUG on fs/ext4/inode.c:2647? > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/fs/ext4/inode.c?h=v5.10.93#n2647 > > All I see is a call to pagevec_release()? There's a BUG_ON further up > if we wait for page writeback but then it still has Writeback set. But > I don't see anything in pagevec_release that would actually result in a > BUG warning. Right, this BUG back-trace was taken from the kernel I received the bug report for. I should have used the one I triggered in Mainline, apologies for that. The real source of the BUG is in the inlined call to page_buffers(). Here is the link for the latest release kernel: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/fs/ext4/inode.c?h=v5.16#n2620 #define page_buffers(page) \ ({ \ BUG_ON(!PagePrivate(page)); \ ((struct buffer_head *)page_private(page)); \ }) > Oh, right, this is one of those inscrutable syzkaller things, where a > person can spend hours figuring out what the hell it set up. A link to the config used (again too big to enter into a commit message), can be easily sourced from the link provided. > Yeah...no, you don't get to submit patches to core kernel code, claim > it's not your responsibility to know anything about a subsystem that you > want to patch, and then expect us to do the work for you. If you pick > up a syzkaller report, you get to figure out what broke, why, and how to > fix it in a reasonable manner. > > You're a maintainer, would you accept a patch like this? No. I would share my knowledge to provide a helpful review and work with the contributor to find a solution (if applicable). > OH WAIT, you're running this on the Android 5.10 kernel, aren't you? > The BUG report came from page_buffers failing to find any buffer heads > attached to the page. > https://android.googlesource.com/kernel/common/+/refs/heads/android12-5.10-2022-02/fs/ext4/inode.c#2647 Yes, the H/W I have to prototype these on is a phone and the report that came in was specifically built against the aforementioned kernel. > Yeah, don't care. "There is nothing to worry about, as it's intended behaviour"?
On Thu, Feb 10, 2022 at 10:15:52AM +0000, Lee Jones wrote: > On Wed, 09 Feb 2022, Darrick J. Wong wrote: > > > On Wed, Feb 09, 2022 at 08:52:43AM +0000, Lee Jones wrote: > > > This reverts commit 60263d5889e6dc5987dc51b801be4955ff2e4aa7. > > > > > > Reverting since this commit opens a potential avenue for abuse. > > > > What kind of abuse? Did you conclude there's an avenue solely because > > some combination of userspace rigging produced a BUG warning? Or is > > this a real problem that someone found? > > Genuine question: Is the ability for userspace to crash the kernel > not enough to cause concern? I would have thought that we'd want to > prevent this. The kernel doesn't crash. It's a BUG(). That means it kills the task which caused the BUG(). If you've specified that the kernel should crash on seeing a BUG(), well, you made that decision, and you get to live with the consequences. > The link provided doesn't contain any further analysis. Only the > reproducer and kernel configuration used, which are both too large to > enter into a Git commit. But not too large to put in an email. Which you should have sent to begin with, not a stupid reversion commit. > > OH WAIT, you're running this on the Android 5.10 kernel, aren't you? > > The BUG report came from page_buffers failing to find any buffer heads > > attached to the page. > > https://android.googlesource.com/kernel/common/+/refs/heads/android12-5.10-2022-02/fs/ext4/inode.c#2647 > > Yes, the H/W I have to prototype these on is a phone and the report > that came in was specifically built against the aforementioned > kernel. > > > Yeah, don't care. > > "There is nothing to worry about, as it's intended behaviour"? No. You've come in like a fucking meteorite full of arrogance and ignorance. Nobody's reacting well to you right now. Start again, write a good bug report in a new thread.
Let's attempt to seek beyond the mud slinging, swearing and the whiny amateur dramatics for just a brief moment and concentrate solely on the technicals please. On Fri, 11 Feb 2022, Matthew Wilcox wrote: > On Thu, Feb 10, 2022 at 10:15:52AM +0000, Lee Jones wrote: > > On Wed, 09 Feb 2022, Darrick J. Wong wrote: > > > > > On Wed, Feb 09, 2022 at 08:52:43AM +0000, Lee Jones wrote: > > > > This reverts commit 60263d5889e6dc5987dc51b801be4955ff2e4aa7. > > > > > > > > Reverting since this commit opens a potential avenue for abuse. > > > > > > What kind of abuse? Did you conclude there's an avenue solely because > > > some combination of userspace rigging produced a BUG warning? Or is > > > this a real problem that someone found? > > > > Genuine question: Is the ability for userspace to crash the kernel > > not enough to cause concern? I would have thought that we'd want to > > prevent this. > > The kernel doesn't crash. It's a BUG(). That means it kills the > task which caused the BUG(). If you've specified that the kernel should > crash on seeing a BUG(), well, you made that decision, and you get to > live with the consequences. BUG() calls are architecture specific. If no override is provided, the default appears to panic ("crash") the kernel: /* * Don't use BUG() or BUG_ON() unless there's really no way out; one * example might be detecting data structure corruption in the middle * of an operation that can't be backed out of. If the (sub)system * can somehow continue operating, perhaps with reduced functionality, * it's probably not BUG-worthy. * * If you're tempted to BUG(), think again: is completely giving up * really the *only* solution? There are usually better options, where * users don't need to reboot ASAP and can mostly shut down cleanly. */ #ifndef HAVE_ARCH_BUG #define BUG() do { \ printk("BUG: failure at %s:%d/%s()!\n", __FILE__, __LINE__, __func__); \ barrier_before_unreachable(); \ panic("BUG!"); \ } while (0) #endif The kernel I tested with panics and reboots: Kernel panic - not syncing: Fatal exception Here are the BUG related kernel configs I have set: CONFIG_BUG=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y # CONFIG_INPUT_EVBUG is not set CONFIG_BUG_ON_DATA_CORRUPTION=y Not seeing a "CONFIG_PANIC_ON_BUG" equivalent. What is missing? Unless of course you mean disabling BUG support entirely. In which case, this is strongly advised against in the help section and I'm not sure of many development or production kernels that do this. config BUG bool "BUG() support" if EXPERT default y help Disabling this option eliminates support for BUG and WARN, reducing the size of your kernel image and potentially quietly ignoring numerous fatal conditions. You should only consider disabling this option for embedded systems with no facilities for reporting errors. Just say Y. I've always been under the impression that a BUG() call should never be triggerable from userspace. However, I'm always happy to be incorrect and subsequently reeducated. In other words ... Is this a valid issue that you want me to report (in a different way): > Start again, write a good bug report in a new thread. Or is this expected behaviour and therefore not a concern: > > > The BUG report came from page_buffers failing to find any buffer heads > > > attached to the page. > > > > > Yeah, don't care.
Let me repeat myself: Please send a proper bug report to the linux-ext4 list. Thanks!
On Mon, 14 Feb 2022, Christoph Hellwig wrote: > Let me repeat myself: Please send a proper bug report to the linux-ext4 > list. Thanks! Okay, so it is valid. Question answered, thanks. I still believe that I am unqualified to attempt to debug this myself. In order to do so, I'll at least require some guidance from you SMEs. Please bear with me while I clear my desk - lots on currently. Bug report to follow.
On Mon, Feb 14, 2022 at 02:11:46PM +0000, Lee Jones wrote: > On Mon, 14 Feb 2022, Christoph Hellwig wrote: > > > Let me repeat myself: Please send a proper bug report to the linux-ext4 > > list. Thanks! > > Okay, so it is valid. Question answered, thanks. > > I still believe that I am unqualified to attempt to debug this myself. Nobody's asking you to debug it yourself. We're asking you to file a clear bug report instead of wasting everybody's time.
On Mon, 14 Feb 2022, Matthew Wilcox wrote: > On Mon, Feb 14, 2022 at 02:11:46PM +0000, Lee Jones wrote: > > On Mon, 14 Feb 2022, Christoph Hellwig wrote: > > > > > Let me repeat myself: Please send a proper bug report to the linux-ext4 > > > list. Thanks! > > > > Okay, so it is valid. Question answered, thanks. > > > > I still believe that I am unqualified to attempt to debug this myself. > > Nobody's asking you to debug it yourself. Not meaning to drag this out any longer than is absolutely necessary, but that's exactly what I was asked to do. I fully appreciate how complex this subsystem is and am aware of my inadequacy in the area. > instead of wasting everybody's time. That was never the intention. > We're asking you to file a clear bug report No problem. Will comply.
diff --git a/fs/ext4/file.c b/fs/ext4/file.c index 3ed8c048fb12c..cb347c3b35535 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -551,8 +551,6 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from) iomap_ops = &ext4_iomap_overwrite_ops; ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops, is_sync_kiocb(iocb) || unaligned_io || extend); - if (ret == -ENOTBLK) - ret = 0; if (extend) ret = ext4_handle_inode_extension(inode, offset, ret, count); diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c index b39b339feddc9..847adb97380d3 100644 --- a/fs/gfs2/file.c +++ b/fs/gfs2/file.c @@ -835,8 +835,7 @@ static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from, ret = iomap_dio_rw(iocb, from, &gfs2_iomap_ops, NULL, is_sync_kiocb(iocb)); - if (ret == -ENOTBLK) - ret = 0; + out: gfs2_glock_dq(gh); out_uninit: diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c index 933f234d5becd..ddcd06c0c452d 100644 --- a/fs/iomap/direct-io.c +++ b/fs/iomap/direct-io.c @@ -10,7 +10,6 @@ #include <linux/backing-dev.h> #include <linux/uio.h> #include <linux/task_io_accounting_ops.h> -#include "trace.h" #include "../internal.h" @@ -413,9 +412,6 @@ iomap_dio_actor(struct inode *inode, loff_t pos, loff_t length, * can be mapped into multiple disjoint IOs and only a subset of the IOs issued * may be pure data writes. In that case, we still need to do a full data sync * completion. - * - * Returns -ENOTBLK In case of a page invalidation invalidation failure for - * writes. The callers needs to fall back to buffered I/O in this case. */ struct iomap_dio * __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, @@ -493,15 +489,13 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, if (iov_iter_rw(iter) == WRITE) { /* * Try to invalidate cache pages for the range we are writing. - * If this invalidation fails, let the caller fall back to - * buffered I/O. + * If this invalidation fails, tough, the write will still work, + * but racing two incompatible write paths is a pretty crazy + * thing to do, so we don't support it 100%. */ if (invalidate_inode_pages2_range(mapping, pos >> PAGE_SHIFT, - end >> PAGE_SHIFT)) { - trace_iomap_dio_invalidate_fail(inode, pos, count); - ret = -ENOTBLK; - goto out_free_dio; - } + end >> PAGE_SHIFT)) + dio_warn_stale_pagecache(iocb->ki_filp); if (!wait_for_completion && !inode->i_sb->s_dio_done_wq) { ret = sb_init_dio_done_wq(inode->i_sb); diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h index fdc7ae388476f..5693a39d52fb6 100644 --- a/fs/iomap/trace.h +++ b/fs/iomap/trace.h @@ -74,7 +74,6 @@ DEFINE_EVENT(iomap_range_class, name, \ DEFINE_RANGE_EVENT(iomap_writepage); DEFINE_RANGE_EVENT(iomap_releasepage); DEFINE_RANGE_EVENT(iomap_invalidatepage); -DEFINE_RANGE_EVENT(iomap_dio_invalidate_fail); #define IOMAP_TYPE_STRINGS \ { IOMAP_HOLE, "HOLE" }, \ diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 5b0f93f738372..43e2c057214d9 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -589,8 +589,8 @@ xfs_file_dio_aio_write( xfs_iunlock(ip, iolock); /* - * No fallback to buffered IO after short writes for XFS, direct I/O - * will either complete fully or return an error. + * No fallback to buffered IO on errors for XFS, direct IO will either + * complete fully or fail. */ ASSERT(ret < 0 || ret == count); return ret; diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c index bec47f2d074be..d54fbef3db4df 100644 --- a/fs/zonefs/super.c +++ b/fs/zonefs/super.c @@ -851,11 +851,8 @@ static ssize_t zonefs_file_write_iter(struct kiocb *iocb, struct iov_iter *from) if (iocb->ki_pos >= ZONEFS_I(inode)->i_max_size) return -EFBIG; - if (iocb->ki_flags & IOCB_DIRECT) { - ssize_t ret = zonefs_file_dio_write(iocb, from); - if (ret != -ENOTBLK) - return ret; - } + if (iocb->ki_flags & IOCB_DIRECT) + return zonefs_file_dio_write(iocb, from); return zonefs_file_buffered_write(iocb, from); }
This reverts commit 60263d5889e6dc5987dc51b801be4955ff2e4aa7. Reverting since this commit opens a potential avenue for abuse. The C-reproducer and more information can be found at the link below. With this patch applied, I can no longer get the repro to trigger. kernel BUG at fs/ext4/inode.c:2647! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 459 Comm: syz-executor359 Tainted: G W 5.10.93-syzkaller-01028-g0347b1658399 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:mpage_prepare_extent_to_map+0xbe9/0xc00 fs/ext4/inode.c:2647 Code: e8 fc bd c3 ff e9 80 f6 ff ff 44 89 e9 80 e1 07 38 c1 0f 8c a6 fe ff ff 4c 89 ef e8 e1 bd c3 ff e9 99 fe ff ff e8 87 c9 89 ff <0f> 0b e8 80 c9 89 ff 0f 0b e8 79 1e b8 02 66 0f 1f 84 00 00 00 00 RSP: 0018:ffffc90000e2e9c0 EFLAGS: 00010293 RAX: ffffffff81e321f9 RBX: 0000000000000000 RCX: ffff88810c12cf00 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffc90000e2eb90 R08: ffffffff81e31e71 R09: fffff940008d68b1 R10: fffff940008d68b1 R11: 0000000000000000 R12: ffffea00046b4580 R13: ffffc90000e2ea80 R14: 000000000000011e R15: dffffc0000000000 FS: 00007fcfd0717700(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fcfd07d5520 CR3: 000000010a142000 CR4: 00000000003506b0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ext4_writepages+0xcbb/0x3950 fs/ext4/inode.c:2776 do_writepages+0x13a/0x280 mm/page-writeback.c:2358 __filemap_fdatawrite_range+0x356/0x420 mm/filemap.c:427 filemap_write_and_wait_range+0x64/0xe0 mm/filemap.c:660 __iomap_dio_rw+0x621/0x10c0 fs/iomap/direct-io.c:495 iomap_dio_rw+0x35/0x80 fs/iomap/direct-io.c:611 ext4_dio_write_iter fs/ext4/file.c:571 [inline] ext4_file_write_iter+0xfc5/0x1b70 fs/ext4/file.c:681 do_iter_readv_writev+0x548/0x740 include/linux/fs.h:1941 do_iter_write+0x182/0x660 fs/read_write.c:866 vfs_iter_write+0x7c/0xa0 fs/read_write.c:907 iter_file_splice_write+0x7fb/0xf70 fs/splice.c:686 do_splice_from fs/splice.c:764 [inline] direct_splice_actor+0xfe/0x130 fs/splice.c:933 splice_direct_to_actor+0x4f4/0xbc0 fs/splice.c:888 do_splice_direct+0x28b/0x3e0 fs/splice.c:976 do_sendfile+0x914/0x1390 fs/read_write.c:1257 Link: https://syzkaller.appspot.com/bug?extid=41c966bf0729a530bd8d From the patch: Cc: Stable <stable@vger.kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <dchinner@redhat.com> Cc: Goldwyn Rodrigues <rgoldwyn@suse.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Theodore Ts'o <tytso@mit.edu> # for ext4 Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Ritesh Harjani <riteshh@linux.ibm.com> Others: Cc: Johannes Thumshirn <jth@kernel.org> Cc: linux-xfs@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Cc: linux-ext4@vger.kernel.org Cc: cluster-devel@redhat.com Fixes: 60263d5889e6d ("iomap: fall back to buffered writes for invalidation failures") Reported-by: syzbot+0ed9f769264276638893@syzkaller.appspotmail.com Signed-off-by: Lee Jones <lee.jones@linaro.org> --- fs/ext4/file.c | 2 -- fs/gfs2/file.c | 3 +-- fs/iomap/direct-io.c | 16 +++++----------- fs/iomap/trace.h | 1 - fs/xfs/xfs_file.c | 4 ++-- fs/zonefs/super.c | 7 ++----- 6 files changed, 10 insertions(+), 23 deletions(-)