diff mbox series

[1/1] Revert "iomap: fall back to buffered writes for invalidation failures"

Message ID 20220209085243.3136536-1-lee.jones@linaro.org (mailing list archive)
State Deferred, archived
Headers show
Series [1/1] Revert "iomap: fall back to buffered writes for invalidation failures" | expand

Commit Message

Lee Jones Feb. 9, 2022, 8:52 a.m. UTC
This reverts commit 60263d5889e6dc5987dc51b801be4955ff2e4aa7.

Reverting since this commit opens a potential avenue for abuse.

The C-reproducer and more information can be found at the link below.

With this patch applied, I can no longer get the repro to trigger.

  kernel BUG at fs/ext4/inode.c:2647!
  invalid opcode: 0000 [#1] PREEMPT SMP KASAN
  CPU: 0 PID: 459 Comm: syz-executor359 Tainted: G        W         5.10.93-syzkaller-01028-g0347b1658399 #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  RIP: 0010:mpage_prepare_extent_to_map+0xbe9/0xc00 fs/ext4/inode.c:2647
  Code: e8 fc bd c3 ff e9 80 f6 ff ff 44 89 e9 80 e1 07 38 c1 0f 8c a6 fe ff ff 4c 89 ef e8 e1 bd c3 ff e9 99 fe ff ff e8 87 c9 89 ff <0f> 0b e8 80 c9 89 ff 0f 0b e8 79 1e b8 02 66 0f 1f 84 00 00 00 00
  RSP: 0018:ffffc90000e2e9c0 EFLAGS: 00010293
  RAX: ffffffff81e321f9 RBX: 0000000000000000 RCX: ffff88810c12cf00
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
  RBP: ffffc90000e2eb90 R08: ffffffff81e31e71 R09: fffff940008d68b1
  R10: fffff940008d68b1 R11: 0000000000000000 R12: ffffea00046b4580
  R13: ffffc90000e2ea80 R14: 000000000000011e R15: dffffc0000000000
  FS:  00007fcfd0717700(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fcfd07d5520 CR3: 000000010a142000 CR4: 00000000003506b0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   ext4_writepages+0xcbb/0x3950 fs/ext4/inode.c:2776
   do_writepages+0x13a/0x280 mm/page-writeback.c:2358
   __filemap_fdatawrite_range+0x356/0x420 mm/filemap.c:427
   filemap_write_and_wait_range+0x64/0xe0 mm/filemap.c:660
   __iomap_dio_rw+0x621/0x10c0 fs/iomap/direct-io.c:495
   iomap_dio_rw+0x35/0x80 fs/iomap/direct-io.c:611
   ext4_dio_write_iter fs/ext4/file.c:571 [inline]
   ext4_file_write_iter+0xfc5/0x1b70 fs/ext4/file.c:681
   do_iter_readv_writev+0x548/0x740 include/linux/fs.h:1941
   do_iter_write+0x182/0x660 fs/read_write.c:866
   vfs_iter_write+0x7c/0xa0 fs/read_write.c:907
   iter_file_splice_write+0x7fb/0xf70 fs/splice.c:686
   do_splice_from fs/splice.c:764 [inline]
   direct_splice_actor+0xfe/0x130 fs/splice.c:933
   splice_direct_to_actor+0x4f4/0xbc0 fs/splice.c:888
   do_splice_direct+0x28b/0x3e0 fs/splice.c:976
   do_sendfile+0x914/0x1390 fs/read_write.c:1257

Link: https://syzkaller.appspot.com/bug?extid=41c966bf0729a530bd8d

From the patch:
Cc: Stable <stable@vger.kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Theodore Ts'o <tytso@mit.edu> # for ext4
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Ritesh Harjani <riteshh@linux.ibm.com>

Others:
Cc: Johannes Thumshirn <jth@kernel.org>
Cc: linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-ext4@vger.kernel.org
Cc: cluster-devel@redhat.com

Fixes: 60263d5889e6d ("iomap: fall back to buffered writes for invalidation failures")
Reported-by: syzbot+0ed9f769264276638893@syzkaller.appspotmail.com
Signed-off-by: Lee Jones <lee.jones@linaro.org>
---
 fs/ext4/file.c       |  2 --
 fs/gfs2/file.c       |  3 +--
 fs/iomap/direct-io.c | 16 +++++-----------
 fs/iomap/trace.h     |  1 -
 fs/xfs/xfs_file.c    |  4 ++--
 fs/zonefs/super.c    |  7 ++-----
 6 files changed, 10 insertions(+), 23 deletions(-)

Comments

Damien Le Moal Feb. 9, 2022, 10:45 a.m. UTC | #1
On 2022/02/09 17:52, Lee Jones wrote:
> This reverts commit 60263d5889e6dc5987dc51b801be4955ff2e4aa7.
> 
> Reverting since this commit opens a potential avenue for abuse.
> 
> The C-reproducer and more information can be found at the link below.
> 
> With this patch applied, I can no longer get the repro to trigger.
> 
>   kernel BUG at fs/ext4/inode.c:2647!
>   invalid opcode: 0000 [#1] PREEMPT SMP KASAN
>   CPU: 0 PID: 459 Comm: syz-executor359 Tainted: G        W         5.10.93-syzkaller-01028-g0347b1658399 #0
>   Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>   RIP: 0010:mpage_prepare_extent_to_map+0xbe9/0xc00 fs/ext4/inode.c:2647
>   Code: e8 fc bd c3 ff e9 80 f6 ff ff 44 89 e9 80 e1 07 38 c1 0f 8c a6 fe ff ff 4c 89 ef e8 e1 bd c3 ff e9 99 fe ff ff e8 87 c9 89 ff <0f> 0b e8 80 c9 89 ff 0f 0b e8 79 1e b8 02 66 0f 1f 84 00 00 00 00
>   RSP: 0018:ffffc90000e2e9c0 EFLAGS: 00010293
>   RAX: ffffffff81e321f9 RBX: 0000000000000000 RCX: ffff88810c12cf00
>   RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
>   RBP: ffffc90000e2eb90 R08: ffffffff81e31e71 R09: fffff940008d68b1
>   R10: fffff940008d68b1 R11: 0000000000000000 R12: ffffea00046b4580
>   R13: ffffc90000e2ea80 R14: 000000000000011e R15: dffffc0000000000
>   FS:  00007fcfd0717700(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000
>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: 00007fcfd07d5520 CR3: 000000010a142000 CR4: 00000000003506b0
>   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>   Call Trace:
>    ext4_writepages+0xcbb/0x3950 fs/ext4/inode.c:2776
>    do_writepages+0x13a/0x280 mm/page-writeback.c:2358
>    __filemap_fdatawrite_range+0x356/0x420 mm/filemap.c:427
>    filemap_write_and_wait_range+0x64/0xe0 mm/filemap.c:660
>    __iomap_dio_rw+0x621/0x10c0 fs/iomap/direct-io.c:495
>    iomap_dio_rw+0x35/0x80 fs/iomap/direct-io.c:611
>    ext4_dio_write_iter fs/ext4/file.c:571 [inline]
>    ext4_file_write_iter+0xfc5/0x1b70 fs/ext4/file.c:681
>    do_iter_readv_writev+0x548/0x740 include/linux/fs.h:1941
>    do_iter_write+0x182/0x660 fs/read_write.c:866
>    vfs_iter_write+0x7c/0xa0 fs/read_write.c:907
>    iter_file_splice_write+0x7fb/0xf70 fs/splice.c:686
>    do_splice_from fs/splice.c:764 [inline]
>    direct_splice_actor+0xfe/0x130 fs/splice.c:933
>    splice_direct_to_actor+0x4f4/0xbc0 fs/splice.c:888
>    do_splice_direct+0x28b/0x3e0 fs/splice.c:976
>    do_sendfile+0x914/0x1390 fs/read_write.c:1257
> 
> Link: https://syzkaller.appspot.com/bug?extid=41c966bf0729a530bd8d
> 
> From the patch:
> Cc: Stable <stable@vger.kernel.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Dave Chinner <dchinner@redhat.com>
> Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
> Cc: Darrick J. Wong <darrick.wong@oracle.com>
> Cc: Bob Peterson <rpeterso@redhat.com>
> Cc: Damien Le Moal <damien.lemoal@wdc.com>
> Cc: Theodore Ts'o <tytso@mit.edu> # for ext4
> Cc: Andreas Gruenbacher <agruenba@redhat.com>
> Cc: Ritesh Harjani <riteshh@linux.ibm.com>
> 
> Others:
> Cc: Johannes Thumshirn <jth@kernel.org>
> Cc: linux-xfs@vger.kernel.org
> Cc: linux-fsdevel@vger.kernel.org
> Cc: linux-ext4@vger.kernel.org
> Cc: cluster-devel@redhat.com
> 
> Fixes: 60263d5889e6d ("iomap: fall back to buffered writes for invalidation failures")
> Reported-by: syzbot+0ed9f769264276638893@syzkaller.appspotmail.com
> Signed-off-by: Lee Jones <lee.jones@linaro.org>

For zonefs:

Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Christoph Hellwig Feb. 9, 2022, 3:09 p.m. UTC | #2
On Wed, Feb 09, 2022 at 08:52:43AM +0000, Lee Jones wrote:
> This reverts commit 60263d5889e6dc5987dc51b801be4955ff2e4aa7.
> 
> Reverting since this commit opens a potential avenue for abuse.
> 
> The C-reproducer and more information can be found at the link below.
> 
> With this patch applied, I can no longer get the repro to trigger.

Well, maybe you should actually debug and try to understand what is
going on before blindly reverting random commits.
Andreas Gruenbacher Feb. 9, 2022, 3:20 p.m. UTC | #3
On Wed, Feb 9, 2022 at 4:09 PM Christoph Hellwig <hch@lst.de> wrote:
> On Wed, Feb 09, 2022 at 08:52:43AM +0000, Lee Jones wrote:
> > This reverts commit 60263d5889e6dc5987dc51b801be4955ff2e4aa7.
> >
> > Reverting since this commit opens a potential avenue for abuse.
> >
> > The C-reproducer and more information can be found at the link below.

This reproducer seems to be working fine on gfs2, but the locking in
gfs2 differs hugely from that of other filesystems.

> > With this patch applied, I can no longer get the repro to trigger.
>
> Well, maybe you should actually debug and try to understand what is
> going on before blindly reverting random commits.

Andreas
Lee Jones Feb. 9, 2022, 3:59 p.m. UTC | #4
On Wed, 09 Feb 2022, Christoph Hellwig wrote:

> On Wed, Feb 09, 2022 at 08:52:43AM +0000, Lee Jones wrote:
> > This reverts commit 60263d5889e6dc5987dc51b801be4955ff2e4aa7.
> > 
> > Reverting since this commit opens a potential avenue for abuse.
> > 
> > The C-reproducer and more information can be found at the link below.
> > 
> > With this patch applied, I can no longer get the repro to trigger.
> 
> Well, maybe you should actually debug and try to understand what is
> going on before blindly reverting random commits.

That is not a reasonable suggestion.

Requesting that someone becomes an area expert on a huge and complex
subject such as file systems (various) in order to fix your broken
code is not rational.

If you'd like to use the PoC provided as a basis to test your own
solution, then go right ahead.  However, as it stands this API should
be considered to contain security risk and should be patched as
quickly as can be mustered.  Reversion of the offending commit seems
to be the fastest method to achieve that currently.
Matthew Wilcox Feb. 9, 2022, 7:12 p.m. UTC | #5
On Wed, Feb 09, 2022 at 03:59:48PM +0000, Lee Jones wrote:
> On Wed, 09 Feb 2022, Christoph Hellwig wrote:
> 
> > On Wed, Feb 09, 2022 at 08:52:43AM +0000, Lee Jones wrote:
> > > This reverts commit 60263d5889e6dc5987dc51b801be4955ff2e4aa7.
> > > 
> > > Reverting since this commit opens a potential avenue for abuse.
> > > 
> > > The C-reproducer and more information can be found at the link below.
> > > 
> > > With this patch applied, I can no longer get the repro to trigger.
> > 
> > Well, maybe you should actually debug and try to understand what is
> > going on before blindly reverting random commits.
> 
> That is not a reasonable suggestion.
> 
> Requesting that someone becomes an area expert on a huge and complex
> subject such as file systems (various) in order to fix your broken
> code is not rational.

Sending a patch to revert a change you don't understand is also
not rational.  If you've bisected it to a single change -- great!
If reverting the patch still fixes the bug -- also great!  But
don't send a patch when you clearly don't understand what the
patch did.

> If you'd like to use the PoC provided as a basis to test your own
> solution, then go right ahead.  However, as it stands this API should
> be considered to contain security risk and should be patched as
> quickly as can be mustered.  Reversion of the offending commit seems
> to be the fastest method to achieve that currently.

This is incoherent.  There is no security risk.
Lee Jones Feb. 9, 2022, 7:25 p.m. UTC | #6
On Wed, 09 Feb 2022, Matthew Wilcox wrote:

> On Wed, Feb 09, 2022 at 03:59:48PM +0000, Lee Jones wrote:
> > On Wed, 09 Feb 2022, Christoph Hellwig wrote:
> > 
> > > On Wed, Feb 09, 2022 at 08:52:43AM +0000, Lee Jones wrote:
> > > > This reverts commit 60263d5889e6dc5987dc51b801be4955ff2e4aa7.
> > > > 
> > > > Reverting since this commit opens a potential avenue for abuse.
> > > > 
> > > > The C-reproducer and more information can be found at the link below.
> > > > 
> > > > With this patch applied, I can no longer get the repro to trigger.
> > > 
> > > Well, maybe you should actually debug and try to understand what is
> > > going on before blindly reverting random commits.
> > 
> > That is not a reasonable suggestion.
> > 
> > Requesting that someone becomes an area expert on a huge and complex
> > subject such as file systems (various) in order to fix your broken
> > code is not rational.
> 
> Sending a patch to revert a change you don't understand is also
> not rational.  If you've bisected it to a single change -- great!
> If reverting the patch still fixes the bug -- also great!  But
> don't send a patch when you clearly don't understand what the
> patch did.

If reverting isn't the correct thing to do here, please consider this
as a bug report.
Darrick J. Wong Feb. 10, 2022, 4:59 a.m. UTC | #7
On Wed, Feb 09, 2022 at 08:52:43AM +0000, Lee Jones wrote:
> This reverts commit 60263d5889e6dc5987dc51b801be4955ff2e4aa7.
> 
> Reverting since this commit opens a potential avenue for abuse.

What kind of abuse?  Did you conclude there's an avenue solely because
some combination of userspace rigging produced a BUG warning?  Or is
this a real problem that someone found?

> The C-reproducer and more information can be found at the link below.

No.  Post the information and your analysis here.  I'm not going to dig
into some Google site to find out what happened, and I'm not going to
assume that future developers will be able to access that URL to learn
why this patch was created.

I am not convinced that this revert is a proper fix for ... whatever is
the problem, because you never explained what happened.

> With this patch applied, I can no longer get the repro to trigger.

With ext4 completely reverted, I can no longer get the repro to trigger
either.

>   kernel BUG at fs/ext4/inode.c:2647!
>   invalid opcode: 0000 [#1] PREEMPT SMP KASAN
>   CPU: 0 PID: 459 Comm: syz-executor359 Tainted: G        W         5.10.93-syzkaller-01028-g0347b1658399 #0

What BUG on fs/ext4/inode.c:2647?

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/fs/ext4/inode.c?h=v5.10.93#n2647

All I see is a call to pagevec_release()?  There's a BUG_ON further up
if we wait for page writeback but then it still has Writeback set.  But
I don't see anything in pagevec_release that would actually result in a
BUG warning.

Oh, right, this is one of those inscrutable syzkaller things, where a
person can spend hours figuring out what the hell it set up.

Yeah...no, you don't get to submit patches to core kernel code, claim
it's not your responsibility to know anything about a subsystem that you
want to patch, and then expect us to do the work for you.  If you pick
up a syzkaller report, you get to figure out what broke, why, and how to
fix it in a reasonable manner.

You're a maintainer, would you accept a patch like this?

NAK.

--D

OH WAIT, you're running this on the Android 5.10 kernel, aren't you?
The BUG report came from page_buffers failing to find any buffer heads
attached to the page.
https://android.googlesource.com/kernel/common/+/refs/heads/android12-5.10-2022-02/fs/ext4/inode.c#2647

Yeah, don't care.

>   Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>   RIP: 0010:mpage_prepare_extent_to_map+0xbe9/0xc00 fs/ext4/inode.c:2647
>   Code: e8 fc bd c3 ff e9 80 f6 ff ff 44 89 e9 80 e1 07 38 c1 0f 8c a6 fe ff ff 4c 89 ef e8 e1 bd c3 ff e9 99 fe ff ff e8 87 c9 89 ff <0f> 0b e8 80 c9 89 ff 0f 0b e8 79 1e b8 02 66 0f 1f 84 00 00 00 00
>   RSP: 0018:ffffc90000e2e9c0 EFLAGS: 00010293
>   RAX: ffffffff81e321f9 RBX: 0000000000000000 RCX: ffff88810c12cf00
>   RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
>   RBP: ffffc90000e2eb90 R08: ffffffff81e31e71 R09: fffff940008d68b1
>   R10: fffff940008d68b1 R11: 0000000000000000 R12: ffffea00046b4580
>   R13: ffffc90000e2ea80 R14: 000000000000011e R15: dffffc0000000000
>   FS:  00007fcfd0717700(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000
>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: 00007fcfd07d5520 CR3: 000000010a142000 CR4: 00000000003506b0
>   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>   Call Trace:
>    ext4_writepages+0xcbb/0x3950 fs/ext4/inode.c:2776
>    do_writepages+0x13a/0x280 mm/page-writeback.c:2358
>    __filemap_fdatawrite_range+0x356/0x420 mm/filemap.c:427
>    filemap_write_and_wait_range+0x64/0xe0 mm/filemap.c:660
>    __iomap_dio_rw+0x621/0x10c0 fs/iomap/direct-io.c:495
>    iomap_dio_rw+0x35/0x80 fs/iomap/direct-io.c:611
>    ext4_dio_write_iter fs/ext4/file.c:571 [inline]
>    ext4_file_write_iter+0xfc5/0x1b70 fs/ext4/file.c:681
>    do_iter_readv_writev+0x548/0x740 include/linux/fs.h:1941
>    do_iter_write+0x182/0x660 fs/read_write.c:866
>    vfs_iter_write+0x7c/0xa0 fs/read_write.c:907
>    iter_file_splice_write+0x7fb/0xf70 fs/splice.c:686
>    do_splice_from fs/splice.c:764 [inline]
>    direct_splice_actor+0xfe/0x130 fs/splice.c:933
>    splice_direct_to_actor+0x4f4/0xbc0 fs/splice.c:888
>    do_splice_direct+0x28b/0x3e0 fs/splice.c:976
>    do_sendfile+0x914/0x1390 fs/read_write.c:1257
> 
> Link: https://syzkaller.appspot.com/bug?extid=41c966bf0729a530bd8d
> 
> From the patch:
> Cc: Stable <stable@vger.kernel.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Dave Chinner <dchinner@redhat.com>
> Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
> Cc: Darrick J. Wong <darrick.wong@oracle.com>
> Cc: Bob Peterson <rpeterso@redhat.com>
> Cc: Damien Le Moal <damien.lemoal@wdc.com>
> Cc: Theodore Ts'o <tytso@mit.edu> # for ext4
> Cc: Andreas Gruenbacher <agruenba@redhat.com>
> Cc: Ritesh Harjani <riteshh@linux.ibm.com>
> 
> Others:
> Cc: Johannes Thumshirn <jth@kernel.org>
> Cc: linux-xfs@vger.kernel.org
> Cc: linux-fsdevel@vger.kernel.org
> Cc: linux-ext4@vger.kernel.org
> Cc: cluster-devel@redhat.com
> 
> Fixes: 60263d5889e6d ("iomap: fall back to buffered writes for invalidation failures")
> Reported-by: syzbot+0ed9f769264276638893@syzkaller.appspotmail.com
> Signed-off-by: Lee Jones <lee.jones@linaro.org>
> ---
>  fs/ext4/file.c       |  2 --
>  fs/gfs2/file.c       |  3 +--
>  fs/iomap/direct-io.c | 16 +++++-----------
>  fs/iomap/trace.h     |  1 -
>  fs/xfs/xfs_file.c    |  4 ++--
>  fs/zonefs/super.c    |  7 ++-----
>  6 files changed, 10 insertions(+), 23 deletions(-)
> 
> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> index 3ed8c048fb12c..cb347c3b35535 100644
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -551,8 +551,6 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
>  		iomap_ops = &ext4_iomap_overwrite_ops;
>  	ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops,
>  			   is_sync_kiocb(iocb) || unaligned_io || extend);
> -	if (ret == -ENOTBLK)
> -		ret = 0;
>  
>  	if (extend)
>  		ret = ext4_handle_inode_extension(inode, offset, ret, count);
> diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
> index b39b339feddc9..847adb97380d3 100644
> --- a/fs/gfs2/file.c
> +++ b/fs/gfs2/file.c
> @@ -835,8 +835,7 @@ static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from,
>  
>  	ret = iomap_dio_rw(iocb, from, &gfs2_iomap_ops, NULL,
>  			   is_sync_kiocb(iocb));
> -	if (ret == -ENOTBLK)
> -		ret = 0;
> +
>  out:
>  	gfs2_glock_dq(gh);
>  out_uninit:
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index 933f234d5becd..ddcd06c0c452d 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -10,7 +10,6 @@
>  #include <linux/backing-dev.h>
>  #include <linux/uio.h>
>  #include <linux/task_io_accounting_ops.h>
> -#include "trace.h"
>  
>  #include "../internal.h"
>  
> @@ -413,9 +412,6 @@ iomap_dio_actor(struct inode *inode, loff_t pos, loff_t length,
>   * can be mapped into multiple disjoint IOs and only a subset of the IOs issued
>   * may be pure data writes. In that case, we still need to do a full data sync
>   * completion.
> - *
> - * Returns -ENOTBLK In case of a page invalidation invalidation failure for
> - * writes.  The callers needs to fall back to buffered I/O in this case.
>   */
>  struct iomap_dio *
>  __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> @@ -493,15 +489,13 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>  	if (iov_iter_rw(iter) == WRITE) {
>  		/*
>  		 * Try to invalidate cache pages for the range we are writing.
> -		 * If this invalidation fails, let the caller fall back to
> -		 * buffered I/O.
> +		 * If this invalidation fails, tough, the write will still work,
> +		 * but racing two incompatible write paths is a pretty crazy
> +		 * thing to do, so we don't support it 100%.
>  		 */
>  		if (invalidate_inode_pages2_range(mapping, pos >> PAGE_SHIFT,
> -				end >> PAGE_SHIFT)) {
> -			trace_iomap_dio_invalidate_fail(inode, pos, count);
> -			ret = -ENOTBLK;
> -			goto out_free_dio;
> -		}
> +				end >> PAGE_SHIFT))
> +			dio_warn_stale_pagecache(iocb->ki_filp);
>  
>  		if (!wait_for_completion && !inode->i_sb->s_dio_done_wq) {
>  			ret = sb_init_dio_done_wq(inode->i_sb);
> diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
> index fdc7ae388476f..5693a39d52fb6 100644
> --- a/fs/iomap/trace.h
> +++ b/fs/iomap/trace.h
> @@ -74,7 +74,6 @@ DEFINE_EVENT(iomap_range_class, name,	\
>  DEFINE_RANGE_EVENT(iomap_writepage);
>  DEFINE_RANGE_EVENT(iomap_releasepage);
>  DEFINE_RANGE_EVENT(iomap_invalidatepage);
> -DEFINE_RANGE_EVENT(iomap_dio_invalidate_fail);
>  
>  #define IOMAP_TYPE_STRINGS \
>  	{ IOMAP_HOLE,		"HOLE" }, \
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 5b0f93f738372..43e2c057214d9 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -589,8 +589,8 @@ xfs_file_dio_aio_write(
>  	xfs_iunlock(ip, iolock);
>  
>  	/*
> -	 * No fallback to buffered IO after short writes for XFS, direct I/O
> -	 * will either complete fully or return an error.
> +	 * No fallback to buffered IO on errors for XFS, direct IO will either
> +	 * complete fully or fail.
>  	 */
>  	ASSERT(ret < 0 || ret == count);
>  	return ret;
> diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
> index bec47f2d074be..d54fbef3db4df 100644
> --- a/fs/zonefs/super.c
> +++ b/fs/zonefs/super.c
> @@ -851,11 +851,8 @@ static ssize_t zonefs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
>  	if (iocb->ki_pos >= ZONEFS_I(inode)->i_max_size)
>  		return -EFBIG;
>  
> -	if (iocb->ki_flags & IOCB_DIRECT) {
> -		ssize_t ret = zonefs_file_dio_write(iocb, from);
> -		if (ret != -ENOTBLK)
> -			return ret;
> -	}
> +	if (iocb->ki_flags & IOCB_DIRECT)
> +		return zonefs_file_dio_write(iocb, from);
>  
>  	return zonefs_file_buffered_write(iocb, from);
>  }
> -- 
> 2.35.0.263.gb82422642f-goog
>
Christoph Hellwig Feb. 10, 2022, 5:47 a.m. UTC | #8
On Wed, Feb 09, 2022 at 03:59:48PM +0000, Lee Jones wrote:
> > Well, maybe you should actually debug and try to understand what is
> > going on before blindly reverting random commits.
> 
> That is not a reasonable suggestion.

In that case we fudamentally disagree what "reasonable" means.

Sending a revert for a 2 year old commit based on a BUG in consumer of
that subsystem based on a really old kernel without any explanation
of why that revert is even directly related to the problem and not just
a casuality is what I would define as completely unreasable.

Please send a proper bug report to the ext4 maintainer first, and we
can work from there.
Lee Jones Feb. 10, 2022, 10:15 a.m. UTC | #9
On Wed, 09 Feb 2022, Darrick J. Wong wrote:

> On Wed, Feb 09, 2022 at 08:52:43AM +0000, Lee Jones wrote:
> > This reverts commit 60263d5889e6dc5987dc51b801be4955ff2e4aa7.
> > 
> > Reverting since this commit opens a potential avenue for abuse.
> 
> What kind of abuse?  Did you conclude there's an avenue solely because
> some combination of userspace rigging produced a BUG warning?  Or is
> this a real problem that someone found?

Genuine question: Is the ability for userspace to crash the kernel
not enough to cause concern?  I would have thought that we'd want to
prevent this.

If by 'real problem' you mean; privilege escalation, memory corruption
or data leakage, then no, I haven't found any evidence of that.
However, that's not to say these aren't possible as a result of this
issue, just that I do not have the skills or knowledge to be able to
turn this into a demonstrable attack vector.

However, if you say there is no issue, I'm happy to take your word.

> > The C-reproducer and more information can be found at the link below.
> 
> No.  Post the information and your analysis here.  I'm not going to dig
> into some Google site to find out what happened, and I'm not going to
> assume that future developers will be able to access that URL to learn
> why this patch was created.

The link provided doesn't contain any further analysis.  Only the
reproducer and kernel configuration used, which are both too large to
enter into a Git commit.

> >   kernel BUG at fs/ext4/inode.c:2647!
> >   invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> >   CPU: 0 PID: 459 Comm: syz-executor359 Tainted: G        W         5.10.93-syzkaller-01028-g0347b1658399 #0
> 
> What BUG on fs/ext4/inode.c:2647?
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/fs/ext4/inode.c?h=v5.10.93#n2647
> 
> All I see is a call to pagevec_release()?  There's a BUG_ON further up
> if we wait for page writeback but then it still has Writeback set.  But
> I don't see anything in pagevec_release that would actually result in a
> BUG warning.

Right, this BUG back-trace was taken from the kernel I received the
bug report for.  I should have used the one I triggered in Mainline,
apologies for that.

The real source of the BUG is in the inlined call to page_buffers().

Here is the link for the latest release kernel:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/fs/ext4/inode.c?h=v5.16#n2620

#define page_buffers(page)                                      \
        ({                                                      \
                BUG_ON(!PagePrivate(page));                     \
                ((struct buffer_head *)page_private(page));     \
        })

> Oh, right, this is one of those inscrutable syzkaller things, where a
> person can spend hours figuring out what the hell it set up.

A link to the config used (again too big to enter into a commit
message), can be easily sourced from the link provided.

> Yeah...no, you don't get to submit patches to core kernel code, claim
> it's not your responsibility to know anything about a subsystem that you
> want to patch, and then expect us to do the work for you.  If you pick
> up a syzkaller report, you get to figure out what broke, why, and how to
> fix it in a reasonable manner.
> 
> You're a maintainer, would you accept a patch like this?

No.  I would share my knowledge to provide a helpful review and work
with the contributor to find a solution (if applicable).

> OH WAIT, you're running this on the Android 5.10 kernel, aren't you?
> The BUG report came from page_buffers failing to find any buffer heads
> attached to the page.
> https://android.googlesource.com/kernel/common/+/refs/heads/android12-5.10-2022-02/fs/ext4/inode.c#2647

Yes, the H/W I have to prototype these on is a phone and the report
that came in was specifically built against the aforementioned
kernel.

> Yeah, don't care.

"There is nothing to worry about, as it's intended behaviour"?
Matthew Wilcox Feb. 11, 2022, 2:37 p.m. UTC | #10
On Thu, Feb 10, 2022 at 10:15:52AM +0000, Lee Jones wrote:
> On Wed, 09 Feb 2022, Darrick J. Wong wrote:
> 
> > On Wed, Feb 09, 2022 at 08:52:43AM +0000, Lee Jones wrote:
> > > This reverts commit 60263d5889e6dc5987dc51b801be4955ff2e4aa7.
> > > 
> > > Reverting since this commit opens a potential avenue for abuse.
> > 
> > What kind of abuse?  Did you conclude there's an avenue solely because
> > some combination of userspace rigging produced a BUG warning?  Or is
> > this a real problem that someone found?
> 
> Genuine question: Is the ability for userspace to crash the kernel
> not enough to cause concern?  I would have thought that we'd want to
> prevent this.

The kernel doesn't crash.  It's a BUG().  That means it kills the
task which caused the BUG().  If you've specified that the kernel should
crash on seeing a BUG(), well, you made that decision, and you get to
live with the consequences.

> The link provided doesn't contain any further analysis.  Only the
> reproducer and kernel configuration used, which are both too large to
> enter into a Git commit.

But not too large to put in an email.  Which you should have sent to
begin with, not a stupid reversion commit.

> > OH WAIT, you're running this on the Android 5.10 kernel, aren't you?
> > The BUG report came from page_buffers failing to find any buffer heads
> > attached to the page.
> > https://android.googlesource.com/kernel/common/+/refs/heads/android12-5.10-2022-02/fs/ext4/inode.c#2647
> 
> Yes, the H/W I have to prototype these on is a phone and the report
> that came in was specifically built against the aforementioned
> kernel.
> 
> > Yeah, don't care.
> 
> "There is nothing to worry about, as it's intended behaviour"?

No.  You've come in like a fucking meteorite full of arrogance and
ignorance.  Nobody's reacting well to you right now.  Start again,
write a good bug report in a new thread.
Lee Jones Feb. 14, 2022, 10:33 a.m. UTC | #11
Let's attempt to seek beyond the mud slinging, swearing and the whiny
amateur dramatics for just a brief moment and concentrate solely on
the technicals please.

On Fri, 11 Feb 2022, Matthew Wilcox wrote:
> On Thu, Feb 10, 2022 at 10:15:52AM +0000, Lee Jones wrote:
> > On Wed, 09 Feb 2022, Darrick J. Wong wrote:
> > 
> > > On Wed, Feb 09, 2022 at 08:52:43AM +0000, Lee Jones wrote:
> > > > This reverts commit 60263d5889e6dc5987dc51b801be4955ff2e4aa7.
> > > > 
> > > > Reverting since this commit opens a potential avenue for abuse.
> > > 
> > > What kind of abuse?  Did you conclude there's an avenue solely because
> > > some combination of userspace rigging produced a BUG warning?  Or is
> > > this a real problem that someone found?
> > 
> > Genuine question: Is the ability for userspace to crash the kernel
> > not enough to cause concern?  I would have thought that we'd want to
> > prevent this.
> 
> The kernel doesn't crash.  It's a BUG().  That means it kills the
> task which caused the BUG().  If you've specified that the kernel should
> crash on seeing a BUG(), well, you made that decision, and you get to
> live with the consequences.

BUG() calls are architecture specific.  If no override is provided,
the default appears to panic ("crash") the kernel:

 /*
  * Don't use BUG() or BUG_ON() unless there's really no way out; one
  * example might be detecting data structure corruption in the middle
  * of an operation that can't be backed out of.  If the (sub)system
  * can somehow continue operating, perhaps with reduced functionality,
  * it's probably not BUG-worthy.
  *
  * If you're tempted to BUG(), think again:  is completely giving up
  * really the *only* solution?  There are usually better options, where
  * users don't need to reboot ASAP and can mostly shut down cleanly.
  */
 #ifndef HAVE_ARCH_BUG
 #define BUG() do { \
         printk("BUG: failure at %s:%d/%s()!\n", __FILE__, __LINE__, __func__); \
         barrier_before_unreachable(); \
         panic("BUG!"); \
 } while (0)
 #endif

The kernel I tested with panics and reboots:

 Kernel panic - not syncing: Fatal exception

Here are the BUG related kernel configs I have set:

 CONFIG_BUG=y                          
 CONFIG_GENERIC_BUG=y                  
 CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
 # CONFIG_INPUT_EVBUG is not set       
 CONFIG_BUG_ON_DATA_CORRUPTION=y       

Not seeing a "CONFIG_PANIC_ON_BUG" equivalent.  What is missing?

Unless of course you mean disabling BUG support entirely.  In which
case, this is strongly advised against in the help section and I'm not
sure of many development or production kernels that do this.

 config BUG
        bool "BUG() support" if EXPERT
        default y
        help
          Disabling this option eliminates support for BUG and WARN, reducing
          the size of your kernel image and potentially quietly ignoring
          numerous fatal conditions. You should only consider disabling this
          option for embedded systems with no facilities for reporting errors.
          Just say Y.

I've always been under the impression that a BUG() call should never
be triggerable from userspace.  However, I'm always happy to be
incorrect and subsequently reeducated.

In other words ...

Is this a valid issue that you want me to report (in a different way):

> Start again, write a good bug report in a new thread.

Or is this expected behaviour and therefore not a concern:

> > > The BUG report came from page_buffers failing to find any buffer heads
> > > attached to the page.
> > 
> > > Yeah, don't care.
Christoph Hellwig Feb. 14, 2022, 1:42 p.m. UTC | #12
Let me repeat myself:  Please send a proper bug report to the linux-ext4
list.  Thanks!
Lee Jones Feb. 14, 2022, 2:11 p.m. UTC | #13
On Mon, 14 Feb 2022, Christoph Hellwig wrote:

> Let me repeat myself:  Please send a proper bug report to the linux-ext4
> list.  Thanks!

Okay, so it is valid.  Question answered, thanks.

I still believe that I am unqualified to attempt to debug this myself.

In order to do so, I'll at least require some guidance from you SMEs.

Please bear with me while I clear my desk - lots on currently.

Bug report to follow.
Matthew Wilcox Feb. 14, 2022, 2:24 p.m. UTC | #14
On Mon, Feb 14, 2022 at 02:11:46PM +0000, Lee Jones wrote:
> On Mon, 14 Feb 2022, Christoph Hellwig wrote:
> 
> > Let me repeat myself:  Please send a proper bug report to the linux-ext4
> > list.  Thanks!
> 
> Okay, so it is valid.  Question answered, thanks.
> 
> I still believe that I am unqualified to attempt to debug this myself.

Nobody's asking you to debug it yourself.  We're asking you to
file a clear bug report instead of wasting everybody's time.
Lee Jones Feb. 14, 2022, 2:43 p.m. UTC | #15
On Mon, 14 Feb 2022, Matthew Wilcox wrote:
> On Mon, Feb 14, 2022 at 02:11:46PM +0000, Lee Jones wrote:
> > On Mon, 14 Feb 2022, Christoph Hellwig wrote:
> > 
> > > Let me repeat myself:  Please send a proper bug report to the linux-ext4
> > > list.  Thanks!
> > 
> > Okay, so it is valid.  Question answered, thanks.
> > 
> > I still believe that I am unqualified to attempt to debug this myself.
> 
> Nobody's asking you to debug it yourself.

Not meaning to drag this out any longer than is absolutely necessary,
but that's exactly what I was asked to do.

I fully appreciate how complex this subsystem is and am aware of my
inadequacy in the area.

> instead of wasting everybody's time.

That was never the intention.

> We're asking you to file a clear bug report

No problem.  Will comply.
diff mbox series

Patch

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 3ed8c048fb12c..cb347c3b35535 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -551,8 +551,6 @@  static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		iomap_ops = &ext4_iomap_overwrite_ops;
 	ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops,
 			   is_sync_kiocb(iocb) || unaligned_io || extend);
-	if (ret == -ENOTBLK)
-		ret = 0;
 
 	if (extend)
 		ret = ext4_handle_inode_extension(inode, offset, ret, count);
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index b39b339feddc9..847adb97380d3 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -835,8 +835,7 @@  static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from,
 
 	ret = iomap_dio_rw(iocb, from, &gfs2_iomap_ops, NULL,
 			   is_sync_kiocb(iocb));
-	if (ret == -ENOTBLK)
-		ret = 0;
+
 out:
 	gfs2_glock_dq(gh);
 out_uninit:
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 933f234d5becd..ddcd06c0c452d 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -10,7 +10,6 @@ 
 #include <linux/backing-dev.h>
 #include <linux/uio.h>
 #include <linux/task_io_accounting_ops.h>
-#include "trace.h"
 
 #include "../internal.h"
 
@@ -413,9 +412,6 @@  iomap_dio_actor(struct inode *inode, loff_t pos, loff_t length,
  * can be mapped into multiple disjoint IOs and only a subset of the IOs issued
  * may be pure data writes. In that case, we still need to do a full data sync
  * completion.
- *
- * Returns -ENOTBLK In case of a page invalidation invalidation failure for
- * writes.  The callers needs to fall back to buffered I/O in this case.
  */
 struct iomap_dio *
 __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
@@ -493,15 +489,13 @@  __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	if (iov_iter_rw(iter) == WRITE) {
 		/*
 		 * Try to invalidate cache pages for the range we are writing.
-		 * If this invalidation fails, let the caller fall back to
-		 * buffered I/O.
+		 * If this invalidation fails, tough, the write will still work,
+		 * but racing two incompatible write paths is a pretty crazy
+		 * thing to do, so we don't support it 100%.
 		 */
 		if (invalidate_inode_pages2_range(mapping, pos >> PAGE_SHIFT,
-				end >> PAGE_SHIFT)) {
-			trace_iomap_dio_invalidate_fail(inode, pos, count);
-			ret = -ENOTBLK;
-			goto out_free_dio;
-		}
+				end >> PAGE_SHIFT))
+			dio_warn_stale_pagecache(iocb->ki_filp);
 
 		if (!wait_for_completion && !inode->i_sb->s_dio_done_wq) {
 			ret = sb_init_dio_done_wq(inode->i_sb);
diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
index fdc7ae388476f..5693a39d52fb6 100644
--- a/fs/iomap/trace.h
+++ b/fs/iomap/trace.h
@@ -74,7 +74,6 @@  DEFINE_EVENT(iomap_range_class, name,	\
 DEFINE_RANGE_EVENT(iomap_writepage);
 DEFINE_RANGE_EVENT(iomap_releasepage);
 DEFINE_RANGE_EVENT(iomap_invalidatepage);
-DEFINE_RANGE_EVENT(iomap_dio_invalidate_fail);
 
 #define IOMAP_TYPE_STRINGS \
 	{ IOMAP_HOLE,		"HOLE" }, \
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 5b0f93f738372..43e2c057214d9 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -589,8 +589,8 @@  xfs_file_dio_aio_write(
 	xfs_iunlock(ip, iolock);
 
 	/*
-	 * No fallback to buffered IO after short writes for XFS, direct I/O
-	 * will either complete fully or return an error.
+	 * No fallback to buffered IO on errors for XFS, direct IO will either
+	 * complete fully or fail.
 	 */
 	ASSERT(ret < 0 || ret == count);
 	return ret;
diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index bec47f2d074be..d54fbef3db4df 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -851,11 +851,8 @@  static ssize_t zonefs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	if (iocb->ki_pos >= ZONEFS_I(inode)->i_max_size)
 		return -EFBIG;
 
-	if (iocb->ki_flags & IOCB_DIRECT) {
-		ssize_t ret = zonefs_file_dio_write(iocb, from);
-		if (ret != -ENOTBLK)
-			return ret;
-	}
+	if (iocb->ki_flags & IOCB_DIRECT)
+		return zonefs_file_dio_write(iocb, from);
 
 	return zonefs_file_buffered_write(iocb, from);
 }