[v2] Btrfs: fix crash on endio of reading corrupted block

Message ID

1408524679-13583-1-git-send-email-bo.li.liu@oracle.com

State

Superseded

Headers

Return-Path: <linux-btrfs-owner@kernel.org>
From: Liu Bo <bo.li.liu@oracle.com>
To: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: [PATCH v2] Btrfs: fix crash on endio of reading corrupted block
Date: Wed, 20 Aug 2014 16:51:19 +0800
Message-Id: <1408524679-13583-1-git-send-email-bo.li.liu@oracle.com>
In-Reply-To: <1408462393-3291-1-git-send-email-bo.li.liu@oracle.com>
References: <1408462393-3291-1-git-send-email-bo.li.liu@oracle.com>
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk

Commit Message

Liu Bo Aug. 20, 2014, 8:51 a.m. UTC

The crash is

------------[ cut here ]------------
kernel BUG at fs/btrfs/extent_io.c:2124!
invalid opcode: 0000 [#1] SMP
...
CPU: 3 PID: 88 Comm: kworker/u8:7 Not tainted 3.17.0-0.rc1.git0.1.fc22.x86_64 #1
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Workqueue: btrfs-endio normal_work_helper [btrfs]
task: ffff8800d7152700 ti: ffff8800d729c000 task.ti: ffff8800d729c000
RIP: 0010:[<ffffffffa02d6055>]  [<ffffffffa02d6055>] end_bio_extent_readpage+0xb45/0xcd0 [btrfs]
Call Trace:
  [<ffffffff810c3ef8>] ? __enqueue_entity+0x78/0x80
  [<ffffffff810ca969>] ? enqueue_entity+0x2e9/0x990
  [<ffffffff813464ab>] bio_endio+0x6b/0xa0
  [<ffffffff813464f2>] bio_endio_nodec+0x12/0x20
  [<ffffffffa02ab217>] end_workqueue_fn+0x37/0x40 [btrfs]
  [<ffffffffa02e4b5d>] normal_work_helper+0xbd/0x280 [btrfs]
  [<ffffffff810ac4fe>] process_one_work+0x17e/0x430
  [<ffffffff810ace8b>] worker_thread+0x6b/0x4a0
  [<ffffffff810ace20>] ? rescuer_thread+0x2a0/0x2a0
  [<ffffffff810b1fca>] kthread+0xea/0x100
  [<ffffffff810b1ee0>] ? kthread_create_on_node+0x1a0/0x1a0
  [<ffffffff8173dd7c>] ret_from_fork+0x7c/0xb0
  [<ffffffff810b1ee0>] ? kthread_create_on_node+0x1a0/0x1a0

This is in fact a regression.

It happens because we forgot to advance @offset when reading a corrupted
block. @offset stays unchanged, which leads to checksum errors on the
remaining blocks queued up in the same bio. Btrfs then tries to iterate over
the copies of those blocks in order to get good data, and hits the BUG_ON()
that we set to avoid looking for good copies of blocks that have no problems.

Reported-by: Chris Murphy <lists@colorremedies.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
v2:
   - Improve the commit log to be clear, suggested by Eric.

 fs/btrfs/extent_io.c | 1 +
 1 file changed, 1 insertion(+)
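
The failure mode described in the commit message can be modelled outside the
kernel. The following is only a toy sketch, not the btrfs code: BLOCK_SIZE,
NR_BLOCKS, expected_csum() and count_spurious_csum_errors() are hypothetical
names invented for illustration, and the "checksum" is just a number derived
from the block index. It assumes that checksums are looked up by the running
byte offset within the bio, and shows how skipping "offset += len" on the
early-continue path makes every later healthy block fail verification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 4096u	/* hypothetical block size */
#define NR_BLOCKS  4u		/* hypothetical number of blocks in one bio */

/* Toy checksum lookup, keyed by the byte offset of a block within the bio. */
static uint32_t expected_csum(uint32_t offset)
{
	return offset / BLOCK_SIZE + 100u;
}

/*
 * Walk the blocks of one simulated bio. Block number `bad` takes the
 * early-continue error path. If `advance_on_error` is zero, that path
 * leaves `offset` untouched (the buggy behaviour); otherwise it advances
 * `offset` first (the fixed behaviour).
 *
 * Returns how many healthy blocks fail checksum verification.
 */
static unsigned int count_spurious_csum_errors(size_t bad, int advance_on_error)
{
	uint32_t offset = 0;
	unsigned int spurious = 0;

	assert(bad < NR_BLOCKS);

	for (size_t i = 0; i < NR_BLOCKS; i++) {
		uint32_t len = BLOCK_SIZE;
		/* Checksum of the data that really sits in block i. */
		uint32_t actual = (uint32_t)i + 100u;

		if (i == bad) {
			if (advance_on_error)
				offset += len;	/* the one-line fix */
			continue;
		}

		/* A stale offset selects the checksum of an earlier block. */
		if (expected_csum(offset) != actual)
			spurious++;

		offset += len;
	}

	return spurious;
}
```

In this model, count_spurious_csum_errors(0, 0) reports a failure for each of
the three healthy blocks that follow the bad one, while
count_spurious_csum_errors(0, 1) reports none. In the real code it is those
spurious failures that send healthy blocks into the repair path and trip the
BUG_ON().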

Comments

Eric Sandeen Aug. 22, 2014, 3:32 p.m. UTC | #1

On 8/20/14, 3:51 AM, Liu Bo wrote:
> The crash is
> 
> ------------[ cut here ]------------
> kernel BUG at fs/btrfs/extent_io.c:2124!
> invalid opcode: 0000 [#1] SMP

...

> ---
> v2:
>    - Improve the commit log to be clear, suggested by Eric.

Well, I had specifically asked for it to include details on when
the regression occurred, but didn't get that...  ;)

If you state that it's a regression, people may start wondering
if their kernels are vulnerable, how far back into -stable it
should go, which distros should pick up the fix, etc.  If you
don't say when it regressed, we're all left wondering...

Thanks,

-Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Liu Bo Aug. 23, 2014, 3:53 a.m. UTC | #2

On Fri, Aug 22, 2014 at 10:32:13AM -0500, Eric Sandeen wrote:
> On 8/20/14, 3:51 AM, Liu Bo wrote:
> > The crash is
> > 
> > ------------[ cut here ]------------
> > kernel BUG at fs/btrfs/extent_io.c:2124!
> > invalid opcode: 0000 [#1] SMP
> 
> ...
> 
> > ---
> > v2:
> >    - Improve the commit log to be clear, suggested by Eric.
> 
> Well, I had specifically asked for it to include details on when
> the regression occurred, but didn't get that...  ;)
> 
> If you state that it's a regression, people may start wondering
> if their kernels are vulnerable, how far back into -stable it
> should go, which distros should pick up the fix, etc.  If you
> don't say when it regressed, we're all left wondering...

Oh yeah, now I get it :)

thanks,
-liubo

> 
> Thanks,
> 
> -Eric
> 

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 3af4966..be41e4d 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2602,6 +2602,7 @@  static void end_bio_extent_readpage(struct bio *bio, int err)
 					test_bit(BIO_UPTODATE, &bio->bi_flags);
 				if (err)
 					uptodate = 0;
+				offset += len;
 				continue;
 			}
 		}
