From patchwork Wed Feb 15 06:41:49 2017
X-Patchwork-Submitter: Qu Wenruo <quwenruo@cn.fujitsu.com>
X-Patchwork-Id: 9573421
Subject: Re: btrfs/125 deadlock using nospace_cache or space_cache=v2
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: Anand Jain, btrfs
Date: Wed, 15 Feb 2017 14:41:49 +0800
Message-ID: <955bdf2a-6d44-01d1-a19d-10fad8f7760b@cn.fujitsu.com>
In-Reply-To: <7ecb7b66-72d1-2bbd-b6f4-91550f2e1ab3@oracle.com>
References: <0daf31e9-d666-5044-f9a9-fcf54576a144@cn.fujitsu.com>
 <7ecb7b66-72d1-2bbd-b6f4-91550f2e1ab3@oracle.com>
X-Mailing-List: linux-btrfs@vger.kernel.org

State updated: the deadlock seems to be caused by two bugs:

1) Bad error handling in run_delalloc_nocow()

The direct cause is that btrfs_reloc_clone_csums() fails and returns -EIO.
The error handler then calls extent_clear_unlock_delalloc() to clear the
dirty flag and end writeback on the remaining pages of the extent.

However, this makes the ordered extent unhappy: the IO of those remaining
pages is simply skipped, and the ordered extent relies on that IO to finish.

This is quite easy to reproduce with the following modification:

------
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1e861a0..b9d0bcb 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1497,8 +1497,11 @@ static noinline int run_delalloc_nocow(struct inode *inode,
 		if (root->root_key.objectid ==
 		    BTRFS_DATA_RELOC_TREE_OBJECTID) {
+			ret = -EIO;
+			/*
 			ret = btrfs_reloc_clone_csums(inode, cur_offset,
 						      num_bytes);
+			*/
 			if (ret) {
 				if (!nolock && nocow)
 					btrfs_end_write_no_snapshoting(root);
------

Then any balance will cause btrfs to wait on an ordered extent that will
never finish, just like what we encountered.
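A tiny model of the accounting shows why this manifests as a hang rather
than just an error. The sketch below is plain, self-contained C, not kernel
code; the struct and helper names are invented for illustration. An ordered
extent only counts as finished once IO completion has been reported for its
whole byte range, so pages whose writeback is ended without their IO ever
being submitted leave the remaining-byte count stuck above zero, and whoever
waits on that ordered extent waits forever:

------
/*
 * Minimal model of ordered-extent completion accounting.
 * Illustration only: names and layout are hypothetical, not btrfs code.
 */
#include <stdbool.h>
#include <stdio.h>

struct oe_model {
	unsigned long file_offset;	/* start of the ordered range */
	unsigned long num_bytes;	/* length of the ordered range */
	unsigned long bytes_left;	/* bytes whose IO has not completed */
};

/* Plays the role of finish_ordered_io(): called as page IO completes. */
static void oe_complete_io(struct oe_model *oe, unsigned long bytes)
{
	oe->bytes_left -= bytes;
}

/* The waiter can only make progress once this returns true. */
static bool oe_finished(const struct oe_model *oe)
{
	return oe->bytes_left == 0;
}

int main(void)
{
	/* One ordered extent covering eight 4K pages. */
	struct oe_model oe = {
		.file_offset = 0,
		.num_bytes   = 8 * 4096,
		.bytes_left  = 8 * 4096,
	};

	/* IO completes for the first three pages... */
	oe_complete_io(&oe, 3 * 4096);

	/*
	 * ...then the error path ends writeback on the remaining five pages
	 * without submitting their IO, so oe_complete_io() is never called
	 * for them and bytes_left can never reach zero.
	 */
	printf("ordered extent finished? %s (bytes_left=%lu)\n",
	       oe_finished(&oe) ? "yes" : "no", oe.bytes_left);
	return 0;
}
------

In the call trace quoted further down, the stuck waiter is exactly this
condition: btrfs_wait_ordered_range() called from
btrfs_relocate_block_group().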
2) RAID5/6 recovery not working in some tree operations

In fact, btrfs succeeded in mounting the fs, so the RAID5/6 recovery code is
working, at least for some trees. And btrfs succeeded in recovering all the
data with correct checksums when using normal reads (cat works here) before
the balance.

However, it fails to read the csum tree, which causes run_delalloc_nocow()
to return -EIO and leads to the bug above. So there is something related to
the RAID5/6 code, maybe readahead, that contributes to this bug.
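For anyone who has not looked at the RAID5/6 code, the generic idea behind
that kind of recovery can be sketched in a few lines of plain C. The
function names below (compute_parity, rebuild_stripe) are invented for
illustration and are not the btrfs implementation: the parity stripe is the
XOR of the data stripes, so one lost data stripe can be rebuilt by XOR-ing
the parity with the surviving data. The odd part here is that this kind of
repair evidently succeeds for normal data reads, yet the csum tree reads
issued during balance still end up as -EIO:

------
/*
 * Generic RAID5-style recovery sketch: parity = XOR of the data stripes.
 * Illustration only; invented names, not the btrfs RAID5/6 code.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NR_DATA     3	/* three data stripes plus one parity stripe */
#define STRIPE_SIZE 16	/* tiny stripes, just for the demo */

/* parity[i] = data[0][i] ^ data[1][i] ^ ... */
static void compute_parity(uint8_t data[NR_DATA][STRIPE_SIZE],
			   uint8_t parity[STRIPE_SIZE])
{
	memset(parity, 0, STRIPE_SIZE);
	for (int s = 0; s < NR_DATA; s++)
		for (int i = 0; i < STRIPE_SIZE; i++)
			parity[i] ^= data[s][i];
}

/* Rebuild one lost data stripe from the parity and the survivors. */
static void rebuild_stripe(uint8_t data[NR_DATA][STRIPE_SIZE],
			   const uint8_t parity[STRIPE_SIZE], int lost)
{
	memcpy(data[lost], parity, STRIPE_SIZE);
	for (int s = 0; s < NR_DATA; s++)
		if (s != lost)
			for (int i = 0; i < STRIPE_SIZE; i++)
				data[lost][i] ^= data[s][i];
}

int main(void)
{
	uint8_t data[NR_DATA][STRIPE_SIZE], parity[STRIPE_SIZE];
	uint8_t saved[STRIPE_SIZE];

	for (int s = 0; s < NR_DATA; s++)
		memset(data[s], 'A' + s, STRIPE_SIZE);
	compute_parity(data, parity);

	/* "Lose" stripe 1, e.g. a stripe on a missing/corrupted device. */
	memcpy(saved, data[1], STRIPE_SIZE);
	memset(data[1], 0, STRIPE_SIZE);

	rebuild_stripe(data, parity, 1);
	printf("recovered stripe matches original: %s\n",
	       memcmp(data[1], saved, STRIPE_SIZE) == 0 ? "yes" : "no");
	return 0;
}
------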
I'll continue digging and will keep the state updated, in case anyone is
interested in this bug.

Thanks,
Qu

At 02/07/2017 04:02 PM, Anand Jain wrote:
>
> Hi Qu,
>
> I don't think I have seen this before. I don't know the reason why I
> wrote it this way, maybe to test encryption; however, it was all with
> the default options.
>
> But now I can reproduce it, and it looks like balance fails to start
> with an IO error even though the mount is successful.
> ------------------
> # tail -f ./results/btrfs/125.full
> intense and takes potentially very long. It is recommended to
> use the balance filters to narrow down the balanced data.
> Use 'btrfs balance start --full-balance' option to skip this
> warning. The operation will start in 10 seconds.
> Use Ctrl-C to stop it.
> 10 9 8 7 6 5 4 3 2 1ERROR: error during balancing '/scratch':
> Input/output error
> There may be more info in syslog - try dmesg | tail
>
> Starting balance without any filters.
> failed: '/root/bin/btrfs balance start /scratch'
> --------------------
>
> This must be fixed. For debugging: if I add a sync before the previous
> unmount, the problem isn't reproduced. Just FYI. Strange.
>
> -------
> diff --git a/tests/btrfs/125 b/tests/btrfs/125
> index 91aa8d8c3f4d..4d4316ca9f6e 100755
> --- a/tests/btrfs/125
> +++ b/tests/btrfs/125
> @@ -133,6 +133,7 @@ echo "-----Mount normal-----" >> $seqres.full
>  echo
>  echo "Mount normal and balance"
>
> +_run_btrfs_util_prog filesystem sync $SCRATCH_MNT
>  _scratch_unmount
>  _run_btrfs_util_prog device scan
>  _scratch_mount >> $seqres.full 2>&1
> ------
>
> HTH.
>
> Thanks, Anand
>
>
> On 02/07/17 14:09, Qu Wenruo wrote:
>> Hi Anand,
>>
>> I found that the btrfs/125 test case can only pass if space cache is
>> enabled.
>>
>> With the nospace_cache or space_cache=v2 mount option, it gets blocked
>> forever with the following call trace (the only blocked process):
>>
>> [11382.046978] btrfs D11128 6705 6057 0x00000000
>> [11382.047356] Call Trace:
>> [11382.047668]  __schedule+0x2d4/0xae0
>> [11382.047956]  schedule+0x3d/0x90
>> [11382.048283]  btrfs_start_ordered_extent+0x160/0x200 [btrfs]
>> [11382.048630]  ? wake_atomic_t_function+0x60/0x60
>> [11382.048958]  btrfs_wait_ordered_range+0x113/0x210 [btrfs]
>> [11382.049360]  btrfs_relocate_block_group+0x260/0x2b0 [btrfs]
>> [11382.049703]  btrfs_relocate_chunk+0x51/0xf0 [btrfs]
>> [11382.050073]  btrfs_balance+0xaa9/0x1610 [btrfs]
>> [11382.050404]  ? btrfs_ioctl_balance+0x3a0/0x3b0 [btrfs]
>> [11382.050739]  btrfs_ioctl_balance+0x3a0/0x3b0 [btrfs]
>> [11382.051109]  btrfs_ioctl+0xbe7/0x27f0 [btrfs]
>> [11382.051430]  ? trace_hardirqs_on+0xd/0x10
>> [11382.051747]  ? free_object+0x74/0xa0
>> [11382.052084]  ? debug_object_free+0xf2/0x130
>> [11382.052413]  do_vfs_ioctl+0x94/0x710
>> [11382.052750]  ? enqueue_hrtimer+0x160/0x160
>> [11382.053090]  ? do_nanosleep+0x71/0x130
>> [11382.053431]  SyS_ioctl+0x79/0x90
>> [11382.053735]  entry_SYSCALL_64_fastpath+0x18/0xad
>> [11382.054570] RIP: 0033:0x7f397d7a6787
>>
>> I also found that in the test case we only have 3 contiguous data
>> extents, whose sizes are 1M, 68.5M and 31.5M respectively.
>>
>> Original data block group:
>> 0    1M                  64M 69.5M              101M      128M
>> | Ext A |   Extent B (68.5M)  |  Extent C (31.5M)  |
>>
>> While relocation writes them out as 4 extents:
>> 0~1M           : same as Extent A       (1st)
>> 1M~68.3438M    : smaller than Extent B  (2nd)
>> 68.3438M~69.5M : tail part of Extent B  (3rd)
>> 69.5M~101M     : same as Extent C       (4th)
>>
>> However, only the ordered extents of the 3rd and 4th get finished,
>> while the ordered extents of the 1st and 2nd never reach
>> finish_ordered_io().
>>
>> So relocation waits for these two ordered extents, which no one will
>> ever finish, and gets blocked.
>>
>> Did you experience the same bug when submitting the test case?
>> Is there any known fix for it?
>>
>> Thanks,
>> Qu