From patchwork Mon Apr 23 06:13:03 2018
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 10356205
Subject: Re: 4.17-rc1 FS went read-only during balance
From: Qu Wenruo
To: Dmitrii Tcvetkov, linux-btrfs@vger.kernel.org
Date: Mon, 23 Apr 2018 14:13:03 +0800
Message-ID: <3d2443c8-0b34-2eea-3adc-2f33570f75b1@gmx.com>
In-Reply-To: <20180423080745.5a9dc6be@demfloro.ru>
References: <20180421175548.4b07dffc@demfloro.ru>
 <5775f38a-5f17-1f6d-a6cd-289e18188a26@gmx.com>
 <20180423080745.5a9dc6be@demfloro.ru>
X-Mailing-List: linux-btrfs@vger.kernel.org

On 2018-04-23 13:08, Dmitrii Tcvetkov wrote:
> On Mon, 23 Apr 2018 09:23:53 +0800
> Qu Wenruo wrote:
>
>> On 2018-04-21 22:55, Dmitrii Tcvetkov wrote:
>>> TL;DR It seems as regression in 4.17, but I managed to find a
>>> workaround to make filesystem rw mountable again.
>>>
>>> Kernel built from tag v4.17-rc1
>>> btrfs-progs 4.16
>>>
>>> Tonight two my machines (PC (ECC RAM) and laptop(non-ECC RAM)) were
>>> doing usual weekly balance with this command via cron:
>>> btrfs balance start -musage=50 -dusage=50
>>> Both machines run same kernel version.
>>>
>>> On PC that caused root and "data" filesystems to go readonly.
>>> Root is on an SSD with data single and metadata DUP, "data" filesystem
>>> is on 2 HDDs with RAID1 for data and metadata.
>>>
>>> On laptop only /home went ro, it's on NVMe SSD with data single and
>>> metadata DUP.
>>>
>>> Btrfs check of PC rootfs was without any errors in both modes, I did
>>> them once each before reboot on readonly filesystem with --force
>>> flag and then from live usb. Same output without any errors.
>>>
>>> After reboot kernel refused rw mount rootfs with the same error as
>>> during cron balance, ro mount was accepted, error during rw mount:
>>> BTRFS: error (device dm-17) in merge_reloc_roots:2465: errno=-117
>
>> 117 means EUCLEAN, which could be caused by the newly introduced
>> first_key and level check.
>
>> Please apply this hotfix to fix it.
>> btrfs: Only check first key for committed tree blocks
>> (Which is included in latest pull request)
>
>> Also, please consider enable CONFIG_BTRFS_DEBUG to provide extra
>> debug info.
>
>> Thanks,
>> Qu
>
> I tried 4.17-rc2 (as the pull request was pulled) with
> CONFIG_BTRFS_DEBUG on LVM snapshot of laptop home partition (/dev/vdb)
> in a VM (VM kernel sees only snapshot so no UUID collisions). Dmesg
> attached.

Thanks for the info and your previous btrfs-image.

The image itself shows nothing wrong, so it should be a runtime problem.

Would you please apply these two debug patches?
https://patchwork.kernel.org/patch/10335133/
https://patchwork.kernel.org/patch/10335135/

And also the attached diff file?

My guess is that the parent node is not initialized correctly in this case.

Thanks,
Qu

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 60caa68c3618..79f482578e02 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -458,6 +458,7 @@ static int verify_level_key(struct btrfs_fs_info *fs_info,
 			eb->start, first_key->objectid, first_key->type,
 			first_key->offset, found_key.objectid,
 			found_key.type, found_key.offset);
+		btrfs_print_tree(eb, false);
 	}
 #endif
 	return ret;
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 00b7d3231821..cde0cb6c9786 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1870,6 +1870,8 @@ int replace_path(struct btrfs_trans_handle *trans,
 					     level - 1, &first_key);
 			if (IS_ERR(eb)) {
 				ret = PTR_ERR(eb);
+				btrfs_err(fs_info, "parent leaf, slot: %d:", slot);
+				btrfs_print_tree(parent, false);
 				break;
 			} else if (!extent_buffer_uptodate(eb)) {
 				ret = -EIO;