Date: Fri, 23 Mar 2018 13:26:06 +0800
From: Eryu Guan
To: "Darrick J. Wong"
Cc: Brian Foster, linux-xfs@vger.kernel.org, david@fromorbit.com, fstests
Subject: Re: [PATCH v3] xfs: test agfl reset on bad list wrapping
Message-ID: <20180323052540.GZ30836@localhost.localdomain>
References: <20180321165716.GB4818@magnolia>
In-Reply-To: <20180321165716.GB4818@magnolia>

On Wed, Mar 21, 2018 at 09:57:16AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong
>
> From the kernel patch that this test examines ("xfs: detect agfl count
> corruption and reset agfl"):
>
> "The struct xfs_agfl v5 header was originally introduced with
> unexpected padding that caused the AGFL to operate with one less
> slot than intended. The header has since been packed, but the fix
> left an incompatibility for users who upgrade from an old kernel
> with the unpacked header to a newer kernel with the packed header
> while the AGFL happens to wrap around the end. The newer kernel
> recognizes one extra slot at the physical end of the AGFL that the
> previous kernel did not. The new kernel will eventually attempt to
> allocate a block from that slot, which contains invalid data, and
> cause a crash.
>
> "This condition can be detected by comparing the active range of the
> AGFL to the count. While this detects a padding mismatch, it can
> also trigger false positives for unrelated flcount corruption. Since
> we cannot distinguish a size mismatch due to padding from unrelated
> corruption, we can't trust the AGFL enough to simply repopulate the
> empty slot.
>
> "Instead, avoid unnecessarily complex detection logic and use a
> solution that can handle any form of flcount corruption that slips
> through read verifiers: distrust the entire AGFL and reset it to an
> empty state. Any valid blocks within the AGFL are intentionally
> leaked.
> This requires xfs_repair to rectify (which was already
> necessary based on the state the AGFL was found in). The reset
> mitigates the side effect of the padding mismatch problem from a
> filesystem crash to a free space accounting inconsistency."
>
> This test exercises the reset code by mutating a fresh filesystem to
> contain an agfl with various list configurations of correctly wrapped,
> incorrectly wrapped, not wrapped, and actually corrupt free lists; then
> checks the success of the reset operation by fragmenting the free space
> btrees to exercise the agfl. Kernels without this reset fix will shut
> down the filesystem with corruption errors.
>
> Signed-off-by: Darrick J. Wong
> ---
> v3: use fallocate instead of dd write, more factoring of common code
> v2: remove unnecessary umounts, refactor long lines into helpers
> ---
>  common/rc         |  23 ++++-
>  tests/xfs/709     | 258 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  tests/xfs/709.out |  13 +++
>  tests/xfs/group   |   1
>  4 files changed, 293 insertions(+), 2 deletions(-)
>  create mode 100755 tests/xfs/709
>  create mode 100644 tests/xfs/709.out
>
> diff --git a/common/rc b/common/rc
> index 2c29d55..f7eb72d 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -3440,6 +3440,26 @@ _get_device_size()
>  	grep `_short_dev $1` /proc/partitions | awk '{print $3}'
>  }
>
> +# Make sure we actually have dmesg checking set up.
> +_require_check_dmesg() {
> +	test -w /dev/kmsg || \
> +		_notrun "Test requires writable /dev/kmsg."
> +}
> +
> +# Return the dmesg log since the start of this test.  Caller must ensure that
> +# /dev/kmsg was writable when the test was started so that we can find the
> +# beginning of this test's log messages; _require_check_dmesg does this.
> +_dmesg_since_test_start() {
> +	dmesg | tac | sed -ne "0,\#run fstests $seqnum at $date_time#p" | \
> +		tac
> +}
> +
> +# check dmesg log for a specific string, subject to the same requirements as
> +# _dmesg_since_test_start.
> +_check_dmesg_for() {
> +	_dmesg_since_test_start | egrep -q "$1"
> +}
> +
>  # check dmesg log for WARNING/Oops/etc.
>  _check_dmesg()
>  {
> @@ -3455,8 +3475,7 @@ _check_dmesg()
>
>  	# search the dmesg log of last run of $seqnum for possible failures
>  	# use sed \cregexpc address type, since $seqnum contains "/"

The comments about sed usage probably should go to _dmesg_since_test_start() too.

> -	dmesg | tac | sed -ne "0,\#run fstests $seqnum at $date_time#p" | \
> -		tac | $filter >$seqres.dmesg
> +	_dmesg_since_test_start | $filter >$seqres.dmesg
>  	egrep -q -e "kernel BUG at" \
>  		-e "WARNING:" \
>  		-e "BUG:" \
> diff --git a/tests/xfs/709 b/tests/xfs/709
> new file mode 100755
> index 0000000..78cefe5
> --- /dev/null
> +++ b/tests/xfs/709
> @@ -0,0 +1,258 @@
> +#! /bin/bash
> +# FS QA Test No. 709
> +#
> +# Make sure XFS can fix a v5 AGFL that wraps over the last block.
> +# Refer to commit 96f859d52bcb ("libxfs: pack the agfl header structure so
> +# XFS_AGFL_SIZE is correct") for details on the original on-disk format error
> +# and the patch "xfs: detect agfl count corruption and reset agfl") for details
> +# about the fix.
> +#
> +#-----------------------------------------------------------------------
> +# Copyright (c) 2018 Oracle, Inc.
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
> +#
> +#-----------------------------------------------------------------------
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1
> +trap "_cleanup; rm -f $tmp.*; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> +	cd /
> +	rm -f $tmp.*
> +}
> +
> +rm -f $seqres.full
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +
> +# real QA test starts here
> +_supported_fs xfs
> +_supported_os Linux
> +
> +_require_check_dmesg
> +_require_scratch
> +_require_test_program "punch-alternating"
> +
> +# This is only a v5 filesystem problem
> +_require_scratch_xfs_crc
> +
> +mount_loop() {
> +	if ! _try_scratch_mount >> $seqres.full 2>&1; then
> +		echo "scratch mount failed" >> $seqres.full
> +		return
> +	fi
> +
> +	# Trigger agfl fixing by fragmenting free space enough to cause
> +	# a bnobt split
> +	blksz=$(_get_file_block_size ${SCRATCH_MNT})
> +	bno_maxrecs=$(( blksz / 8 ))
> +	filesz=$((bno_maxrecs * 3 * blksz))
> +	rm -rf $SCRATCH_MNT/a
> +	$XFS_IO_PROG -f -c "falloc 0 $filesz" $SCRATCH_MNT/a

And I noticed a test failure with the patch "xfs: detect agfl count
corruption and reset agfl" applied on top of a 4.16-rc5 kernel.
It looks like we should redirect the xfs_io output to $seqres.full, as
the v2 patch did:

	dd if=/dev/zero of=$SCRATCH_MNT/a bs=8192k >> $seqres.full 2>&1

> +	test -e $SCRATCH_MNT/a && ./src/punch-alternating $SCRATCH_MNT/a
> +	rm -rf $SCRATCH_MNT/a
> +
> +	_scratch_unmount 2>&1 | _filter_scratch
> +}
> +
> +dump_ag0() {
> +	_scratch_xfs_db -c 'sb 0' -c 'p' -c 'agf 0' -c 'p' -c 'agfl 0' -c 'p'
> +}
> +
> +runtest() {
> +	cmd="$1"
> +
> +	# Format filesystem
> +	echo "TEST $cmd" | tee /dev/ttyprintk

What's the purpose of writing to /dev/ttyprintk? I don't see how it's used
in the test.

Thanks,
Eryu

--- tests/xfs/709.out	2018-03-23 12:45:16.831011711 +0800
+++ /root/workspace/xfstests/results//xfs_4k_crc/xfs/709.out.bad	2018-03-23 13:12:10.083980820 +0800
@@ -7,6 +7,7 @@
 TEST good_start
 TEST good_wrap
 TEST bad_start
+fallocate: Structure needs cleaning
 ASSERT flfirst < good_agfl_size - 1
 ASSERT flfirst < fllast
 TEST no_move
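The slot-count mismatch and the active-range check described in the quoted commit message can be illustrated with a short bash sketch. It assumes (none of these numbers come from the patch) that the AGFL occupies one sector, that each slot holds a 4-byte block number, and that the v5 header is 36 bytes when packed versus 40 bytes with the x86_64 alignment padding. agfl_slots and agfl_needs_reset are hypothetical names; the second function only mirrors the comparison the commit message describes and is not the kernel's code.

```shell
#!/bin/bash
# Hypothetical helpers illustrating the AGFL arithmetic from the quoted
# commit message; not part of the patch, fstests, or xfs_db.

# Number of AGFL slots in one sector, given the header size in bytes.
# Assumption: packed v5 header = 36 bytes, unpacked (x86_64 padding) = 40.
agfl_slots() {
	local sectsize=$1 hdrsize=$2
	echo $(( (sectsize - hdrsize) / 4 ))	# each slot is a 4-byte block number
}

# Returns 0 ("reset needed") when flfirst ($1), fllast ($2) and
# flcount ($3) are inconsistent with an AGFL of $4 slots, 1 otherwise.
agfl_needs_reset() {
	local f=$1 l=$2 c=$3 size=$4 active
	(( f >= size || l >= size )) && return 0	# index past the end
	(( c > size )) && return 0			# count larger than the list
	if (( c == 0 )); then
		active=0
	elif (( l >= f )); then
		active=$(( l - f + 1 ))			# list does not wrap
	else
		active=$(( size - f + l + 1 ))		# list wraps past the last slot
	fi
	(( active != c ))
}
```

With 512-byte sectors this gives 119 slots (packed) versus 118 (unpacked). A wrapped list left by an old kernel with flfirst=116, fllast=1 and flcount=4 is consistent at 118 slots, but the same fields describe 5 active slots at 119, so `agfl_needs_reset 116 1 4 119` reports that a reset is needed.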
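The sed/tac pipeline moved into _dmesg_since_test_start() can be exercised without a kernel by feeding it a fake log instead of dmesg output. log_since_marker is a hypothetical name used only for this sketch, and the 0,/re/ address form it relies on is a GNU sed extension.

```shell
#!/bin/bash
# Stand-alone demonstration of the idiom used by _dmesg_since_test_start():
# print everything from the LAST occurrence of a marker line to the end of
# the input. log_since_marker is a hypothetical name for this sketch.
log_since_marker() {
	local marker=$1
	# tac reverses the log so the newest marker is the first match.
	# "0,\#re#p" (GNU sed) prints from line 1 through the first line that
	# matches re; "\#...#" uses '#' as the regex delimiter so that '/'
	# in the marker (e.g. "xfs/709") needs no escaping.
	# The final tac restores chronological order.
	tac | sed -ne "0,\#${marker}#p" | tac
}
```

Reversing the log turns "from the marker to the end" into "from the start to the first marker", which is exactly what the 0,\#re# range prints. For example, piping the lines `a`, `MARK`, `b`, `MARK`, `c` through `log_since_marker MARK` yields only `MARK` and `c`, because the newest copy of the marker wins.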
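The sizing in mount_loop() ("fragmenting free space enough to cause a bnobt split") can be worked through as plain arithmetic. This sketch assumes that punch-alternating frees every other block of the file, that a bnobt record is 8 bytes (4-byte start block plus 4-byte length), and that a v5 short-format btree block header is 56 bytes; bnobt_split_plan is a hypothetical helper, not part of fstests.

```shell
#!/bin/bash
# Worked version of the mount_loop() sizing. Prints:
#   <bytes to fallocate> <expected free extents> <records per bnobt leaf>
# Assumptions (not from the patch): 8-byte bnobt records, 56-byte v5
# short-format btree block header, punch-alternating frees every other block.
bnobt_split_plan() {
	local blksz=$1
	local bno_maxrecs=$(( blksz / 8 ))		# the patch's estimate
	local filesz=$(( bno_maxrecs * 3 * blksz ))	# size passed to falloc
	local nblocks=$(( filesz / blksz ))
	local holes=$(( nblocks / 2 ))			# one free extent per punched block
	local real_maxrecs=$(( (blksz - 56) / 8 ))	# leaf capacity after the header
	echo "$filesz $holes $real_maxrecs"
}
```

For 4096-byte blocks, `bnobt_split_plan 4096` prints `6291456 768 505`: a 6 MiB file, about 768 single-block free extents, and roughly 505 records per bnobt leaf, so the new free extents should overflow a single leaf and force a split. The blksz / 8 estimate in the patch ignores the block header, which only makes the file somewhat larger than strictly necessary.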