[v2,02/11] btrfs: fix swap file activation failure due to extents that used to be shared

From: Filipe Manana <fdmanana@suse.com>

When activating a swap file, we use can_nocow_extent() to determine if an
extent is shared, and that ends up calling btrfs_cross_ref_exist(). That
helper is meant to be quick because it's used in the NOCOW write path,
both when flushing delalloc and when doing a direct IO write. However, it
can return false positives, meaning it may indicate that an extent is
shared even if that is no longer the case. For the write path this is
fine: we just do an unnecessary COW operation instead of a more rigorous
check, which would be too heavy (calling btrfs_is_data_extent_shared()).

However, when activating a swap file, the false positives simply result
in a failure, which is confusing for users/applications. One particular
case where this happens is when a data extent only has 1 reference but
that reference is not inlined in the extent item located in the extent
tree. This happens when we create more than 33 references for an extent
and then delete those 33 references plus every other non-inline reference
except one. The function check_committed_ref() assumes that if the size
of an extent item doesn't match the size of struct btrfs_extent_item
plus the size of an inline reference (plus an owner reference in case
simple quotas are enabled), then the extent is shared. That is not always
the case, however: we can have a single reference that is not inlined.
The check is done this way to be fast and to avoid inspecting non-inline
references, which may be located in another leaf of the extent tree and
would slow down write paths.

The following test script reproduces the bug:

   $ cat test.sh
   #!/bin/bash

   DEV=/dev/sdi
   MNT=/mnt/sdi
   NUM_CLONES=50

   umount $DEV &> /dev/null

   run_test()
   {
        local sync_after_add_reflinks=$1
        local sync_after_remove_reflinks=$2

        mkfs.btrfs -f $DEV > /dev/null
        #mkfs.xfs -f $DEV > /dev/null
        mount $DEV $MNT

        touch $MNT/foo
        chmod 0600 $MNT/foo
        # On btrfs the file must be NOCOW.
        chattr +C $MNT/foo &> /dev/null
        xfs_io -s -c "pwrite -b 1M 0 1M" $MNT/foo
        mkswap $MNT/foo

        for ((i = 1; i <= $NUM_CLONES; i++)); do
            touch $MNT/foo_clone_$i
            chmod 0600 $MNT/foo_clone_$i
            # On btrfs the file must be NOCOW.
            chattr +C $MNT/foo_clone_$i &> /dev/null
            cp --reflink=always $MNT/foo $MNT/foo_clone_$i
        done

        if [ $sync_after_add_reflinks -ne 0 ]; then
            # Flush delayed refs and commit current transaction.
            sync -f $MNT
        fi

        # Remove the original file and all clones except the last.
        rm -f $MNT/foo
        for ((i = 1; i < $NUM_CLONES; i++)); do
            rm -f $MNT/foo_clone_$i
        done

        if [ $sync_after_remove_reflinks -ne 0 ]; then
            # Flush delayed refs and commit current transaction.
            sync -f $MNT
        fi

        # Now use the last clone as a swap file. It should work since
        # its extents are not shared anymore.
        swapon $MNT/foo_clone_${NUM_CLONES}
        swapoff $MNT/foo_clone_${NUM_CLONES}

        umount $MNT
   }

   echo -e "\nTest without sync after creating and removing clones"
   run_test 0 0

   echo -e "\nTest with sync after creating clones"
   run_test 1 0

   echo -e "\nTest with sync after removing clones"
   run_test 0 1

   echo -e "\nTest with sync after creating and removing clones"
   run_test 1 1

Running the test:

   $ ./test.sh
   Test without sync after creating and removing clones
   wrote 1048576/1048576 bytes at offset 0
   1 MiB, 1 ops; 0.0017 sec (556.793 MiB/sec and 556.7929 ops/sec)
   Setting up swapspace version 1, size = 1020 KiB (1044480 bytes)
   no label, UUID=a6b9c29e-5ef4-4689-a8ac-bc199c750f02
   swapon: /mnt/sdi/foo_clone_50: swapon failed: Invalid argument
   swapoff: /mnt/sdi/foo_clone_50: swapoff failed: Invalid argument

   Test with sync after creating clones
   wrote 1048576/1048576 bytes at offset 0
   1 MiB, 1 ops; 0.0036 sec (271.739 MiB/sec and 271.7391 ops/sec)
   Setting up swapspace version 1, size = 1020 KiB (1044480 bytes)
   no label, UUID=5e9008d6-1f7a-4948-a1b4-3f30aba20a33
   swapon: /mnt/sdi/foo_clone_50: swapon failed: Invalid argument
   swapoff: /mnt/sdi/foo_clone_50: swapoff failed: Invalid argument

   Test with sync after removing clones
   wrote 1048576/1048576 bytes at offset 0
   1 MiB, 1 ops; 0.0103 sec (96.665 MiB/sec and 96.6651 ops/sec)
   Setting up swapspace version 1, size = 1020 KiB (1044480 bytes)
   no label, UUID=916c2740-fa9f-4385-9f06-29c3f89e4764

   Test with sync after creating and removing clones
   wrote 1048576/1048576 bytes at offset 0
   1 MiB, 1 ops; 0.0031 sec (314.268 MiB/sec and 314.2678 ops/sec)
   Setting up swapspace version 1, size = 1020 KiB (1044480 bytes)
   no label, UUID=06aab1dd-4d90-49c0-bd9f-3a8db4e2f912
   swapon: /mnt/sdi/foo_clone_50: swapon failed: Invalid argument
   swapoff: /mnt/sdi/foo_clone_50: swapoff failed: Invalid argument

Fix this by reworking btrfs_swap_activate() so that, instead of using
extent maps and checking for shared extents with can_nocow_extent(), it
iterates over the inode's file extent items and uses the accurate
btrfs_is_data_extent_shared().

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/inode.c | 96 ++++++++++++++++++++++++++++++++++--------------
 1 file changed, 69 insertions(+), 27 deletions(-)

Message ID	bda8a1de78c3d938a71a816401f96f0e0d6c3f72.1733929328.git.fdmanana@suse.com
State	New
Headers	From: fdmanana@kernel.org; To: linux-btrfs@vger.kernel.org; Date: Wed, 11 Dec 2024 15:04:59 +0000; In-Reply-To: <cover.1733929327.git.fdmanana@suse.com>
Series	btrfs: fixes around swap file activation and cleanups
   [v2,00/11] btrfs: fixes around swap file activation and cleanups
   [v2,01/11] btrfs: fix race with memory mapped writes when activating swap file
   [v2,02/11] btrfs: fix swap file activation failure due to extents that used to be shared
   [v2,03/11] btrfs: allow swap activation to be interruptible
   [v2,04/11] btrfs: avoid monopolizing a core when activating a swap file
   [v2,05/11] btrfs: remove no longer needed strict argument from can_nocow_extent()
   [v2,06/11] btrfs: remove the snapshot check from check_committed_ref()
   [v2,07/11] btrfs: avoid redundant call to get inline ref type at check_committed_ref()
   [v2,08/11] btrfs: simplify return logic at check_committed_ref()
   [v2,09/11] btrfs: simplify arguments for btrfs_cross_ref_exist()
   [v2,10/11] btrfs: add function comment for check_committed_ref()
   [v2,11/11] btrfs: add assertions and comment about path expectations to btrfs_cross_ref_exist()
