From patchwork Tue Feb 25 10:57:07 2025
X-Patchwork-Submitter: Filipe Manana
X-Patchwork-Id: 13989799
From: fdmanana@kernel.org
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 1/3] btrfs: get zone unusable bytes while holding lock at btrfs_reclaim_bgs_work()
Date: Tue, 25 Feb 2025 10:57:07 +0000
Message-Id: <2e713972ad284809bb889a5bd52da87777bb885b.1740427964.git.fdmanana@suse.com>

From: Filipe Manana

At btrfs_reclaim_bgs_work(), we read a block group's zone unusable bytes
without holding the block group's spinlock. This can trigger race reports
from KCSAN (or similar tools), since that field is typically updated while
holding the lock, as at __btrfs_add_free_space_zoned() for example.

Fix this by reading the zone unusable bytes while still inside the
critical section that holds the block group's spinlock, which is right
above the place where we currently read it.

Signed-off-by: Filipe Manana
Reviewed-by: Johannes Thumshirn
---
 fs/btrfs/block-group.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 18f58674a16c..2e174c14ca0a 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1887,6 +1887,17 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 			up_write(&space_info->groups_sem);
 			goto next;
 		}
+
+		/*
+		 * Cache the zone_unusable value before turning the block group
+		 * to read only. As soon as the block group is read only its
+		 * zone_unusable value gets moved to the block group's read-only
+		 * bytes and isn't available for calculations anymore. We also
+		 * cache it before unlocking the block group, to prevent races
+		 * (reports from KCSAN and such tools) with tasks updating it.
+		 */
+		zone_unusable = bg->zone_unusable;
+
 		spin_unlock(&bg->lock);
 		spin_unlock(&space_info->lock);
 
@@ -1903,13 +1914,6 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 			goto next;
 		}
 
-		/*
-		 * Cache the zone_unusable value before turning the block group
-		 * to read only. As soon as the block group is read only its
-		 * zone_unusable value gets moved to the block group's read-only
-		 * bytes and isn't available for calculations anymore.
-		 */
-		zone_unusable = bg->zone_unusable;
 		ret = inc_block_group_ro(bg, 0);
 		up_write(&space_info->groups_sem);
 		if (ret < 0)

From patchwork Tue Feb 25 10:57:08 2025
X-Patchwork-Submitter: Filipe Manana
X-Patchwork-Id: 13989800
From: fdmanana@kernel.org
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 2/3] btrfs: get used bytes while holding lock at btrfs_reclaim_bgs_work()
Date: Tue, 25 Feb 2025 10:57:08 +0000

From: Filipe Manana

At btrfs_reclaim_bgs_work(), we read the used bytes counter of the block
group twice while not holding the block group's spinlock. This can result
in races, reported by KCSAN and similar tools, since a concurrent task can
be updating that counter while at btrfs_update_block_group().

So avoid these races by reading the counter in the critical section right
above, delimited by the block group's spinlock. This also avoids using two
different values of the counter in case it changes between the two reads.
This silences KCSAN and is required for the next patch in the series too.
Fixes: 243192b67649 ("btrfs: report reclaim stats in sysfs")
Signed-off-by: Filipe Manana
---
 fs/btrfs/block-group.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 2e174c14ca0a..1200c5efeb3d 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1823,7 +1823,7 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 	list_sort(NULL, &fs_info->reclaim_bgs, reclaim_bgs_cmp);
 	while (!list_empty(&fs_info->reclaim_bgs)) {
 		u64 zone_unusable;
-		u64 reclaimed;
+		u64 used;
 		int ret = 0;
 
 		bg = list_first_entry(&fs_info->reclaim_bgs,
@@ -1919,19 +1919,30 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 		if (ret < 0)
 			goto next;
 
+		/*
+		 * Grab the used bytes counter while holding the block group's
+		 * spinlock to prevent races with tasks concurrently updating it
+		 * due to extent allocation and deallocation (running
+		 * btrfs_update_block_group()) - we have set the block group to
+		 * RO but that only prevents extent reservation, allocation
+		 * happens after reservation.
+		 */
+		spin_lock(&bg->lock);
+		used = bg->used;
+		spin_unlock(&bg->lock);
+
 		btrfs_info(fs_info,
 	"reclaiming chunk %llu with %llu%% used %llu%% unusable",
 				bg->start,
-				div64_u64(bg->used * 100, bg->length),
+				div64_u64(used * 100, bg->length),
 				div64_u64(zone_unusable * 100, bg->length));
 		trace_btrfs_reclaim_block_group(bg);
-		reclaimed = bg->used;
 		ret = btrfs_relocate_chunk(fs_info, bg->start);
 		if (ret) {
 			btrfs_dec_block_group_ro(bg);
 			btrfs_err(fs_info, "error relocating chunk %llu",
 				  bg->start);
-			reclaimed = 0;
+			used = 0;
 			spin_lock(&space_info->lock);
 			space_info->reclaim_errors++;
 			if (READ_ONCE(space_info->periodic_reclaim))
@@ -1940,7 +1951,7 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 		}
 		spin_lock(&space_info->lock);
 		space_info->reclaim_count++;
-		space_info->reclaim_bytes += reclaimed;
+		space_info->reclaim_bytes += used;
 		spin_unlock(&space_info->lock);
 
 next:

From patchwork Tue Feb 25 10:57:09 2025
X-Patchwork-Submitter: Filipe Manana
X-Patchwork-Id: 13989801
From: fdmanana@kernel.org
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 3/3] btrfs: fix reclaimed bytes accounting after automatic block group reclaim
Date: Tue, 25 Feb 2025 10:57:09 +0000
Message-Id: <1890b9fc004d4c889f5b8b10a61ccc1395ded318.1740427964.git.fdmanana@suse.com>

From: Filipe Manana

We use the used bytes counter of a block group as the amount by which to
update the space info's reclaim bytes counter after relocating the block
group, but this value alone is often not enough.
This is because we may have a reserved extent (or more), in which case its
size is reflected in the reserved counter of the block group. The size of
the extent is only transferred from the reserved counter to the used
counter of the block group when the delayed ref for the extent is run,
typically when committing the transaction (or when flushing delayed refs
due to ENOSPC on space reservation). Such call chain for data extents is:

   btrfs_run_delayed_refs_for_head()
      run_one_delayed_ref()
         run_delayed_data_ref()
            alloc_reserved_file_extent()
               alloc_reserved_extent()
                  btrfs_update_block_group()
                     -> transfers the extent size from the reserved
                        counter to the used counter

For metadata extents:

   btrfs_run_delayed_refs_for_head()
      run_one_delayed_ref()
         run_delayed_tree_ref()
            alloc_reserved_tree_block()
               alloc_reserved_extent()
                  btrfs_update_block_group()
                     -> transfers the extent size from the reserved
                        counter to the used counter

Since relocation flushes delalloc, waits for ordered extent completion
and commits the current transaction before doing the actual relocation
work, the correct amount of reclaimed space is therefore the sum of the
"used" and "reserved" counters of the block group before we call
btrfs_relocate_chunk() at btrfs_reclaim_bgs_work().

So fix this by taking the "reserved" counter into consideration.
Fixes: 243192b67649 ("btrfs: report reclaim stats in sysfs")
Signed-off-by: Filipe Manana
---
 fs/btrfs/block-group.c | 28 +++++++++++++++++++++-------
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 1200c5efeb3d..c0c247ecbe9a 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1824,6 +1824,7 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 	while (!list_empty(&fs_info->reclaim_bgs)) {
 		u64 zone_unusable;
 		u64 used;
+		u64 reserved;
 		int ret = 0;
 
 		bg = list_first_entry(&fs_info->reclaim_bgs,
@@ -1920,21 +1921,32 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 			goto next;
 
 		/*
-		 * Grab the used bytes counter while holding the block group's
-		 * spinlock to prevent races with tasks concurrently updating it
-		 * due to extent allocation and deallocation (running
-		 * btrfs_update_block_group()) - we have set the block group to
-		 * RO but that only prevents extent reservation, allocation
-		 * happens after reservation.
+		 * The amount of bytes reclaimed corresponds to the sum of the
+		 * "used" and "reserved" counters. We have set the block group
+		 * to RO above, which prevents reservations from happening but
+		 * we may have existing reservations for which allocation has
+		 * not yet been done - btrfs_update_block_group() was not yet
+		 * called, which is where we will transfer a reserved extent's
+		 * size from the "reserved" counter to the "used" counter - this
+		 * happens when running delayed references. When we relocate the
+		 * chunk below, relocation first flushes delalloc, waits for
+		 * ordered extent completion (which is where we create delayed
+		 * references for data extents) and commits the current
+		 * transaction (which runs delayed references), and only after
+		 * it does the actual work to move extents out of the block
+		 * group. So the reported amount of reclaimed bytes is
+		 * effectively the sum of the 'used' and 'reserved' counters.
 		 */
 		spin_lock(&bg->lock);
 		used = bg->used;
+		reserved = bg->reserved;
 		spin_unlock(&bg->lock);
 
 		btrfs_info(fs_info,
-	"reclaiming chunk %llu with %llu%% used %llu%% unusable",
+	"reclaiming chunk %llu with %llu%% used %llu%% reserved %llu%% unusable",
 				bg->start,
 				div64_u64(used * 100, bg->length),
+				div64_u64(reserved * 100, bg->length),
 				div64_u64(zone_unusable * 100, bg->length));
 		trace_btrfs_reclaim_block_group(bg);
 		ret = btrfs_relocate_chunk(fs_info, bg->start);
@@ -1943,6 +1955,7 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 			btrfs_err(fs_info, "error relocating chunk %llu",
 				  bg->start);
 			used = 0;
+			reserved = 0;
 			spin_lock(&space_info->lock);
 			space_info->reclaim_errors++;
 			if (READ_ONCE(space_info->periodic_reclaim))
@@ -1952,6 +1965,7 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 		spin_lock(&space_info->lock);
 		space_info->reclaim_count++;
 		space_info->reclaim_bytes += used;
+		space_info->reclaim_bytes += reserved;
 		spin_unlock(&space_info->lock);
 
 next: