From patchwork Fri Sep 8 12:09:03 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13377394 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 77DA5EC8748 for ; Fri, 8 Sep 2023 12:09:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242880AbjIHMJd (ORCPT ); Fri, 8 Sep 2023 08:09:33 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50726 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237407AbjIHMJc (ORCPT ); Fri, 8 Sep 2023 08:09:32 -0400 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id ABB611BE7 for ; Fri, 8 Sep 2023 05:09:28 -0700 (PDT) Received: by smtp.kernel.org (Postfix) with ESMTPSA id CE84CC433CA for ; Fri, 8 Sep 2023 12:09:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1694174968; bh=3r+8KXdnEbNUFoJkeo/6HM/e/oJDp2xbWJBvwB9oz0Y=; h=From:To:Subject:Date:In-Reply-To:References:From; b=BvRWpYBJRWiom/+nDLRVm19RfCtuCeVYy80UNFSuZqss1DUvUg4ZhAJsYPcxR9X5H mzwJ14WJtcmCndNmzPk98ONQihM2ChNI7IIOnKJ0//ABgZlRMs/JLhfL66hFG1lvAZ ocMxkm22/qZKEXiBSVPp7Oku3XAbv7QZDmyDSAN+QrWllYDyyBxYkKYoi4Hcfon+rM tgokc8RS3jo4VUx59BRGA2A9LzcQxcR9s0aTmgYIV2lrEsjYCzjf8oizS1xGCfPbFj Gdci1yaLSG8ZFePHDDtX47VTMUJajc/Jga4W6KN69/Hc+3YAI8QZMHhXI8logQR7lF 4X379llsORlZg== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 01/21] btrfs: fix race when refilling delayed refs block reserve Date: Fri, 8 Sep 2023 13:09:03 +0100 Message-Id: X-Mailer: git-send-email 2.34.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana If we have two (or more) tasks attempting to refill the delayed refs block reserve we can end up with the delayed block reserve being over reserved, that is, with a reserved space greater than its size. If this happens, we are holding to more reserved space than necessary, however it may or may not be harmless. In case the delayed refs block reserve size later increases to match or go beyond the reserved space, then nothing bad happens, we end up simply avoiding doing a space reservation later at the expense of doing it now. However if the delayed refs block reserve size never increases to match its reserved size, then at unmount time we will trigger the following warning from btrfs_release_global_block_rsv(): WARN_ON(fs_info->delayed_block_rsv.reserved > 0); The race happens like this: 1) The delayed refs block reserve has a size of 8M and a reserved space of 6M for example; 2) Task A calls btrfs_delayed_refs_rsv_refill(); 3) Task B also calls btrfs_delayed_refs_rsv_refill(); 4) Task A sees there's a 2M difference between the size and the reserved space of the delayed refs rsv, so it will reserve 2M of space by calling btrfs_reserve_metadata_bytes(); 5) Task B also sees that 2M difference, and like task A, it reserves another 2M of metadata space; 6) Both task A and task B increase the reserved space of block reserve by 2M, by calling btrfs_block_rsv_add_bytes(), so the block reserve ends up with a size of 8M and a reserved space of 10M; 7) As delayed refs are run and their space released, the delayed refs block reserve ends up with a size of 0 and a reserved space of 2M; 8) The fs is unmounted and the warning at btrfs_release_global_block_rsv() is triggered because the size the reserved space is not zero. So fix this by checking if we still need to add space to the delayed refs block reserve after reserving the metadata space, and if we don't, just release that space. Signed-off-by: Filipe Manana --- fs/btrfs/delayed-ref.c | 37 ++++++++++++++++++++++++++++++++++--- 1 file changed, 34 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c index 6a13cf00218b..1043f66cc130 100644 --- a/fs/btrfs/delayed-ref.c +++ b/fs/btrfs/delayed-ref.c @@ -163,6 +163,8 @@ int btrfs_delayed_refs_rsv_refill(struct btrfs_fs_info *fs_info, struct btrfs_block_rsv *block_rsv = &fs_info->delayed_refs_rsv; u64 limit = btrfs_calc_delayed_ref_bytes(fs_info, 1); u64 num_bytes = 0; + u64 refilled_bytes; + u64 to_free; int ret = -ENOSPC; spin_lock(&block_rsv->lock); @@ -178,9 +180,38 @@ int btrfs_delayed_refs_rsv_refill(struct btrfs_fs_info *fs_info, ret = btrfs_reserve_metadata_bytes(fs_info, block_rsv, num_bytes, flush); if (ret) return ret; - btrfs_block_rsv_add_bytes(block_rsv, num_bytes, false); - trace_btrfs_space_reservation(fs_info, "delayed_refs_rsv", - 0, num_bytes, 1); + + /* + * We may have raced with someone else, so check again if we the block + * reserve is still not full and release any excess space. + */ + spin_lock(&block_rsv->lock); + if (block_rsv->reserved < block_rsv->size) { + u64 needed = block_rsv->size - block_rsv->reserved; + + if (num_bytes >= needed) { + block_rsv->reserved += needed; + block_rsv->full = true; + to_free = num_bytes - needed; + refilled_bytes = needed; + } else { + block_rsv->reserved += num_bytes; + to_free = 0; + refilled_bytes = num_bytes; + } + } else { + to_free = num_bytes; + refilled_bytes = 0; + } + spin_unlock(&block_rsv->lock); + + if (to_free > 0) + btrfs_space_info_free_bytes_may_use(fs_info, block_rsv->space_info, + to_free); + + if (refilled_bytes > 0) + trace_btrfs_space_reservation(fs_info, "delayed_refs_rsv", 0, + refilled_bytes, 1); return 0; } From patchwork Fri Sep 8 12:09:04 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13377395 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 90B1DEE14D8 for ; Fri, 8 Sep 2023 12:09:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243095AbjIHMJe (ORCPT ); Fri, 8 Sep 2023 08:09:34 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50732 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242948AbjIHMJd (ORCPT ); Fri, 8 Sep 2023 08:09:33 -0400 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A1EE61BC5 for ; Fri, 8 Sep 2023 05:09:29 -0700 (PDT) Received: by smtp.kernel.org (Postfix) with ESMTPSA id C42FEC433C8 for ; Fri, 8 Sep 2023 12:09:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1694174969; bh=Lr3xvpWvi/Aew01BqNleTBK9qAssO7sJDT5dCoWJSAU=; h=From:To:Subject:Date:In-Reply-To:References:From; b=SRPHhc6eX6eJ3uA+5ku15hlWEHNgI6I+SuXdDW+meL+hibTfMjN378OVGf7Hb7fHq zx1trK9y6UM8O2tbBfxD7OyZBK5KEnLOo0hI4je9SJ7G72Z67RSz1Dei5Tg77hGsZL AJJVo62RJxV2BnUt+tyfSzUCEI0frcfh4ca4KxmXlAdJs9pmv0eFa45n9dVJxM7F54 2qBIEJbWvyutj+Xke3IiS5tK1L7RIiM9NAlXMfSmdF97PtVULev92jZsqNwGhzqnlB yBPxL+M11//1nLtN9dzI4Yc6593ZePnRV9VNWnQciYX0D7QXtTh10JZJQl5synkVax Jx5Mr1Pqze3lg== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 02/21] btrfs: prevent transaction block reserve underflow when starting transaction Date: Fri, 8 Sep 2023 13:09:04 +0100 Message-Id: <22af99f197c4926165d4f556546379a81ae8a44f.1694174371.git.fdmanana@suse.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana When starting a transaction, with a non-zero number of items, we reserve metadata space for that number of items and for delayed refs by doing a call to btrfs_block_rsv_add(), with the transaction block reserve passed as the block reserve argument. This reserves metadata space and adds it to the transaction block reserve. Later we migrate the space we reserved for delayed references from the transaction block reserve into the delayed refs block reserve, by calling btrfs_migrate_to_delayed_refs_rsv(). btrfs_migrate_to_delayed_refs_rsv() decrements the number of bytes to migrate from the source block reserve, and this however may result in an underflow in case the space added to the transaction block reserve ended up being used by another task that has not reserved enough space for its own use - examples are tasks doing reflinks or hole punching because they end up calling btrfs_replace_file_extents() -> btrfs_drop_extents() and may need to modify/COW a variable number of leaves/paths, so they keep trying to use space from the transaction block reserve when they need to COW an extent buffer, and may end up trying to use more space then they have reserved (1 unit/path only for removing file extent items). This can be avoided by simply reserving space first without adding it to the transaction block reserve, then add the space for delayed refs to the delayed refs block reserve and finally add the remaining reserved space to the transaction block reserve. This also makes the code a bit shorter and simpler. So just do that. Signed-off-by: Filipe Manana Reviewed-by: Josef Bacik --- fs/btrfs/delayed-ref.c | 9 +-------- fs/btrfs/delayed-ref.h | 1 - fs/btrfs/transaction.c | 5 +++-- 3 files changed, 4 insertions(+), 11 deletions(-) diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c index 1043f66cc130..9fe4ccca50a0 100644 --- a/fs/btrfs/delayed-ref.c +++ b/fs/btrfs/delayed-ref.c @@ -103,24 +103,17 @@ void btrfs_update_delayed_refs_rsv(struct btrfs_trans_handle *trans) * Transfer bytes to our delayed refs rsv. * * @fs_info: the filesystem - * @src: source block rsv to transfer from * @num_bytes: number of bytes to transfer * - * This transfers up to the num_bytes amount from the src rsv to the + * This transfers up to the num_bytes amount, previously reserved, to the * delayed_refs_rsv. Any extra bytes are returned to the space info. */ void btrfs_migrate_to_delayed_refs_rsv(struct btrfs_fs_info *fs_info, - struct btrfs_block_rsv *src, u64 num_bytes) { struct btrfs_block_rsv *delayed_refs_rsv = &fs_info->delayed_refs_rsv; u64 to_free = 0; - spin_lock(&src->lock); - src->reserved -= num_bytes; - src->size -= num_bytes; - spin_unlock(&src->lock); - spin_lock(&delayed_refs_rsv->lock); if (delayed_refs_rsv->size > delayed_refs_rsv->reserved) { u64 delta = delayed_refs_rsv->size - diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h index b8e14b0ba5f1..fd9bf2b709c0 100644 --- a/fs/btrfs/delayed-ref.h +++ b/fs/btrfs/delayed-ref.h @@ -407,7 +407,6 @@ void btrfs_update_delayed_refs_rsv(struct btrfs_trans_handle *trans); int btrfs_delayed_refs_rsv_refill(struct btrfs_fs_info *fs_info, enum btrfs_reserve_flush_enum flush); void btrfs_migrate_to_delayed_refs_rsv(struct btrfs_fs_info *fs_info, - struct btrfs_block_rsv *src, u64 num_bytes); bool btrfs_check_space_for_delayed_refs(struct btrfs_fs_info *fs_info); diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index ab09542f2170..1985ab543ad2 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -625,14 +625,15 @@ start_transaction(struct btrfs_root *root, unsigned int num_items, reloc_reserved = true; } - ret = btrfs_block_rsv_add(fs_info, rsv, num_bytes, flush); + ret = btrfs_reserve_metadata_bytes(fs_info, rsv, num_bytes, flush); if (ret) goto reserve_fail; if (delayed_refs_bytes) { - btrfs_migrate_to_delayed_refs_rsv(fs_info, rsv, + btrfs_migrate_to_delayed_refs_rsv(fs_info, delayed_refs_bytes); num_bytes -= delayed_refs_bytes; } + btrfs_block_rsv_add_bytes(rsv, num_bytes, true); if (rsv->space_info->force_alloc) do_chunk_alloc = true; From patchwork Fri Sep 8 12:09:05 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13377396 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A49B6EE49BF for ; Fri, 8 Sep 2023 12:09:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243138AbjIHMJf (ORCPT ); Fri, 8 Sep 2023 08:09:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50742 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242948AbjIHMJe (ORCPT ); Fri, 8 Sep 2023 08:09:34 -0400 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9AE871BC5 for ; Fri, 8 Sep 2023 05:09:30 -0700 (PDT) Received: by smtp.kernel.org (Postfix) with ESMTPSA id BC2D4C433C9 for ; Fri, 8 Sep 2023 12:09:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1694174970; bh=ofqceQzX14mxrffpoH28c0AGhxFUN/GHjjygki5N98g=; h=From:To:Subject:Date:In-Reply-To:References:From; b=PVw+8bVkgYx9eWSy8sNShP6Ef+LkQv1UgcMyK8Oanx2Ruk61OxNfll4FRHd7nQCQF XZqyZGh2vwcvCQgFzEBKjR0YZW0QhPNuM8Qe1RX9bIW+bkqDpWy4zliVqNyYhtUSo9 IhK1zZ1Av4DA3nYDD876+VEJN6F0eAZB4HTE/h9tK9brTbTSCP5y7PHm4znOgHsjb6 K1YIQPhhBqzRGthn8IqMn4vFN2GK8+L5jhdBh+1egCp2bUSSrTT2hLvm1GN9iKT2rc wCxn2Hu+W4JXJ4qVs0lt0Y4OaYrZDKC9Ykry6fMwPDhy78q5ke64hPSNtyKIWCsJhW 5GnN7QdmOiKcA== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 03/21] btrfs: pass a space_info argument to btrfs_reserve_metadata_bytes() Date: Fri, 8 Sep 2023 13:09:05 +0100 Message-Id: X-Mailer: git-send-email 2.34.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana We are passing a block reserve argument to btrfs_reserve_metadata_bytes() which is not really used, all we need is to pass the space_info associated to the block reserve, we don't change the block reserve at all. Not only it's pointless to pass the block reserve, it's also confusing as one might think that the reserved bytes will end up being added to the passed block reserve, when that's not the case. The pattern for reserving space and adding it to a block reserve is to first reserve space with btrfs_reserve_metadata_bytes() and if that succeeds, then add the space to a block reserve by calling btrfs_block_rsv_add_bytes(). Also the reverse of btrfs_reserve_metadata_bytes(), which is btrfs_space_info_free_bytes_may_use(), takes a space_info argument and not a block reserve, so one more reason to pass a space_info and not a block reserve to btrfs_reserve_metadata_bytes(). So change btrfs_reserve_metadata_bytes() and its callers to pass a space_info argument instead of a block reserve argument. Signed-off-by: Filipe Manana Reviewed-by: Josef Bacik --- fs/btrfs/block-rsv.c | 12 +++++++----- fs/btrfs/delalloc-space.c | 3 ++- fs/btrfs/delayed-ref.c | 6 +++--- fs/btrfs/space-info.c | 12 +++++------- fs/btrfs/space-info.h | 2 +- fs/btrfs/transaction.c | 3 ++- 6 files changed, 20 insertions(+), 18 deletions(-) diff --git a/fs/btrfs/block-rsv.c b/fs/btrfs/block-rsv.c index 77684c5e0c8b..6ccd91bbff3e 100644 --- a/fs/btrfs/block-rsv.c +++ b/fs/btrfs/block-rsv.c @@ -221,7 +221,8 @@ int btrfs_block_rsv_add(struct btrfs_fs_info *fs_info, if (num_bytes == 0) return 0; - ret = btrfs_reserve_metadata_bytes(fs_info, block_rsv, num_bytes, flush); + ret = btrfs_reserve_metadata_bytes(fs_info, block_rsv->space_info, + num_bytes, flush); if (!ret) btrfs_block_rsv_add_bytes(block_rsv, num_bytes, true); @@ -261,7 +262,8 @@ int btrfs_block_rsv_refill(struct btrfs_fs_info *fs_info, if (!ret) return 0; - ret = btrfs_reserve_metadata_bytes(fs_info, block_rsv, num_bytes, flush); + ret = btrfs_reserve_metadata_bytes(fs_info, block_rsv->space_info, + num_bytes, flush); if (!ret) { btrfs_block_rsv_add_bytes(block_rsv, num_bytes, false); return 0; @@ -517,8 +519,8 @@ struct btrfs_block_rsv *btrfs_use_block_rsv(struct btrfs_trans_handle *trans, block_rsv->type, ret); } try_reserve: - ret = btrfs_reserve_metadata_bytes(fs_info, block_rsv, blocksize, - BTRFS_RESERVE_NO_FLUSH); + ret = btrfs_reserve_metadata_bytes(fs_info, block_rsv->space_info, + blocksize, BTRFS_RESERVE_NO_FLUSH); if (!ret) return block_rsv; /* @@ -539,7 +541,7 @@ struct btrfs_block_rsv *btrfs_use_block_rsv(struct btrfs_trans_handle *trans, * one last time to force a reservation if there's enough actual space * on disk to make the reservation. */ - ret = btrfs_reserve_metadata_bytes(fs_info, block_rsv, blocksize, + ret = btrfs_reserve_metadata_bytes(fs_info, block_rsv->space_info, blocksize, BTRFS_RESERVE_FLUSH_EMERGENCY); if (!ret) return block_rsv; diff --git a/fs/btrfs/delalloc-space.c b/fs/btrfs/delalloc-space.c index 427abaf608b8..a764db67c033 100644 --- a/fs/btrfs/delalloc-space.c +++ b/fs/btrfs/delalloc-space.c @@ -346,7 +346,8 @@ int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes, noflush); if (ret) return ret; - ret = btrfs_reserve_metadata_bytes(fs_info, block_rsv, meta_reserve, flush); + ret = btrfs_reserve_metadata_bytes(fs_info, block_rsv->space_info, + meta_reserve, flush); if (ret) { btrfs_qgroup_free_meta_prealloc(root, qgroup_reserve); return ret; diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c index 9fe4ccca50a0..0991f66471ff 100644 --- a/fs/btrfs/delayed-ref.c +++ b/fs/btrfs/delayed-ref.c @@ -154,6 +154,7 @@ int btrfs_delayed_refs_rsv_refill(struct btrfs_fs_info *fs_info, enum btrfs_reserve_flush_enum flush) { struct btrfs_block_rsv *block_rsv = &fs_info->delayed_refs_rsv; + struct btrfs_space_info *space_info = block_rsv->space_info; u64 limit = btrfs_calc_delayed_ref_bytes(fs_info, 1); u64 num_bytes = 0; u64 refilled_bytes; @@ -170,7 +171,7 @@ int btrfs_delayed_refs_rsv_refill(struct btrfs_fs_info *fs_info, if (!num_bytes) return 0; - ret = btrfs_reserve_metadata_bytes(fs_info, block_rsv, num_bytes, flush); + ret = btrfs_reserve_metadata_bytes(fs_info, space_info, num_bytes, flush); if (ret) return ret; @@ -199,8 +200,7 @@ int btrfs_delayed_refs_rsv_refill(struct btrfs_fs_info *fs_info, spin_unlock(&block_rsv->lock); if (to_free > 0) - btrfs_space_info_free_bytes_may_use(fs_info, block_rsv->space_info, - to_free); + btrfs_space_info_free_bytes_may_use(fs_info, space_info, to_free); if (refilled_bytes > 0) trace_btrfs_space_reservation(fs_info, "delayed_refs_rsv", 0, diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c index d7e8cd4f140c..ca7c702172fd 100644 --- a/fs/btrfs/space-info.c +++ b/fs/btrfs/space-info.c @@ -1742,7 +1742,7 @@ static int __reserve_bytes(struct btrfs_fs_info *fs_info, * Try to reserve metadata bytes from the block_rsv's space. * * @fs_info: the filesystem - * @block_rsv: block_rsv we're allocating for + * @space_info: the space_info we're allocating for * @orig_bytes: number of bytes we want * @flush: whether or not we can flush to make our reservation * @@ -1754,21 +1754,19 @@ static int __reserve_bytes(struct btrfs_fs_info *fs_info, * space already. */ int btrfs_reserve_metadata_bytes(struct btrfs_fs_info *fs_info, - struct btrfs_block_rsv *block_rsv, + struct btrfs_space_info *space_info, u64 orig_bytes, enum btrfs_reserve_flush_enum flush) { int ret; - ret = __reserve_bytes(fs_info, block_rsv->space_info, orig_bytes, flush); + ret = __reserve_bytes(fs_info, space_info, orig_bytes, flush); if (ret == -ENOSPC) { trace_btrfs_space_reservation(fs_info, "space_info:enospc", - block_rsv->space_info->flags, - orig_bytes, 1); + space_info->flags, orig_bytes, 1); if (btrfs_test_opt(fs_info, ENOSPC_DEBUG)) - btrfs_dump_space_info(fs_info, block_rsv->space_info, - orig_bytes, 0); + btrfs_dump_space_info(fs_info, space_info, orig_bytes, 0); } return ret; } diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h index 0bb9d14e60a8..ef0fac9c70ae 100644 --- a/fs/btrfs/space-info.h +++ b/fs/btrfs/space-info.h @@ -212,7 +212,7 @@ void btrfs_dump_space_info(struct btrfs_fs_info *fs_info, struct btrfs_space_info *info, u64 bytes, int dump_block_groups); int btrfs_reserve_metadata_bytes(struct btrfs_fs_info *fs_info, - struct btrfs_block_rsv *block_rsv, + struct btrfs_space_info *space_info, u64 orig_bytes, enum btrfs_reserve_flush_enum flush); void btrfs_try_granting_tickets(struct btrfs_fs_info *fs_info, diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 1985ab543ad2..ee63d92d66b4 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -625,7 +625,8 @@ start_transaction(struct btrfs_root *root, unsigned int num_items, reloc_reserved = true; } - ret = btrfs_reserve_metadata_bytes(fs_info, rsv, num_bytes, flush); + ret = btrfs_reserve_metadata_bytes(fs_info, rsv->space_info, + num_bytes, flush); if (ret) goto reserve_fail; if (delayed_refs_bytes) { From patchwork Fri Sep 8 12:09:06 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13377397 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 42C24CCFA0A for ; Fri, 8 Sep 2023 12:09:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243153AbjIHMJg (ORCPT ); Fri, 8 Sep 2023 08:09:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50754 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243101AbjIHMJf (ORCPT ); Fri, 8 Sep 2023 08:09:35 -0400 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 999C31BE7 for ; Fri, 8 Sep 2023 05:09:31 -0700 (PDT) Received: by smtp.kernel.org (Postfix) with ESMTPSA id B3D95C433C8 for ; Fri, 8 Sep 2023 12:09:30 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1694174971; bh=4rSp1bl4srJlcoc8eK6Hkfg6+ySetUte9rVjPTPcvlU=; h=From:To:Subject:Date:In-Reply-To:References:From; b=WMQn3Se54lGig0DdBfeter1quSmSQRqXEhe5QREyxSZwFNi8MPqI4oq2+/Y05jlJ2 Qi+b09LJ40c4j4G2egUYqMfPYkg4wRKC77w06LBjZklHzFdQCn0TRGjx0KktaGyWC8 SXsKZAkYn4Yzjiz7ysmJfwKM3l0L0d0BW7Rg/Hh0agY0jKs7AWgssxjTRP/MSy9x+s 96UkzUycelMMenPrDjJ5LeGXAXnUKioKffQbw0TiKfR8ays9kdr7tHFbOoWAj3DVjf MFI1Mq9OZ3P3OVcmuf7Kkfww87hf2fa7Qruwt64vWa2YG2MBcPQrSI5FhUK8YTpRjW 9lHG8eFqG93/A== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 04/21] btrfs: remove unnecessary logic when running new delayed references Date: Fri, 8 Sep 2023 13:09:06 +0100 Message-Id: X-Mailer: git-send-email 2.34.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana When running delayed references, at btrfs_run_delayed_refs(), we have this logic to run any new delayed references that might have been added just after we ran all delayed references. This logic grabs the first delayed reference, then locks it to wait for any contention on it before running all new delayed references. This however is pointless and not necessary because at __btrfs_run_delayed_refs() when we start running delayed references, we pick the first reference with btrfs_obtain_ref_head() and then we will lock it (with btrfs_delayed_ref_lock()). So remove the duplicate and unnecessary logic at btrfs_run_delayed_refs(). Signed-off-by: Filipe Manana Reviewed-by: Josef Bacik --- fs/btrfs/extent-tree.c | 17 +++-------------- 1 file changed, 3 insertions(+), 14 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 07d3896e6824..929fbb620d68 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -2135,9 +2135,7 @@ int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans, unsigned long count) { struct btrfs_fs_info *fs_info = trans->fs_info; - struct rb_node *node; struct btrfs_delayed_ref_root *delayed_refs; - struct btrfs_delayed_ref_head *head; int ret; int run_all = count == (unsigned long)-1; @@ -2166,25 +2164,16 @@ int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans, btrfs_create_pending_block_groups(trans); spin_lock(&delayed_refs->lock); - node = rb_first_cached(&delayed_refs->href_root); - if (!node) { + if (RB_EMPTY_ROOT(&delayed_refs->href_root.rb_root)) { spin_unlock(&delayed_refs->lock); - goto out; + return 0; } - head = rb_entry(node, struct btrfs_delayed_ref_head, - href_node); - refcount_inc(&head->refs); spin_unlock(&delayed_refs->lock); - /* Mutex was contended, block until it's released and retry. */ - mutex_lock(&head->mutex); - mutex_unlock(&head->mutex); - - btrfs_put_delayed_ref_head(head); cond_resched(); goto again; } -out: + return 0; } From patchwork Fri Sep 8 12:09:07 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13377398 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 02132EE7FE1 for ; Fri, 8 Sep 2023 12:09:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243165AbjIHMJg (ORCPT ); Fri, 8 Sep 2023 08:09:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50762 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243164AbjIHMJg (ORCPT ); Fri, 8 Sep 2023 08:09:36 -0400 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5BF041BC5 for ; Fri, 8 Sep 2023 05:09:32 -0700 (PDT) Received: by smtp.kernel.org (Postfix) with ESMTPSA id AB4ADC433CD for ; Fri, 8 Sep 2023 12:09:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1694174972; bh=g4IN8hTod7NDwJC7H55Jx+a6egC01OH6fqCkgOCIreM=; h=From:To:Subject:Date:In-Reply-To:References:From; b=QFhKxLZ3wcyCYVWAIHgSJjYKzrR6NsNjjK9HzJpBgTsOlZ/xBHUE6R5khtRR5ngCs qAzjVgwQWE6aeNKMBD2bWZv95UY/HKKtZw5FiiynhyxYIISB9SjA+pIxLH+Llm3KgY C8RQQBg1umrMrSjf1bvliWY4E9vPwUYM4hO6ksvePnXzIHjkJPm2Iba2x4SXRxGcnZ /q4KID0iDWxcm8cQLPgIWcLUOCl8vYvybkSXomrP+49rqonakJTLL4YDbSrJ5DF9tm tSER7IzEYPD/ZIF4FGZ1hcLJiOxIVZQlpafNZkkehMduFat5uB38oRQZfVmeeSYA5F nJrhNdsmGstUg== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 05/21] btrfs: remove the refcount warning/check at btrfs_put_delayed_ref() Date: Fri, 8 Sep 2023 13:09:07 +0100 Message-Id: X-Mailer: git-send-email 2.34.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana At btrfs_put_delayed_ref(), it's pointless to have a WARN_ON() to check if the refcount of the delayed ref is zero. Such check is already done by the refcount_t module and refcount_dec_and_test(), which loudly complains if we try to decrement a reference count that is currently 0. The WARN_ON() dates back to the time when used a regular atomic_t type for the reference counter, before we switched to the refcount_t type. The main goal of the refcount_t type/module is precisely to catch such types of bugs and loudly complain if they happen. This also reduces a bit the module's text size. Before this change: $ size fs/btrfs/btrfs.ko text data bss dec hex filename 1612483 167145 16864 1796492 1b698c fs/btrfs/btrfs.ko After this change: $ size fs/btrfs/btrfs.ko text data bss dec hex filename 1612371 167073 16864 1796308 1b68d4 fs/btrfs/btrfs.ko Signed-off-by: Filipe Manana Reviewed-by: Josef Bacik --- fs/btrfs/delayed-ref.h | 1 - 1 file changed, 1 deletion(-) diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h index fd9bf2b709c0..46a1421cd99d 100644 --- a/fs/btrfs/delayed-ref.h +++ b/fs/btrfs/delayed-ref.h @@ -338,7 +338,6 @@ btrfs_free_delayed_extent_op(struct btrfs_delayed_extent_op *op) static inline void btrfs_put_delayed_ref(struct btrfs_delayed_ref_node *ref) { - WARN_ON(refcount_read(&ref->refs) == 0); if (refcount_dec_and_test(&ref->refs)) { WARN_ON(!RB_EMPTY_NODE(&ref->ref_node)); switch (ref->type) { From patchwork Fri Sep 8 12:09:08 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13377399 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D094EEE7FE2 for ; Fri, 8 Sep 2023 12:09:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243188AbjIHMJh (ORCPT ); Fri, 8 Sep 2023 08:09:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50776 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243164AbjIHMJg (ORCPT ); Fri, 8 Sep 2023 08:09:36 -0400 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4F4571BE7 for ; Fri, 8 Sep 2023 05:09:33 -0700 (PDT) Received: by smtp.kernel.org (Postfix) with ESMTPSA id A1E3BC433C8 for ; Fri, 8 Sep 2023 12:09:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1694174973; bh=GTYnnmyLnu0IU8cRjayrFZxrRQzQmEQx7o/wt+0N+iM=; h=From:To:Subject:Date:In-Reply-To:References:From; b=oooNllZzAf8oy2szWj7WeIjhQ1x7CWEjy2XUH3WdgneOLkL9fQyKpv7jNzZJAhCTa ouUJc9XHXlALrQNSGb3QTVInKg4iOlX6p3412e8K16RpII36Q1J7gosATVRf/JE0t2 ifqygkHYstb78vwDaY/7JNL4qDpJoVmb9+KUZH0L8NRhXS5rQKIi8JBNrW6mxcAQFU LKaI050MorlOBjyo8hX32+25x0svslq+xvPJxJDj/bA8K4u8OsMlL3F/ajfUnjvA2q rww1U9NUNnBCw6ul2SiFtOJZLri23pEWHU3VuwnGTgnDW6KCtEdB9gaXyP6X/VYg4h VHabEeD+qarCQ== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 06/21] btrfs: return -EUCLEAN for delayed tree ref with a ref count not equals to 1 Date: Fri, 8 Sep 2023 13:09:08 +0100 Message-Id: X-Mailer: git-send-email 2.34.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana When running a delayed tree reference, if we find a ref count different from 1, we return -EIO. This isn't an IO error, as it indicates either a bug in the delayed refs code or a memory corruption, so change the error code from -EIO to -EUCLEAN. Also tag the branch as 'unlikely' as this is not expected to ever happen, and change the error message to print the tree block's bytenr without the parenthesis (and there was a missing space between the 'block' word and the opening parenthesis), for consistency as that's the style we used everywhere else. Signed-off-by: Filipe Manana Reviewed-by: Josef Bacik --- fs/btrfs/extent-tree.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 929fbb620d68..8fca9c2b8917 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -1699,12 +1699,12 @@ static int run_delayed_tree_ref(struct btrfs_trans_handle *trans, parent = ref->parent; ref_root = ref->root; - if (node->ref_mod != 1) { + if (unlikely(node->ref_mod != 1)) { btrfs_err(trans->fs_info, - "btree block(%llu) has %d references rather than 1: action %d ref_root %llu parent %llu", + "btree block %llu has %d references rather than 1: action %d ref_root %llu parent %llu", node->bytenr, node->ref_mod, node->action, ref_root, parent); - return -EIO; + return -EUCLEAN; } if (node->action == BTRFS_ADD_DELAYED_REF && insert_reserved) { BUG_ON(!extent_op || !extent_op->update_flags); From patchwork Fri Sep 8 12:09:09 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13377400 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5EA3AEE7FE0 for ; Fri, 8 Sep 2023 12:09:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243190AbjIHMJi (ORCPT ); Fri, 8 Sep 2023 08:09:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50782 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243164AbjIHMJh (ORCPT ); Fri, 8 Sep 2023 08:09:37 -0400 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 777AA1BC5 for ; Fri, 8 Sep 2023 05:09:34 -0700 (PDT) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 98D1DC433C7 for ; Fri, 8 Sep 2023 12:09:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1694174974; bh=0GMofqGbIZJ8DFuHdu7AWlye8IftUAHMKFaODoGNy/A=; h=From:To:Subject:Date:In-Reply-To:References:From; b=ZP75v1cU/jKq9MdJ6TRonmHw4BsdM244maLjG98icZujP9Q8gyLv9JyYgFOrveTh3 qgVbuc7YGXlu0kg74pvdeBokXOZRv/CY+pDDuEIe+CANixlrocWDna6IrfgW6IyPjC skzDrrvUPBUaR4vTIzcskuuNB3xeho7Iid6fvntHT6Arl4twxDWH1CKKmM5AJA0E4G XsbXw8XnUb5tYzsC4RJyFJi61JqOFcIU9fuYK54d3BVQVE/yZDDG9EN8PRobiazcKM fbi/6FuLV5xJ9aDuwp6q3ZFdTTeJY1ZVagC7VxYmNEl8K9uyD5W7/lltFa82TIdaAc DUOune7y9AN6w== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 07/21] btrfs: remove redundant BUG_ON() from __btrfs_inc_extent_ref() Date: Fri, 8 Sep 2023 13:09:09 +0100 Message-Id: X-Mailer: git-send-email 2.34.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana At __btrfs_inc_extent_ref() we are doing a BUG_ON() if we are dealing with a tree block reference that has a reference count that is different from 1, but we have already dealt with this case at run_delayed_tree_ref(), making it useless. So remove the BUG_ON(). Signed-off-by: Filipe Manana Reviewed-by: Josef Bacik --- fs/btrfs/extent-tree.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 8fca9c2b8917..cf503f2972a1 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -1514,15 +1514,14 @@ static int __btrfs_inc_extent_ref(struct btrfs_trans_handle *trans, btrfs_release_path(path); /* now insert the actual backref */ - if (owner < BTRFS_FIRST_FREE_OBJECTID) { - BUG_ON(refs_to_add != 1); + if (owner < BTRFS_FIRST_FREE_OBJECTID) ret = insert_tree_block_ref(trans, path, bytenr, parent, root_objectid); - } else { + else ret = insert_extent_data_ref(trans, path, bytenr, parent, root_objectid, owner, offset, refs_to_add); - } + if (ret) btrfs_abort_transaction(trans, ret); out: From patchwork Fri Sep 8 12:09:10 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13377401 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 79208EE7FE3 for ; Fri, 8 Sep 2023 12:09:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243199AbjIHMJj (ORCPT ); Fri, 8 Sep 2023 08:09:39 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50788 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243164AbjIHMJj (ORCPT ); Fri, 8 Sep 2023 08:09:39 -0400 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6CE991BC5 for ; Fri, 8 Sep 2023 05:09:35 -0700 (PDT) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 90F2BC433C8 for ; Fri, 8 Sep 2023 12:09:34 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1694174975; bh=aiwc3+PamXF4i1nUGTHtwy2uw159CpOb9gZSGpC1mYI=; h=From:To:Subject:Date:In-Reply-To:References:From; b=QykUCIOZof5DfBHRds3FluXfT6/LEneyZtBW33xbxZkKT6xBgFX8eyo0Rhtn77eZG x0WocXtQcXmZt5y8kImb4I541FCUhvnBcw/n8S5fRjPvs2Dbfge0we+JczWcUd8NnU 6lVH9KHZXOS0PVigXCCiWoVFt5IycFngYi/yVXnqB9hf0uycQKgLxAzjYUfnX/OBP7 yPFdEcESNcrZEV5o52n/O88xItnCnSXHMTkYA0xSCut9XVHNxJcle2u3gjRO1JcCWk HhjrYLaAp91iPoFNYl9thQMz/4JwOnzIWTa5HEi2B1EEwSCI2zhUxL7N0VjMijGMPe Kqt6gIXHDMo5Q== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 08/21] btrfs: remove refs_to_add argument from __btrfs_inc_extent_ref() Date: Fri, 8 Sep 2023 13:09:10 +0100 Message-Id: X-Mailer: git-send-email 2.34.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana Currently the 'refs_to_add' argument of __btrfs_inc_extent_ref() always matches the value of node->ref_mod, so remove the argument and use node->ref_mod at __btrfs_inc_extent_ref(). Signed-off-by: Filipe Manana Reviewed-by: Josef Bacik --- fs/btrfs/extent-tree.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index cf503f2972a1..16e511f3d24b 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -1465,8 +1465,6 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans, * always passed as 0. For data extents it is the fileoffset * this extent belongs to. * - * @refs_to_add Number of references to add - * * @extent_op Pointer to a structure, holding information necessary when * updating a tree block's flags * @@ -1474,7 +1472,7 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans, static int __btrfs_inc_extent_ref(struct btrfs_trans_handle *trans, struct btrfs_delayed_ref_node *node, u64 parent, u64 root_objectid, - u64 owner, u64 offset, int refs_to_add, + u64 owner, u64 offset, struct btrfs_delayed_extent_op *extent_op) { struct btrfs_path *path; @@ -1484,6 +1482,7 @@ static int __btrfs_inc_extent_ref(struct btrfs_trans_handle *trans, u64 bytenr = node->bytenr; u64 num_bytes = node->num_bytes; u64 refs; + int refs_to_add = node->ref_mod; int ret; path = btrfs_alloc_path(); @@ -1562,7 +1561,7 @@ static int run_delayed_data_ref(struct btrfs_trans_handle *trans, } else if (node->action == BTRFS_ADD_DELAYED_REF) { ret = __btrfs_inc_extent_ref(trans, node, parent, ref_root, ref->objectid, ref->offset, - node->ref_mod, extent_op); + extent_op); } else if (node->action == BTRFS_DROP_DELAYED_REF) { ret = __btrfs_free_extent(trans, node, parent, ref_root, ref->objectid, @@ -1710,7 +1709,7 @@ static int run_delayed_tree_ref(struct btrfs_trans_handle *trans, ret = alloc_reserved_tree_block(trans, node, extent_op); } else if (node->action == BTRFS_ADD_DELAYED_REF) { ret = __btrfs_inc_extent_ref(trans, node, parent, ref_root, - ref->level, 0, 1, extent_op); + ref->level, 0, extent_op); } else if (node->action == BTRFS_DROP_DELAYED_REF) { ret = __btrfs_free_extent(trans, node, parent, ref_root, ref->level, 0, 1, extent_op); From patchwork Fri Sep 8 12:09:11 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13377402 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AF72DEE7FE0 for ; Fri, 8 Sep 2023 12:09:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243202AbjIHMJk (ORCPT ); Fri, 8 Sep 2023 08:09:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50802 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243164AbjIHMJk (ORCPT ); Fri, 8 Sep 2023 08:09:40 -0400 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 67E091BE7 for ; Fri, 8 Sep 2023 05:09:36 -0700 (PDT) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 88232C433C7 for ; Fri, 8 Sep 2023 12:09:35 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1694174976; bh=2aHNAZsssZgCQJ8ExnofXwfQibQSzTru6TtpRQDdE18=; h=From:To:Subject:Date:In-Reply-To:References:From; b=RFNgqhMio4b2btcXIJwV4qKJ+z4ME6PM5DoQ9NchprOym/7A/YyMmkbcrmLqVHaHl 75VK8roOzPfCVF9H1wiZ+5zcG8axz2NjbGamo1F8K2weGZAznnGUS0mSlR3x0cHj8H 5QOW/6hPcK3NqJyEwhVhym2BttdYQnIsUqR0Acw58aq0hZs+ddFNdHxZxYifAoNcYB adQ0R4tHD7Ncq4JppdlXhwHoz+2YJsEE1oLTYQtd8RhtUbS6wksDfV59VlF1i66FW4 wCPLVwTY++KK/hgz09eTie2gQXvID8gzwBIxFYTX1z87xPeLupwsf/pWrzLiG2+rXB 3tyv5vjPAkNcQ== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 09/21] btrfs: remove refs_to_drop argument from __btrfs_free_extent() Date: Fri, 8 Sep 2023 13:09:11 +0100 Message-Id: X-Mailer: git-send-email 2.34.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana Currently the 'refs_to_drop' argument of __btrfs_free_extent() always matches the value of node->ref_mod, so remove the argument and use node->ref_mod at __btrfs_free_extent(). Signed-off-by: Filipe Manana Reviewed-by: Josef Bacik --- fs/btrfs/extent-tree.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 16e511f3d24b..de63e873be3a 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -49,7 +49,7 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans, struct btrfs_delayed_ref_node *node, u64 parent, u64 root_objectid, u64 owner_objectid, - u64 owner_offset, int refs_to_drop, + u64 owner_offset, struct btrfs_delayed_extent_op *extra_op); static void __run_delayed_extent_op(struct btrfs_delayed_extent_op *extent_op, struct extent_buffer *leaf, @@ -1565,8 +1565,7 @@ static int run_delayed_data_ref(struct btrfs_trans_handle *trans, } else if (node->action == BTRFS_DROP_DELAYED_REF) { ret = __btrfs_free_extent(trans, node, parent, ref_root, ref->objectid, - ref->offset, node->ref_mod, - extent_op); + ref->offset, extent_op); } else { BUG(); } @@ -1712,7 +1711,7 @@ static int run_delayed_tree_ref(struct btrfs_trans_handle *trans, ref->level, 0, extent_op); } else if (node->action == BTRFS_DROP_DELAYED_REF) { ret = __btrfs_free_extent(trans, node, parent, ref_root, - ref->level, 0, 1, extent_op); + ref->level, 0, extent_op); } else { BUG(); } @@ -2927,7 +2926,7 @@ static int do_free_extent_accounting(struct btrfs_trans_handle *trans, static int __btrfs_free_extent(struct btrfs_trans_handle *trans, struct btrfs_delayed_ref_node *node, u64 parent, u64 root_objectid, u64 owner_objectid, - u64 owner_offset, int refs_to_drop, + u64 owner_offset, struct btrfs_delayed_extent_op *extent_op) { struct btrfs_fs_info *info = trans->fs_info; @@ -2942,6 +2941,7 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans, int extent_slot = 0; int found_extent = 0; int num_to_del = 1; + int refs_to_drop = node->ref_mod; u32 item_size; u64 refs; u64 bytenr = node->bytenr; From patchwork Fri Sep 8 12:09:12 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13377403 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 05BC1EE7FE1 for ; Fri, 8 Sep 2023 12:09:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243207AbjIHMJm (ORCPT ); Fri, 8 Sep 2023 08:09:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50808 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243205AbjIHMJl (ORCPT ); Fri, 8 Sep 2023 08:09:41 -0400 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5E5821BC5 for ; Fri, 8 Sep 2023 05:09:37 -0700 (PDT) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 82462C433CA for ; Fri, 8 Sep 2023 12:09:36 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1694174977; bh=CS7FgnFp/TxGQmnTgzs1JupmjaJP8GemUnGDTXUwpyc=; h=From:To:Subject:Date:In-Reply-To:References:From; b=kUKtTpLqseSAO8vPGuX6L0Zar60fHM+kujt03o9KUbMCEtaOcXs+0ajidyEBHR8Q8 d9yfP1T/izeeM6LTiA9bFH3Sy/J6UIQBTgKpyzEdGglfXU/BuZbHM5HVO7auaK47Ty B5qfDJb8rUmWJGCryyRao83d1TsRsC830ZKCCrQczAnOK0sE+baAdi7rwF6tXYkKL5 NQYSptbk4b/mvXN6C7HJsvICWHozU5FvhTVTjiP7ZcFD1xJrI3+uySjK07G1R1m7/X a3xjosNWHX1M0NcES8IFfdaYwnVjJmq7CXaMYgEek44bAD69pP292qR6LK8AWdo4iZ IsajaFYEMm5QA== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 10/21] btrfs: initialize key where it's used when running delayed data ref Date: Fri, 8 Sep 2023 13:09:12 +0100 Message-Id: X-Mailer: git-send-email 2.34.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana At run_delayed_data_ref() we are always initializing a key but the key is only needed and used if we are inserting a new extent. So move the declaration and initialization of the key to 'if' branch where it's used. Also rename the key from 'ins' to 'key', as it's a more clear name. Signed-off-by: Filipe Manana Reviewed-by: Josef Bacik --- fs/btrfs/extent-tree.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index de63e873be3a..16c7122bdfb5 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -1535,15 +1535,10 @@ static int run_delayed_data_ref(struct btrfs_trans_handle *trans, { int ret = 0; struct btrfs_delayed_data_ref *ref; - struct btrfs_key ins; u64 parent = 0; u64 ref_root = 0; u64 flags = 0; - ins.objectid = node->bytenr; - ins.offset = node->num_bytes; - ins.type = BTRFS_EXTENT_ITEM_KEY; - ref = btrfs_delayed_node_to_data_ref(node); trace_run_delayed_data_ref(trans->fs_info, node, ref, node->action); @@ -1552,11 +1547,18 @@ static int run_delayed_data_ref(struct btrfs_trans_handle *trans, ref_root = ref->root; if (node->action == BTRFS_ADD_DELAYED_REF && insert_reserved) { + struct btrfs_key key; + if (extent_op) flags |= extent_op->flags_to_set; + + key.objectid = node->bytenr; + key.type = BTRFS_EXTENT_ITEM_KEY; + key.offset = node->num_bytes; + ret = alloc_reserved_file_extent(trans, parent, ref_root, flags, ref->objectid, - ref->offset, &ins, + ref->offset, &key, node->ref_mod); } else if (node->action == BTRFS_ADD_DELAYED_REF) { ret = __btrfs_inc_extent_ref(trans, node, parent, ref_root, From patchwork Fri Sep 8 12:09:13 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13377404 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7B886EE7FE2 for ; Fri, 8 Sep 2023 12:09:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243209AbjIHMJm (ORCPT ); Fri, 8 Sep 2023 08:09:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35330 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239833AbjIHMJl (ORCPT ); Fri, 8 Sep 2023 08:09:41 -0400 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 595911BE7 for ; Fri, 8 Sep 2023 05:09:38 -0700 (PDT) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7AF08C433C8 for ; Fri, 8 Sep 2023 12:09:37 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1694174978; bh=PAP3FOJoTPSe/qm3pOdT9zqiWK6FT6FFWFqv4QY3dIw=; h=From:To:Subject:Date:In-Reply-To:References:From; b=Ypl8Mk1ZFKPRVzr97ls7jiKupK3eejo1eOZ2GGv8+wynmmB4GDR6pHLsiwsO8r/J0 T9ufgrIg1XM3Vl/ZLouqAQcyTly0SuHmUwIRysq0/Lqv7wjOelO5NHYcyYPPSuebHP LcsPSNod0FVT+XIXgUrmjXz8vdFz6PF3NMQy4N4BvLD6Y44APSlcNorEKyp0cVgoW8 fQQQXsnzLiLngRXFJyRhxiX8xwyN5Dpq8zpTgKbrrrePywSLij1fmNb0Jf7+ItXier wijp2+Lll5m5NNRt6lEMslBCHCBXuF9VgEOyDVXUptcyhr3GsH+CGMhPmoA9/las0I tgKH8MPCFEFxg== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 11/21] btrfs: remove pointless 'ref_root' variable from run_delayed_data_ref() Date: Fri, 8 Sep 2023 13:09:13 +0100 Message-Id: <0f7ec851de1a7913b102ea39723fa3cf636cf6b0.1694174371.git.fdmanana@suse.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana The 'ref_root' variable, at run_delayed_data_ref(), is not really needed as we can always use ref->root directly, plus its initialization to 0 is completely pointless as we assign it ref->root before its first use. So just drop that variable and use ref->root directly. This may help avoid some warnings with clang tools such as the one reported/fixed by commit 966de47ff0c9 ("btrfs: remove redundant initialization of variables in log_new_ancestors"). Signed-off-by: Filipe Manana Reviewed-by: Josef Bacik --- fs/btrfs/extent-tree.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 16c7122bdfb5..21049609c5fc 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -1536,7 +1536,6 @@ static int run_delayed_data_ref(struct btrfs_trans_handle *trans, int ret = 0; struct btrfs_delayed_data_ref *ref; u64 parent = 0; - u64 ref_root = 0; u64 flags = 0; ref = btrfs_delayed_node_to_data_ref(node); @@ -1544,7 +1543,6 @@ static int run_delayed_data_ref(struct btrfs_trans_handle *trans, if (node->type == BTRFS_SHARED_DATA_REF_KEY) parent = ref->parent; - ref_root = ref->root; if (node->action == BTRFS_ADD_DELAYED_REF && insert_reserved) { struct btrfs_key key; @@ -1556,17 +1554,17 @@ static int run_delayed_data_ref(struct btrfs_trans_handle *trans, key.type = BTRFS_EXTENT_ITEM_KEY; key.offset = node->num_bytes; - ret = alloc_reserved_file_extent(trans, parent, ref_root, + ret = alloc_reserved_file_extent(trans, parent, ref->root, flags, ref->objectid, ref->offset, &key, node->ref_mod); } else if (node->action == BTRFS_ADD_DELAYED_REF) { - ret = __btrfs_inc_extent_ref(trans, node, parent, ref_root, + ret = __btrfs_inc_extent_ref(trans, node, parent, ref->root, ref->objectid, ref->offset, extent_op); } else if (node->action == BTRFS_DROP_DELAYED_REF) { ret = __btrfs_free_extent(trans, node, parent, - ref_root, ref->objectid, + ref->root, ref->objectid, ref->offset, extent_op); } else { BUG(); From patchwork Fri Sep 8 12:09:14 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13377405 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 16B0FEE7FE0 for ; Fri, 8 Sep 2023 12:09:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240395AbjIHMJn (ORCPT ); Fri, 8 Sep 2023 08:09:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35344 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243212AbjIHMJm (ORCPT ); Fri, 8 Sep 2023 08:09:42 -0400 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 501381BC5 for ; Fri, 8 Sep 2023 05:09:39 -0700 (PDT) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 72115C433CA for ; Fri, 8 Sep 2023 12:09:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1694174979; bh=tndu4mmAhqsjkELfVkNsN8ukn7LnhQEO0iQpYV/e/1o=; h=From:To:Subject:Date:In-Reply-To:References:From; b=NjHldNqU1L2IbJ4aLgFLzAAsNGqQRhGQB/bUJavfe+VjdAAUYAZX7hMf/jOyDMyfs 0BJx9oRlEgFOLSPYjGRX5Vy6gx7KFjpwaMmU0qVDrXdpk3D1OvFEOOw4m709T2jnTl arfJa/x75YQlrDV9znzKI9Ry6txt9TLZuq0iDoNEt7uBxjvp4UdreXxMmEjnoKtRmi f7wnTbMZZz1uY6Dqm7tybBEVMmP4hAdrEmLTpo1ku5oJSZ06F459vbNODzPii5OCbt wBMKQfqambYL2dmycatny2s5MfpaJb+lm4I++4WGOaVK6ErPD6FuEwky676P/s0twH bMITL5XyAE15A== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 12/21] btrfs: log message if extent item not found when running delayed extent op Date: Fri, 8 Sep 2023 13:09:14 +0100 Message-Id: <5c0f10fff0bb9e0bbd0368069d965d8e4ea0cb1e.1694174371.git.fdmanana@suse.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana When running a delayed extent operation, if we don't find the extent item in the extent tree we just return -EIO without any logged message. This indicates some bug or possibly a memory or fs corruption, so the return value should not be -EIO but -EUCLEAN instead, and since it's not expected to ever happen, print an informative error message so that if it happens we have some idea of what went wrong, where to look at. Signed-off-by: Filipe Manana Reviewed-by: Josef Bacik --- fs/btrfs/extent-tree.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 21049609c5fc..167d0438da6e 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -1653,7 +1653,10 @@ static int run_delayed_extent_op(struct btrfs_trans_handle *trans, goto again; } } else { - err = -EIO; + err = -EUCLEAN; + btrfs_err(fs_info, + "missing extent item for extent %llu num_bytes %llu level %d", + head->bytenr, head->num_bytes, extent_op->level); goto out; } } From patchwork Fri Sep 8 12:09:15 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13377406 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 280A0EE7FE4 for ; Fri, 8 Sep 2023 12:09:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241813AbjIHMJo (ORCPT ); Fri, 8 Sep 2023 08:09:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35360 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243030AbjIHMJn (ORCPT ); Fri, 8 Sep 2023 08:09:43 -0400 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 465111BC5 for ; Fri, 8 Sep 2023 05:09:40 -0700 (PDT) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 69B88C433C9 for ; Fri, 8 Sep 2023 12:09:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1694174979; bh=xmDP0/6BM+GvZHuSdZJs+KsIWF9EuFMRNX8EQHhiEI8=; h=From:To:Subject:Date:In-Reply-To:References:From; b=XE/Ms1IEO4UBUuN0MWNq+dBqRkyvLvBZm39EPoPoxXOvsO5XBsGJjpfHL6LU6ogmJ OMAnLFUIX/7uo4HZmpfWfde2xeTeyNXC6ZfcqFMRzfWXWITyL2XTFB0RVG0TLvER5V 66Xg8nQjq0Fl+EPj/DQaicS7ajc499XrsC+ib78TDX1mtCci1ebrig8H+6E8iY7HN1 FMpLCKC2ssWJnT+GXv03TUfS8Fuhnm9O1lCLtwzgfDYjNSqs2rqGq3jzI5W2umUWfx g/kp2NIg53EHwehN56v+Jun/kaF6K0J3MC8z586TIxYFE4UrRgdQrd5jwHW6i30rpm ZwiPyu/SAqAGg== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 13/21] btrfs: use a single variable for return value at run_delayed_extent_op() Date: Fri, 8 Sep 2023 13:09:15 +0100 Message-Id: X-Mailer: git-send-email 2.34.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana Instead of using a 'ret' and an 'err' variable at run_delayed_extent_op() for tracking the return value, use a single one ('ret'). This simplifies the code, makes it comply with most of the existing code and it's less prone for logic errors as time has proven over and over in the btrfs code. Signed-off-by: Filipe Manana Reviewed-by: Josef Bacik --- fs/btrfs/extent-tree.c | 13 +++++-------- 1 file changed, 5 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 167d0438da6e..e33c4b393922 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -1602,7 +1602,6 @@ static int run_delayed_extent_op(struct btrfs_trans_handle *trans, struct extent_buffer *leaf; u32 item_size; int ret; - int err = 0; int metadata = 1; if (TRANS_ABORTED(trans)) @@ -1629,10 +1628,8 @@ static int run_delayed_extent_op(struct btrfs_trans_handle *trans, again: ret = btrfs_search_slot(trans, root, &key, path, 0, 1); if (ret < 0) { - err = ret; goto out; - } - if (ret > 0) { + } else if (ret > 0) { if (metadata) { if (path->slots[0] > 0) { path->slots[0]--; @@ -1653,7 +1650,7 @@ static int run_delayed_extent_op(struct btrfs_trans_handle *trans, goto again; } } else { - err = -EUCLEAN; + ret = -EUCLEAN; btrfs_err(fs_info, "missing extent item for extent %llu num_bytes %llu level %d", head->bytenr, head->num_bytes, extent_op->level); @@ -1665,11 +1662,11 @@ static int run_delayed_extent_op(struct btrfs_trans_handle *trans, item_size = btrfs_item_size(leaf, path->slots[0]); if (unlikely(item_size < sizeof(*ei))) { - err = -EUCLEAN; + ret = -EUCLEAN; btrfs_err(fs_info, "unexpected extent item size, has %u expect >= %zu", item_size, sizeof(*ei)); - btrfs_abort_transaction(trans, err); + btrfs_abort_transaction(trans, ret); goto out; } @@ -1679,7 +1676,7 @@ static int run_delayed_extent_op(struct btrfs_trans_handle *trans, btrfs_mark_buffer_dirty(leaf); out: btrfs_free_path(path); - return err; + return ret; } static int run_delayed_tree_ref(struct btrfs_trans_handle *trans, From patchwork Fri Sep 8 12:09:16 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13377407 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 68CF3EE7FE2 for ; Fri, 8 Sep 2023 12:09:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242826AbjIHMJp (ORCPT ); Fri, 8 Sep 2023 08:09:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35376 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243030AbjIHMJo (ORCPT ); Fri, 8 Sep 2023 08:09:44 -0400 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3F9931BC5 for ; Fri, 8 Sep 2023 05:09:41 -0700 (PDT) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 6121EC433C8 for ; Fri, 8 Sep 2023 12:09:40 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1694174980; bh=5jALY8Tyarr/TIo+Gi/M/dyGvLBjNfwlSVD+PfqjjqU=; h=From:To:Subject:Date:In-Reply-To:References:From; b=OzVRYUJHv0FpQcaud5j2Ugpo08X/4SBdE7cqw6vJJvP8Ava6ebmxZnW9P/cJLO3dK DmJaKyxLvPfpYBrqpcD35kcGJp6HXN/YQwwwzHGXRxuoYaXg90f/xdbC85KjObYSRx kGCq06KpLRi2bG6ddZrTNBAJKluAAOoTIn3uv4jsnwAls5g7DQtjI/YsbEsEJWopR+ l8+CJ2AjDhxHRfUGQzzoCSJkbCcvCxfIv/kCy67XBvGzsYg4NmDnDoYzCeUIsPxMVG tdsXNJLJhWKCNTvrRg8L254xst4ntZtFuFRZOag0CKWoqclg7uOdqt8UaR1PoabcP7 ZcRUxyxrbZTlg== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 14/21] btrfs: use a single variable for return value at lookup_inline_extent_backref() Date: Fri, 8 Sep 2023 13:09:16 +0100 Message-Id: <6a03f43747c9126f6162886e3bb7934fd480ae48.1694174371.git.fdmanana@suse.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana At lookup_inline_extent_backref(), instead of using a 'ret' and an 'err' variable for tracking the return value, use a single one ('ret'). This simplifies the code, makes it comply with most of the existing code and it's less prone for logic errors as time has proven over and over in the btrfs code. Signed-off-by: Filipe Manana Reviewed-by: Josef Bacik --- fs/btrfs/extent-tree.c | 33 +++++++++++++++------------------ 1 file changed, 15 insertions(+), 18 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index e33c4b393922..e57061106860 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -789,7 +789,6 @@ int lookup_inline_extent_backref(struct btrfs_trans_handle *trans, int type; int want; int ret; - int err = 0; bool skinny_metadata = btrfs_fs_incompat(fs_info, SKINNY_METADATA); int needed; @@ -816,10 +815,8 @@ int lookup_inline_extent_backref(struct btrfs_trans_handle *trans, again: ret = btrfs_search_slot(trans, root, &key, path, extra_size, 1); - if (ret < 0) { - err = ret; + if (ret < 0) goto out; - } /* * We may be a newly converted file system which still has the old fat @@ -846,7 +843,7 @@ int lookup_inline_extent_backref(struct btrfs_trans_handle *trans, } if (ret && !insert) { - err = -ENOENT; + ret = -ENOENT; goto out; } else if (WARN_ON(ret)) { btrfs_print_leaf(path->nodes[0]); @@ -854,18 +851,18 @@ int lookup_inline_extent_backref(struct btrfs_trans_handle *trans, "extent item not found for insert, bytenr %llu num_bytes %llu parent %llu root_objectid %llu owner %llu offset %llu", bytenr, num_bytes, parent, root_objectid, owner, offset); - err = -EIO; + ret = -EIO; goto out; } leaf = path->nodes[0]; item_size = btrfs_item_size(leaf, path->slots[0]); if (unlikely(item_size < sizeof(*ei))) { - err = -EUCLEAN; + ret = -EUCLEAN; btrfs_err(fs_info, "unexpected extent item size, has %llu expect >= %zu", item_size, sizeof(*ei)); - btrfs_abort_transaction(trans, err); + btrfs_abort_transaction(trans, ret); goto out; } @@ -885,11 +882,11 @@ int lookup_inline_extent_backref(struct btrfs_trans_handle *trans, else needed = BTRFS_REF_TYPE_BLOCK; - err = -ENOENT; + ret = -ENOENT; while (1) { if (ptr >= end) { if (ptr > end) { - err = -EUCLEAN; + ret = -EUCLEAN; btrfs_print_leaf(path->nodes[0]); btrfs_crit(fs_info, "overrun extent record at slot %d while looking for inline extent for root %llu owner %llu offset %llu parent %llu", @@ -900,7 +897,7 @@ int lookup_inline_extent_backref(struct btrfs_trans_handle *trans, iref = (struct btrfs_extent_inline_ref *)ptr; type = btrfs_get_extent_inline_ref_type(leaf, iref, needed); if (type == BTRFS_REF_TYPE_INVALID) { - err = -EUCLEAN; + ret = -EUCLEAN; goto out; } @@ -916,7 +913,7 @@ int lookup_inline_extent_backref(struct btrfs_trans_handle *trans, dref = (struct btrfs_extent_data_ref *)(&iref->offset); if (match_extent_data_ref(leaf, dref, root_objectid, owner, offset)) { - err = 0; + ret = 0; break; } if (hash_extent_data_ref_item(leaf, dref) < @@ -927,14 +924,14 @@ int lookup_inline_extent_backref(struct btrfs_trans_handle *trans, ref_offset = btrfs_extent_inline_ref_offset(leaf, iref); if (parent > 0) { if (parent == ref_offset) { - err = 0; + ret = 0; break; } if (ref_offset < parent) break; } else { if (root_objectid == ref_offset) { - err = 0; + ret = 0; break; } if (ref_offset < root_objectid) @@ -943,10 +940,10 @@ int lookup_inline_extent_backref(struct btrfs_trans_handle *trans, } ptr += btrfs_extent_inline_ref_size(type); } - if (err == -ENOENT && insert) { + if (ret == -ENOENT && insert) { if (item_size + extra_size >= BTRFS_MAX_EXTENT_ITEM_SIZE(root)) { - err = -EAGAIN; + ret = -EAGAIN; goto out; } /* @@ -958,7 +955,7 @@ int lookup_inline_extent_backref(struct btrfs_trans_handle *trans, if (find_next_key(path, 0, &key) == 0 && key.objectid == bytenr && key.type < BTRFS_BLOCK_GROUP_ITEM_KEY) { - err = -EAGAIN; + ret = -EAGAIN; goto out; } } @@ -969,7 +966,7 @@ int lookup_inline_extent_backref(struct btrfs_trans_handle *trans, path->search_for_extension = 0; btrfs_unlock_up_safe(path, 1); } - return err; + return ret; } /* From patchwork Fri Sep 8 12:09:17 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13377408 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0115DEE7FE1 for ; Fri, 8 Sep 2023 12:09:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243164AbjIHMJq (ORCPT ); Fri, 8 Sep 2023 08:09:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35382 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242989AbjIHMJp (ORCPT ); Fri, 8 Sep 2023 08:09:45 -0400 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 37C081BC5 for ; Fri, 8 Sep 2023 05:09:42 -0700 (PDT) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 58454C433CA for ; Fri, 8 Sep 2023 12:09:41 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1694174981; bh=h9dIxJBzVpiFuPuXVPpGEbIiYP//2Olu5rK4Tnd3X40=; h=From:To:Subject:Date:In-Reply-To:References:From; b=bI+fnJdEa5xyOLj6mWZDditz5Tq7XBVbdSD+K8jLKy08zyEGa2xYCyXhKStMYcY7j ilL4eGeir5nV7CLB+frU9pH3nfdXPoyyxbUWXpM6kACX6KxJvnyZAj2kjnJo8sRtea iUHArLYIiAkxVNt15YNfM3y/BeRtATsixcWE8J6nE891FfpPSbvyVprZV/kT4SUwPw 3eut0GiqdyY1pUk7HTjv0IOkxRt6q9mV5vw7rs7BgtTKYbDpUjO9CvBJshB0+lD3XD CNDJlbddcFm/I4+XPZp3G0THwMGrhTF7DxjxzZaD789LVbwwUrUSSH1ucIRdssnrwL aHL0GxL8fKBXw== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 15/21] btrfs: return -EUCLEAN if extent item is missing when searching inline backref Date: Fri, 8 Sep 2023 13:09:17 +0100 Message-Id: X-Mailer: git-send-email 2.34.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana At lookup_inline_extent_backref() when trying to insert an inline backref, if we don't find the extent item we log an error and then return -EIO. This error code is confusing because there was actually no IO error, and this means we have some corruption, either caused by a bug or something like a memory bitflip for example. So change the error code from -EIO to -EUCLEAN. Signed-off-by: Filipe Manana Reviewed-by: Josef Bacik --- fs/btrfs/extent-tree.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index e57061106860..756589195ed7 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -851,7 +851,7 @@ int lookup_inline_extent_backref(struct btrfs_trans_handle *trans, "extent item not found for insert, bytenr %llu num_bytes %llu parent %llu root_objectid %llu owner %llu offset %llu", bytenr, num_bytes, parent, root_objectid, owner, offset); - ret = -EIO; + ret = -EUCLEAN; goto out; } From patchwork Fri Sep 8 12:09:18 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13377409 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0FD67EE7FE3 for ; Fri, 8 Sep 2023 12:09:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241557AbjIHMJr (ORCPT ); Fri, 8 Sep 2023 08:09:47 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35384 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238937AbjIHMJq (ORCPT ); Fri, 8 Sep 2023 08:09:46 -0400 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 53EDC1BC5 for ; Fri, 8 Sep 2023 05:09:43 -0700 (PDT) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 4F640C433C8 for ; Fri, 8 Sep 2023 12:09:42 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1694174982; bh=WpLaJ0PmwGZxfUWa6spM04Hfy13Dwhu2fQCqMTAcM3I=; h=From:To:Subject:Date:In-Reply-To:References:From; b=q7JQrQlbokLgJJ9P3S38qeAg8xSWFwVXlTxBdpkERG/qBRN7SzdAGD4fXzfSLUjfu cZrrVnJ8iYyLDPZQLF7oFuQbP9J1VXE0TPxFXyJkF5tE7Rgzn4YPTw+YP0Xj5t2w9B WMhQ/FT2YpquPERxM9Fpo/62EBAhNesmy7ihe0K3vpIlQ70FdJRvPhhgwU6Ofv8w0L o4V1b4CzkAizI8SgC/F50sR10WFafVgepYa9Eh6dtKMcyZUYtftt9wBF3Z8RbBem3w vL/C11R2V3GWc9cUzJW0bb5Dbj794i+NJ+1D+iVZBTmgvRcc/ZVFMjAMtuSwuxe8ah Sjs6MnHr2EVpg== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 16/21] btrfs: simplify check for extent item overrun at lookup_inline_extent_backref() Date: Fri, 8 Sep 2023 13:09:18 +0100 Message-Id: <1e6e5a365ce01567b0334ac2792c8e8aa1fe1a64.1694174371.git.fdmanana@suse.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana At lookup_inline_extent_backref() we can simplify the check for an overrun of the extent item by making the while loop's condition to be "ptr < end" and then check after the loop if an overrun happened ("ptr > end"). This reduces indentation and makes the loop condition more clear. So move the check out of the loop and change the loop condition accordingly, while also adding the 'unlikely' tag to the check since it's not supposed to be triggered. Signed-off-by: Filipe Manana Reviewed-by: Josef Bacik --- fs/btrfs/extent-tree.c | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 756589195ed7..b27e0a6878b3 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -883,17 +883,7 @@ int lookup_inline_extent_backref(struct btrfs_trans_handle *trans, needed = BTRFS_REF_TYPE_BLOCK; ret = -ENOENT; - while (1) { - if (ptr >= end) { - if (ptr > end) { - ret = -EUCLEAN; - btrfs_print_leaf(path->nodes[0]); - btrfs_crit(fs_info, -"overrun extent record at slot %d while looking for inline extent for root %llu owner %llu offset %llu parent %llu", - path->slots[0], root_objectid, owner, offset, parent); - } - break; - } + while (ptr < end) { iref = (struct btrfs_extent_inline_ref *)ptr; type = btrfs_get_extent_inline_ref_type(leaf, iref, needed); if (type == BTRFS_REF_TYPE_INVALID) { @@ -940,6 +930,16 @@ int lookup_inline_extent_backref(struct btrfs_trans_handle *trans, } ptr += btrfs_extent_inline_ref_size(type); } + + if (unlikely(ptr > end)) { + ret = -EUCLEAN; + btrfs_print_leaf(path->nodes[0]); + btrfs_crit(fs_info, +"overrun extent record at slot %d while looking for inline extent for root %llu owner %llu offset %llu parent %llu", + path->slots[0], root_objectid, owner, offset, parent); + goto out; + } + if (ret == -ENOENT && insert) { if (item_size + extra_size >= BTRFS_MAX_EXTENT_ITEM_SIZE(root)) { From patchwork Fri Sep 8 12:09:19 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13377410 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3C517EE7FE0 for ; Fri, 8 Sep 2023 12:09:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238937AbjIHMJt (ORCPT ); Fri, 8 Sep 2023 08:09:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35386 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230414AbjIHMJs (ORCPT ); Fri, 8 Sep 2023 08:09:48 -0400 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 246EA1BE7 for ; Fri, 8 Sep 2023 05:09:44 -0700 (PDT) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 479F5C433CA for ; Fri, 8 Sep 2023 12:09:43 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1694174983; bh=FDiIk0ikEEX2d8BXdtmlorI+t2fVrTA4Kg0eWaOdHuM=; h=From:To:Subject:Date:In-Reply-To:References:From; b=g6cWIj1onTv/rJYpfpCHKM4TmG3XgeJ8UQ8ql1zPd1Y51CMiHR5/HMgBF8A5xqJsA J0Hkzqb1bvmVl9kx6A00wQ3BhucyMhDXL4dCHnTbRk8av99dzIOS+UmHkk/KHUanDh Y+faipgMmDLUWn53IEj1HuG1P8GU4vU+azRzHVVz0cUMukfgCRe8EE359MWmfdk4q8 TBgkHVZjSIFCV1QQO9ReEXBkJZshyrRK+bV4YuR08K+Vi6LGkxwyM1J7FMWrGEi1bh GwR78VVxhAI5o6/MyBkLbKjdrOZy/kYWvdLM/bgQ4zmj4zCWYMLe6bUCKXE6hp6K6U PlzNl1U3/F57Q== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 17/21] btrfs: allow to run delayed refs by bytes to be released instead of count Date: Fri, 8 Sep 2023 13:09:19 +0100 Message-Id: <5df582d8f2fadb2eb0baab70fe977fc2d3934ddc.1694174371.git.fdmanana@suse.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana When running delayed references, through btrfs_run_delayed_refs(), we can specify how many to run, run all existing delayed references and keep running delayed references while we can find any. This is controlled with the value of the 'count' argument, where a value of 0 means to run all delayed references that exist by the time btrfs_run_delayed_refs() is called, (unsigned long)-1 means to keep running delayed references while we are able find any, and any other value to run that exact number of delayed references. Typically a specific value other than 0 or -1 is used when flushing space to try to release a certain amount of bytes for a ticket. In this case we just simply calculate how many delayed reference heads correspond to a specific amount of bytes, with calc_delayed_refs_nr(). However that only takes into account the space reserved for the reference heads themselves, and does not account for the space reserved for deleting checksums from the csum tree (see add_delayed_ref_head() and update_existing_head_ref()) in case we are going to delete a data extent. This means we may end up running more delayed references than necessary in case we process delayed references for deleting a data extent. So change the logic of btrfs_run_delayed_refs() to take a bytes argument to specify how many bytes of delayed references to run/release, using the special values of 0 to mean all existing delayed references and U64_MAX (or (u64)-1) to keep running delayed references while we can find any. This prevents running more delayed references than necessary, when we have delayed references for deleting data extents, but also makes the upcoming changes/patches simpler and it's preparatory work for them. Signed-off-by: Filipe Manana Reviewed-by: Josef Bacik --- fs/btrfs/block-group.c | 3 +-- fs/btrfs/extent-tree.c | 53 +++++++++++++++++++++++++++--------------- fs/btrfs/extent-tree.h | 4 ++-- fs/btrfs/space-info.c | 17 ++------------ fs/btrfs/transaction.c | 8 +++---- 5 files changed, 43 insertions(+), 42 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index b2e5107b7cec..fb506ee51d2c 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -3474,8 +3474,7 @@ int btrfs_write_dirty_block_groups(struct btrfs_trans_handle *trans) cache_save_setup(cache, trans, path); if (!ret) - ret = btrfs_run_delayed_refs(trans, - (unsigned long) -1); + ret = btrfs_run_delayed_refs(trans, U64_MAX); if (!ret && cache->disk_cache_state == BTRFS_DC_SETUP) { cache->io_ctl.inode = NULL; diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index b27e0a6878b3..475903a8d772 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -1815,7 +1815,7 @@ static int run_and_cleanup_extent_op(struct btrfs_trans_handle *trans, return ret ? ret : 1; } -void btrfs_cleanup_ref_head_accounting(struct btrfs_fs_info *fs_info, +u64 btrfs_cleanup_ref_head_accounting(struct btrfs_fs_info *fs_info, struct btrfs_delayed_ref_root *delayed_refs, struct btrfs_delayed_ref_head *head) { @@ -1833,10 +1833,13 @@ void btrfs_cleanup_ref_head_accounting(struct btrfs_fs_info *fs_info, } btrfs_delayed_refs_rsv_release(fs_info, nr_items); + + return btrfs_calc_delayed_ref_bytes(fs_info, nr_items); } static int cleanup_ref_head(struct btrfs_trans_handle *trans, - struct btrfs_delayed_ref_head *head) + struct btrfs_delayed_ref_head *head, + u64 *bytes_released) { struct btrfs_fs_info *fs_info = trans->fs_info; @@ -1881,7 +1884,7 @@ static int cleanup_ref_head(struct btrfs_trans_handle *trans, } } - btrfs_cleanup_ref_head_accounting(fs_info, delayed_refs, head); + *bytes_released = btrfs_cleanup_ref_head_accounting(fs_info, delayed_refs, head); trace_run_delayed_ref_head(fs_info, head, 0); btrfs_delayed_ref_unlock(head); @@ -2002,15 +2005,22 @@ static int btrfs_run_delayed_refs_for_head(struct btrfs_trans_handle *trans, * Returns -ENOMEM or -EIO on failure and will abort the transaction. */ static noinline int __btrfs_run_delayed_refs(struct btrfs_trans_handle *trans, - unsigned long nr) + u64 min_bytes) { struct btrfs_fs_info *fs_info = trans->fs_info; struct btrfs_delayed_ref_root *delayed_refs; struct btrfs_delayed_ref_head *locked_ref = NULL; int ret; unsigned long count = 0; + unsigned long max_count = 0; + u64 bytes_processed = 0; delayed_refs = &trans->transaction->delayed_refs; + if (min_bytes == 0) { + max_count = delayed_refs->num_heads_ready; + min_bytes = U64_MAX; + } + do { if (!locked_ref) { locked_ref = btrfs_obtain_ref_head(trans); @@ -2046,11 +2056,14 @@ static noinline int __btrfs_run_delayed_refs(struct btrfs_trans_handle *trans, */ return ret; } else if (!ret) { + u64 bytes_released = 0; + /* * Success, perform the usual cleanup of a processed * head */ - ret = cleanup_ref_head(trans, locked_ref); + ret = cleanup_ref_head(trans, locked_ref, &bytes_released); + bytes_processed += bytes_released; if (ret > 0 ) { /* We dropped our lock, we need to loop. */ ret = 0; @@ -2067,7 +2080,9 @@ static noinline int __btrfs_run_delayed_refs(struct btrfs_trans_handle *trans, locked_ref = NULL; cond_resched(); - } while ((nr != -1 && count < nr) || locked_ref); + } while ((min_bytes != U64_MAX && bytes_processed < min_bytes) || + (max_count > 0 && count < max_count) || + locked_ref); return 0; } @@ -2116,22 +2131,25 @@ static u64 find_middle(struct rb_root *root) #endif /* - * this starts processing the delayed reference count updates and - * extent insertions we have queued up so far. count can be - * 0, which means to process everything in the tree at the start - * of the run (but not newly added entries), or it can be some target - * number you'd like to process. + * This starts processing the delayed reference count updates and extent + * insertions we have queued up so far. + * + * @trans: Transaction handle. + * @min_bytes: How many bytes of delayed references to process. After this + * many bytes we stop processing delayed references if there are + * any more. If 0 it means to run all existing delayed references, + * but not new ones added after running all existing ones. + * Use (u64)-1 (U64_MAX) to run all existing delayed references + * plus any new ones that are added. * * Returns 0 on success or if called with an aborted transaction * Returns <0 on error and aborts the transaction */ -int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans, - unsigned long count) +int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans, u64 min_bytes) { struct btrfs_fs_info *fs_info = trans->fs_info; struct btrfs_delayed_ref_root *delayed_refs; int ret; - int run_all = count == (unsigned long)-1; /* We'll clean this up in btrfs_cleanup_transaction */ if (TRANS_ABORTED(trans)) @@ -2141,20 +2159,17 @@ int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans, return 0; delayed_refs = &trans->transaction->delayed_refs; - if (count == 0) - count = delayed_refs->num_heads_ready; - again: #ifdef SCRAMBLE_DELAYED_REFS delayed_refs->run_delayed_start = find_middle(&delayed_refs->root); #endif - ret = __btrfs_run_delayed_refs(trans, count); + ret = __btrfs_run_delayed_refs(trans, min_bytes); if (ret < 0) { btrfs_abort_transaction(trans, ret); return ret; } - if (run_all) { + if (min_bytes == U64_MAX) { btrfs_create_pending_block_groups(trans); spin_lock(&delayed_refs->lock); diff --git a/fs/btrfs/extent-tree.h b/fs/btrfs/extent-tree.h index 88c249c37516..64436f15c4eb 100644 --- a/fs/btrfs/extent-tree.h +++ b/fs/btrfs/extent-tree.h @@ -91,8 +91,8 @@ int btrfs_get_extent_inline_ref_type(const struct extent_buffer *eb, enum btrfs_inline_ref_type is_data); u64 hash_extent_data_ref(u64 root_objectid, u64 owner, u64 offset); -int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans, unsigned long count); -void btrfs_cleanup_ref_head_accounting(struct btrfs_fs_info *fs_info, +int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans, u64 min_bytes); +u64 btrfs_cleanup_ref_head_accounting(struct btrfs_fs_info *fs_info, struct btrfs_delayed_ref_root *delayed_refs, struct btrfs_delayed_ref_head *head); int btrfs_lookup_data_extent(struct btrfs_fs_info *fs_info, u64 start, u64 len); diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c index ca7c702172fd..5443d76c83cb 100644 --- a/fs/btrfs/space-info.c +++ b/fs/btrfs/space-info.c @@ -556,18 +556,6 @@ static inline u64 calc_reclaim_items_nr(const struct btrfs_fs_info *fs_info, return nr; } -static inline u64 calc_delayed_refs_nr(const struct btrfs_fs_info *fs_info, - u64 to_reclaim) -{ - const u64 bytes = btrfs_calc_delayed_ref_bytes(fs_info, 1); - u64 nr; - - nr = div64_u64(to_reclaim, bytes); - if (!nr) - nr = 1; - return nr; -} - #define EXTENT_SIZE_PER_ITEM SZ_256K /* @@ -749,10 +737,9 @@ static void flush_space(struct btrfs_fs_info *fs_info, break; } if (state == FLUSH_DELAYED_REFS_NR) - nr = calc_delayed_refs_nr(fs_info, num_bytes); + btrfs_run_delayed_refs(trans, num_bytes); else - nr = 0; - btrfs_run_delayed_refs(trans, nr); + btrfs_run_delayed_refs(trans, 0); btrfs_end_transaction(trans); break; case ALLOC_CHUNK: diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index ee63d92d66b4..06050aa520dc 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -1330,7 +1330,7 @@ static noinline int commit_cowonly_roots(struct btrfs_trans_handle *trans) } /* Now flush any delayed refs generated by updating all of the roots */ - ret = btrfs_run_delayed_refs(trans, (unsigned long)-1); + ret = btrfs_run_delayed_refs(trans, U64_MAX); if (ret) return ret; @@ -1345,7 +1345,7 @@ static noinline int commit_cowonly_roots(struct btrfs_trans_handle *trans) * so we want to keep this flushing in this loop to make sure * everything gets run. */ - ret = btrfs_run_delayed_refs(trans, (unsigned long)-1); + ret = btrfs_run_delayed_refs(trans, U64_MAX); if (ret) return ret; } @@ -1563,7 +1563,7 @@ static int qgroup_account_snapshot(struct btrfs_trans_handle *trans, * for now flush the delayed refs to narrow the race window where the * qgroup counters could end up wrong. */ - ret = btrfs_run_delayed_refs(trans, (unsigned long)-1); + ret = btrfs_run_delayed_refs(trans, U64_MAX); if (ret) { btrfs_abort_transaction(trans, ret); return ret; @@ -2397,7 +2397,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans) if (ret) goto unlock_reloc; - ret = btrfs_run_delayed_refs(trans, (unsigned long)-1); + ret = btrfs_run_delayed_refs(trans, U64_MAX); if (ret) goto unlock_reloc; From patchwork Fri Sep 8 12:09:20 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13377412 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E3B18EE7FE3 for ; Fri, 8 Sep 2023 12:09:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230414AbjIHMJu (ORCPT ); Fri, 8 Sep 2023 08:09:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35392 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242263AbjIHMJt (ORCPT ); Fri, 8 Sep 2023 08:09:49 -0400 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1DC4B1BC5 for ; Fri, 8 Sep 2023 05:09:45 -0700 (PDT) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 40603C433C8 for ; Fri, 8 Sep 2023 12:09:44 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1694174984; bh=rN05cQRq87IjJE2w8jhzNgIP+KdW1fvyy7fYMAXjNNs=; h=From:To:Subject:Date:In-Reply-To:References:From; b=OrrfnOhHNQGcGzctS3cuBzKb7y7nBQyDmeRTumAPQxdSCkkcYzN2t+zVjLhFw9pO8 ZtE9Of3iyDi2+JnhtFzGEoWkxlIfPO0KndrsDl2s5vEX9wLuA7mlJHZERz3Dget7cG AriUltTsDYiW4kE9oSStWVVZuhYPM3iity8R4mN34RwCcQKHoBPk+RI3QaUzkDygSx yTvIZ0XVS+AikqzhdTtgfhtNoJXfENSRnGSEZUJusdGHIpwEJtflgpLvphuz+pyuGq WOvfhBG5+YMLTTPHUcnNmlaovMVWp6YPYcZProsT3F/cF585dvtbNl7/65tR9CkkR8 u9/BQ2UNyLX2A== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 18/21] btrfs: reserve space for delayed refs on a per ref basis Date: Fri, 8 Sep 2023 13:09:20 +0100 Message-Id: <5a882a00e4a3f39ecb6c2389ebc749acefadf1ab.1694174371.git.fdmanana@suse.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana Currently when reserving space for delayed refs we do it on a per ref head basis. This is generally enough because most back refs for an extent end up being inlined in the extent item - with the default leaf size of 16K we can have at most 33 inline back refs (this is calculated by the macro BTRFS_MAX_EXTENT_ITEM_SIZE()). The amount of bytes reserved for each ref head is given by btrfs_calc_delayed_ref_bytes(), which basically corresponds to a single path for insertion into the extent tree plus another path for insertion into the free space tree if it's enabled. However if we have reached the limit of inline refs or we have a mix of inline and non-inline refs, then we will need to insert a non-inline ref and update the existing extent item to update the total number of references for the extent. This implies we need reserved space for two insertion paths in the extent tree, but we only reserved for one path. The extent item and the non-inline ref item may be located in different leaves, or even if they are located in the same leaf, after updating the extent item and before inserting the non-inline ref item, the extent buffers in the btree path may have been written (due to memory pressure for e.g.), in which case we need to COW the entire path again. In this case since we have not reserved enough space for the delayed refs block reserve, we will use the global block reserve. If we are in a situation where the fs has no more unallocated space enough to allocate a new metadata block group and available space in the existing metadata block groups is close to the maximum size of the global block reserve (512M), we may end up consuming too much of the free metadata space to the point where we can't commit any future transaction because it will fail, with -ENOSPC, during its commit when trying to allocate an extent for some COW operation (running delayed refs generated by running delayed refs or COWing the root tree's root node at commit_cowonly_roots() for example). Such dramatic scenario can happen if we have many delayed refs that require the insertion of non-inline ref items, due to too many reflinks or snapshots. We also have situations where we use the global block reserve because we could not in advance know that we will need space to update some trees (block group creation for example), so this all adds up to increase the chances of exhausting the global block reserve and making any future transaction commit to fail with -ENOSPC and turn the fs into RO mode, or fail the mount operation in case the mount needs to start and commit a transaction, such as when we have orphans to cleanup for example - such case was reported and hit by someone running a SLE (SUSE Linux Enterprise) distribution for example - where the fs had no more unallocated space that could be used to allocate a new metadata block group, and the available metadata space was about 1.5M, not enough to commit a transaction to cleanup an orphan inode (or do relocation of data block groups that were far from being full). So reserve space for delayed refs by individual refs and not by ref heads, as we may need to COW multiple extent tree paths due to non-inline ref items. Signed-off-by: Filipe Manana Reviewed-by: Josef Bacik --- fs/btrfs/delayed-ref.c | 32 ++++++++++++++++++++++---------- fs/btrfs/disk-io.c | 1 + fs/btrfs/extent-tree.c | 29 +++++++++++++++-------------- 3 files changed, 38 insertions(+), 24 deletions(-) diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c index 0991f66471ff..067176ac5a2b 100644 --- a/fs/btrfs/delayed-ref.c +++ b/fs/btrfs/delayed-ref.c @@ -422,7 +422,8 @@ int btrfs_delayed_ref_lock(struct btrfs_delayed_ref_root *delayed_refs, return 0; } -static inline void drop_delayed_ref(struct btrfs_delayed_ref_root *delayed_refs, +static inline void drop_delayed_ref(struct btrfs_fs_info *fs_info, + struct btrfs_delayed_ref_root *delayed_refs, struct btrfs_delayed_ref_head *head, struct btrfs_delayed_ref_node *ref) { @@ -433,9 +434,11 @@ static inline void drop_delayed_ref(struct btrfs_delayed_ref_root *delayed_refs, list_del(&ref->add_list); btrfs_put_delayed_ref(ref); atomic_dec(&delayed_refs->num_entries); + btrfs_delayed_refs_rsv_release(fs_info, 1); } -static bool merge_ref(struct btrfs_delayed_ref_root *delayed_refs, +static bool merge_ref(struct btrfs_fs_info *fs_info, + struct btrfs_delayed_ref_root *delayed_refs, struct btrfs_delayed_ref_head *head, struct btrfs_delayed_ref_node *ref, u64 seq) @@ -464,10 +467,10 @@ static bool merge_ref(struct btrfs_delayed_ref_root *delayed_refs, mod = -next->ref_mod; } - drop_delayed_ref(delayed_refs, head, next); + drop_delayed_ref(fs_info, delayed_refs, head, next); ref->ref_mod += mod; if (ref->ref_mod == 0) { - drop_delayed_ref(delayed_refs, head, ref); + drop_delayed_ref(fs_info, delayed_refs, head, ref); done = true; } else { /* @@ -505,7 +508,7 @@ void btrfs_merge_delayed_refs(struct btrfs_fs_info *fs_info, ref = rb_entry(node, struct btrfs_delayed_ref_node, ref_node); if (seq && ref->seq >= seq) continue; - if (merge_ref(delayed_refs, head, ref, seq)) + if (merge_ref(fs_info, delayed_refs, head, ref, seq)) goto again; } } @@ -584,10 +587,11 @@ void btrfs_delete_ref_head(struct btrfs_delayed_ref_root *delayed_refs, * Return true if the ref was merged into an existing one (and therefore can be * freed by the caller). */ -static bool insert_delayed_ref(struct btrfs_delayed_ref_root *root, +static bool insert_delayed_ref(struct btrfs_trans_handle *trans, struct btrfs_delayed_ref_head *href, struct btrfs_delayed_ref_node *ref) { + struct btrfs_delayed_ref_root *root = &trans->transaction->delayed_refs; struct btrfs_delayed_ref_node *exist; int mod; @@ -598,6 +602,7 @@ static bool insert_delayed_ref(struct btrfs_delayed_ref_root *root, list_add_tail(&ref->add_list, &href->ref_add_list); atomic_inc(&root->num_entries); spin_unlock(&href->lock); + trans->delayed_ref_updates++; return false; } @@ -626,7 +631,7 @@ static bool insert_delayed_ref(struct btrfs_delayed_ref_root *root, /* remove existing tail if its ref_mod is zero */ if (exist->ref_mod == 0) - drop_delayed_ref(root, href, exist); + drop_delayed_ref(trans->fs_info, root, href, exist); spin_unlock(&href->lock); return true; } @@ -695,6 +700,8 @@ static noinline void update_existing_head_ref(struct btrfs_trans_handle *trans, /* * If we are going to from a positive ref mod to a negative or vice * versa we need to make sure to adjust pending_csums accordingly. + * We reserve bytes for csum deletion when adding or updating a ref head + * see add_delayed_ref_head() for more details. */ if (existing->is_data) { u64 csum_leaves = @@ -819,6 +826,12 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans, kmem_cache_free(btrfs_delayed_ref_head_cachep, head_ref); head_ref = existing; } else { + /* + * We reserve the amount of bytes needed to delete csums when + * adding the ref head and not when adding individual drop refs + * since the csum items are deleted only after running the last + * delayed drop ref (the data extent's ref count drops to 0). + */ if (head_ref->is_data && head_ref->ref_mod < 0) { delayed_refs->pending_csums += head_ref->num_bytes; trans->delayed_ref_updates += @@ -828,7 +841,6 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans, delayed_refs->num_heads++; delayed_refs->num_heads_ready++; atomic_inc(&delayed_refs->num_entries); - trans->delayed_ref_updates++; } if (qrecord_inserted_ret) *qrecord_inserted_ret = qrecord_inserted; @@ -959,7 +971,7 @@ int btrfs_add_delayed_tree_ref(struct btrfs_trans_handle *trans, head_ref = add_delayed_ref_head(trans, head_ref, record, action, &qrecord_inserted); - merged = insert_delayed_ref(delayed_refs, head_ref, &ref->node); + merged = insert_delayed_ref(trans, head_ref, &ref->node); spin_unlock(&delayed_refs->lock); /* @@ -1051,7 +1063,7 @@ int btrfs_add_delayed_data_ref(struct btrfs_trans_handle *trans, head_ref = add_delayed_ref_head(trans, head_ref, record, action, &qrecord_inserted); - merged = insert_delayed_ref(delayed_refs, head_ref, &ref->node); + merged = insert_delayed_ref(trans, head_ref, &ref->node); spin_unlock(&delayed_refs->lock); /* diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 0a96ea8c1d3a..f4fcdbf312f4 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -4609,6 +4609,7 @@ static void btrfs_destroy_delayed_refs(struct btrfs_transaction *trans, list_del(&ref->add_list); atomic_dec(&delayed_refs->num_entries); btrfs_put_delayed_ref(ref); + btrfs_delayed_refs_rsv_release(fs_info, 1); } if (head->must_insert_reserved) pin_bytes = true; diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 475903a8d772..79aa3e68fd34 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -1819,22 +1819,24 @@ u64 btrfs_cleanup_ref_head_accounting(struct btrfs_fs_info *fs_info, struct btrfs_delayed_ref_root *delayed_refs, struct btrfs_delayed_ref_head *head) { - int nr_items = 1; /* Dropping this ref head update. */ - /* * We had csum deletions accounted for in our delayed refs rsv, we need * to drop the csum leaves for this update from our delayed_refs_rsv. */ if (head->total_ref_mod < 0 && head->is_data) { + int nr_items; + spin_lock(&delayed_refs->lock); delayed_refs->pending_csums -= head->num_bytes; spin_unlock(&delayed_refs->lock); - nr_items += btrfs_csum_bytes_to_leaves(fs_info, head->num_bytes); - } + nr_items = btrfs_csum_bytes_to_leaves(fs_info, head->num_bytes); + + btrfs_delayed_refs_rsv_release(fs_info, nr_items); - btrfs_delayed_refs_rsv_release(fs_info, nr_items); + return btrfs_calc_delayed_ref_bytes(fs_info, nr_items); + } - return btrfs_calc_delayed_ref_bytes(fs_info, nr_items); + return 0; } static int cleanup_ref_head(struct btrfs_trans_handle *trans, @@ -1884,7 +1886,7 @@ static int cleanup_ref_head(struct btrfs_trans_handle *trans, } } - *bytes_released = btrfs_cleanup_ref_head_accounting(fs_info, delayed_refs, head); + *bytes_released += btrfs_cleanup_ref_head_accounting(fs_info, delayed_refs, head); trace_run_delayed_ref_head(fs_info, head, 0); btrfs_delayed_ref_unlock(head); @@ -1926,7 +1928,8 @@ static struct btrfs_delayed_ref_head *btrfs_obtain_ref_head( } static int btrfs_run_delayed_refs_for_head(struct btrfs_trans_handle *trans, - struct btrfs_delayed_ref_head *locked_ref) + struct btrfs_delayed_ref_head *locked_ref, + u64 *bytes_released) { struct btrfs_fs_info *fs_info = trans->fs_info; struct btrfs_delayed_ref_root *delayed_refs; @@ -1982,7 +1985,8 @@ static int btrfs_run_delayed_refs_for_head(struct btrfs_trans_handle *trans, ret = run_one_delayed_ref(trans, ref, extent_op, must_insert_reserved); - + btrfs_delayed_refs_rsv_release(fs_info, 1); + *bytes_released += btrfs_calc_delayed_ref_bytes(fs_info, 1); btrfs_free_delayed_extent_op(extent_op); if (ret) { unselect_delayed_ref_head(delayed_refs, locked_ref); @@ -2048,7 +2052,7 @@ static noinline int __btrfs_run_delayed_refs(struct btrfs_trans_handle *trans, spin_lock(&locked_ref->lock); btrfs_merge_delayed_refs(fs_info, delayed_refs, locked_ref); - ret = btrfs_run_delayed_refs_for_head(trans, locked_ref); + ret = btrfs_run_delayed_refs_for_head(trans, locked_ref, &bytes_processed); if (ret < 0 && ret != -EAGAIN) { /* * Error, btrfs_run_delayed_refs_for_head already @@ -2056,14 +2060,11 @@ static noinline int __btrfs_run_delayed_refs(struct btrfs_trans_handle *trans, */ return ret; } else if (!ret) { - u64 bytes_released = 0; - /* * Success, perform the usual cleanup of a processed * head */ - ret = cleanup_ref_head(trans, locked_ref, &bytes_released); - bytes_processed += bytes_released; + ret = cleanup_ref_head(trans, locked_ref, &bytes_processed); if (ret > 0 ) { /* We dropped our lock, we need to loop. */ ret = 0; From patchwork Fri Sep 8 12:09:21 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13377411 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 01CF7EE7FE1 for ; Fri, 8 Sep 2023 12:09:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243212AbjIHMJu (ORCPT ); Fri, 8 Sep 2023 08:09:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35400 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230414AbjIHMJt (ORCPT ); Fri, 8 Sep 2023 08:09:49 -0400 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 16AF41BEA for ; Fri, 8 Sep 2023 05:09:46 -0700 (PDT) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 3862EC433CB for ; Fri, 8 Sep 2023 12:09:45 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1694174985; bh=kGTOrT81v+EufuQQqL2sqgfSVKt+gt6S2LZqpG1f3SE=; h=From:To:Subject:Date:In-Reply-To:References:From; b=uStRo6ar2HqfDopi2fQAAAbbzu2HPdKL4ARSvdxHU7pRLY7fG/Lc6/wF8s4VchU1v PWqukQo+1Ky9HMUUGfweNWkFTMfsZ8b4gXucdyNjesodYrS+9saPXmouUP7ODaX5Fj YlGrAmdhTl24cIsqV5a0tH3XMq7eBFYqKFH932iLY4YIbfjKVO3u67TXkIbf4OYh4k qfbP4zBk9OrJTSgNH7D+Dk0uynt+eILT8UHzXqy+VlE5gpaf/pnMmQztLQPluxqCEa 5Dvs64PIO+rDLragn1ZP+Pu96T165RhdqZslV+EsKYE8Bc92we+Z9shQlMzpC3SHKI CU8qadd/63toQ== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 19/21] btrfs: remove pointless initialization at btrfs_delayed_refs_rsv_release() Date: Fri, 8 Sep 2023 13:09:21 +0100 Message-Id: X-Mailer: git-send-email 2.34.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana There's no point in initializing to 0 the local variable 'released' as we don't use it before the next assignment to it. So remove the initialization. This may help avoid some warnings with clang tools such as the one reported/fixed by commit 966de47ff0c9 ("btrfs: remove redundant initialization of variables in log_new_ancestors"). Signed-off-by: Filipe Manana Reviewed-by: Josef Bacik --- fs/btrfs/delayed-ref.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c index 067176ac5a2b..19dd74b236c6 100644 --- a/fs/btrfs/delayed-ref.c +++ b/fs/btrfs/delayed-ref.c @@ -66,7 +66,7 @@ void btrfs_delayed_refs_rsv_release(struct btrfs_fs_info *fs_info, int nr) { struct btrfs_block_rsv *block_rsv = &fs_info->delayed_refs_rsv; const u64 num_bytes = btrfs_calc_delayed_ref_bytes(fs_info, nr); - u64 released = 0; + u64 released; released = btrfs_block_rsv_release(fs_info, block_rsv, num_bytes, NULL); if (released) From patchwork Fri Sep 8 12:09:22 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13377413 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 489C7EE7FE1 for ; Fri, 8 Sep 2023 12:09:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242263AbjIHMJw (ORCPT ); Fri, 8 Sep 2023 08:09:52 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34840 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230029AbjIHMJw (ORCPT ); Fri, 8 Sep 2023 08:09:52 -0400 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0FB671BC5 for ; Fri, 8 Sep 2023 05:09:47 -0700 (PDT) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 2FB5BC433C7 for ; Fri, 8 Sep 2023 12:09:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1694174986; bh=LcKZtWBzVb/mj3LccNNR919x8NdQvoz17ntV+fy4q9w=; h=From:To:Subject:Date:In-Reply-To:References:From; b=OIy/0my9Y6HEuTEDw0N7lUsCAYfPeVrH8Xtt3PZLTBtSu2icPMxdEbMpmREBnifWa M7j9SCMawwn3eS91oziDD+SwnD43dvQrY7JDJ86v/2uRds3T6886pXEhhQ5EQhtvnQ ecsUamEQrFUMhGIxLLH1iIZrOn2FD47NKp0r8eeYvTrbmkaIQjS20avwoVmlYFRHC6 M7N7Wat03vsa9o7yGswvMg/1L8xtSQHyVfJto5/T3mjQt9pZ6BuCjVjLGqV4nQ6+Lv cnqiU8sylWI8WMmXIDRKAk8ygg0jcickMfI6OUDxdfKBLU5i7KaNgW7Fc7r5Ln0jEh kltn0AnEFAO7A== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 20/21] btrfs: stop doing excessive space reservation for csum deletion Date: Fri, 8 Sep 2023 13:09:22 +0100 Message-Id: <28529e4ffd497044150775d53395e50c0d48f0f4.1694174371.git.fdmanana@suse.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana Currently when reserving space for deleting the csum items for a data extent, when adding or updating a delayed ref head, we determine how many leaves of csum items we can have and then pass that number to the helper btrfs_calc_delayed_ref_bytes(). This helper is used for calculating space for all tree modifications we need when running delayed references, however the amount of space it computes is excessive for deleting csum items because: 1) It uses btrfs_calc_insert_metadata_size() which is excessive because we only need to delete csum items from the csum tree, we don't need to insert any items, so btrfs_calc_metadata_size() is all we need (as it computes space needed to delete an item); 2) If the free space tree is enabled, it doubles the amount of space, which is pointless for csum deletion since we don't need to touch the free space tree or any other tree other than the csum tree. So improve on this by tracking how many csum deletions we have and using a new helper to calculate space for csum deletions (just a wrapper around btrfs_calc_metadata_size() with a comment). This reduces the amount of space we need to reserve for csum deletions by a factor of 4, and it helps reduce the number of times we have to block space reservations and have the reclaim task enter the space flushing algorihm (flush delayed items, flush delayed refs, etc) in order to satisfy tickets. For example this results in a total time decrease when unlinking (or truncating) files with many extents, as we end up having to block on space metadata reservations less often. Example test: $ cat test.sh #!/bin/bash DEV=/dev/nullb0 MNT=/mnt/test umount $DEV &> /dev/null mkfs.btrfs -f $DEV # Use compression to quickly create files with a lot of extents # (each with a size of 128K). mount -o compress=lzo $DEV $MNT # 100G gives at least 983040 extents with a size of 128K. xfs_io -f -c "pwrite -S 0xab -b 1M 0 120G" $MNT/foobar # Flush all delalloc and clear all metadata from memory. umount $MNT mount -o compress=lzo $DEV $MNT start=$(date +%s%N) rm -f $MNT/foobar end=$(date +%s%N) dur=$(( (end - start) / 1000000 )) echo "rm took $dur milliseconds" umount $MNT Before this change rm took: 7504 milliseconds After this change rm took: 6574 milliseconds (-12.4%) Signed-off-by: Filipe Manana Reviewed-by: Josef Bacik --- fs/btrfs/block-group.c | 8 ++++---- fs/btrfs/delayed-ref.c | 33 ++++++++++++++++++++------------- fs/btrfs/delayed-ref.h | 13 ++++++++++++- fs/btrfs/disk-io.c | 4 ++-- fs/btrfs/extent-tree.c | 10 +++++----- fs/btrfs/transaction.c | 2 +- fs/btrfs/transaction.h | 1 + 7 files changed, 45 insertions(+), 26 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index fb506ee51d2c..82c77dbad2e8 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -1286,7 +1286,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans, /* Once for the lookup reference */ btrfs_put_block_group(block_group); if (remove_rsv) - btrfs_delayed_refs_rsv_release(fs_info, 1); + btrfs_delayed_refs_rsv_release(fs_info, 1, 0); btrfs_free_path(path); return ret; } @@ -2709,7 +2709,7 @@ void btrfs_create_pending_block_groups(struct btrfs_trans_handle *trans) /* Already aborted the transaction if it failed. */ next: - btrfs_delayed_refs_rsv_release(fs_info, 1); + btrfs_delayed_refs_rsv_release(fs_info, 1, 0); list_del_init(&block_group->bg_list); clear_bit(BLOCK_GROUP_FLAG_NEW, &block_group->runtime_flags); } @@ -3370,7 +3370,7 @@ int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans) if (should_put) btrfs_put_block_group(cache); if (drop_reserve) - btrfs_delayed_refs_rsv_release(fs_info, 1); + btrfs_delayed_refs_rsv_release(fs_info, 1, 0); /* * Avoid blocking other tasks for too long. It might even save * us from writing caches for block groups that are going to be @@ -3517,7 +3517,7 @@ int btrfs_write_dirty_block_groups(struct btrfs_trans_handle *trans) /* If its not on the io list, we need to put the block group */ if (should_put) btrfs_put_block_group(cache); - btrfs_delayed_refs_rsv_release(fs_info, 1); + btrfs_delayed_refs_rsv_release(fs_info, 1, 0); spin_lock(&cur_trans->dirty_bgs_lock); } spin_unlock(&cur_trans->dirty_bgs_lock); diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c index 19dd74b236c6..a6680e8756a1 100644 --- a/fs/btrfs/delayed-ref.c +++ b/fs/btrfs/delayed-ref.c @@ -57,17 +57,21 @@ bool btrfs_check_space_for_delayed_refs(struct btrfs_fs_info *fs_info) * Release a ref head's reservation. * * @fs_info: the filesystem - * @nr: number of items to drop + * @nr_refs: number of delayed refs to drop + * @nr_csums: number of csum items to drop * * Drops the delayed ref head's count from the delayed refs rsv and free any * excess reservation we had. */ -void btrfs_delayed_refs_rsv_release(struct btrfs_fs_info *fs_info, int nr) +void btrfs_delayed_refs_rsv_release(struct btrfs_fs_info *fs_info, int nr_refs, int nr_csums) { struct btrfs_block_rsv *block_rsv = &fs_info->delayed_refs_rsv; - const u64 num_bytes = btrfs_calc_delayed_ref_bytes(fs_info, nr); + u64 num_bytes; u64 released; + num_bytes = btrfs_calc_delayed_ref_bytes(fs_info, nr_refs); + num_bytes += btrfs_calc_delayed_ref_csum_bytes(fs_info, nr_csums); + released = btrfs_block_rsv_release(fs_info, block_rsv, num_bytes, NULL); if (released) trace_btrfs_space_reservation(fs_info, "delayed_refs_rsv", @@ -77,8 +81,9 @@ void btrfs_delayed_refs_rsv_release(struct btrfs_fs_info *fs_info, int nr) /* * Adjust the size of the delayed refs rsv. * - * This is to be called anytime we may have adjusted trans->delayed_ref_updates, - * it'll calculate the additional size and add it to the delayed_refs_rsv. + * This is to be called anytime we may have adjusted trans->delayed_ref_updates + * or trans->delayed_ref_csum_deletions, it'll calculate the additional size and + * add it to the delayed_refs_rsv. */ void btrfs_update_delayed_refs_rsv(struct btrfs_trans_handle *trans) { @@ -86,17 +91,19 @@ void btrfs_update_delayed_refs_rsv(struct btrfs_trans_handle *trans) struct btrfs_block_rsv *delayed_rsv = &fs_info->delayed_refs_rsv; u64 num_bytes; - if (!trans->delayed_ref_updates) - return; + num_bytes = btrfs_calc_delayed_ref_bytes(fs_info, trans->delayed_ref_updates); + num_bytes += btrfs_calc_delayed_ref_csum_bytes(fs_info, + trans->delayed_ref_csum_deletions); - num_bytes = btrfs_calc_delayed_ref_bytes(fs_info, - trans->delayed_ref_updates); + if (num_bytes == 0) + return; spin_lock(&delayed_rsv->lock); delayed_rsv->size += num_bytes; delayed_rsv->full = false; spin_unlock(&delayed_rsv->lock); trans->delayed_ref_updates = 0; + trans->delayed_ref_csum_deletions = 0; } /* @@ -434,7 +441,7 @@ static inline void drop_delayed_ref(struct btrfs_fs_info *fs_info, list_del(&ref->add_list); btrfs_put_delayed_ref(ref); atomic_dec(&delayed_refs->num_entries); - btrfs_delayed_refs_rsv_release(fs_info, 1); + btrfs_delayed_refs_rsv_release(fs_info, 1, 0); } static bool merge_ref(struct btrfs_fs_info *fs_info, @@ -710,11 +717,11 @@ static noinline void update_existing_head_ref(struct btrfs_trans_handle *trans, if (existing->total_ref_mod >= 0 && old_ref_mod < 0) { delayed_refs->pending_csums -= existing->num_bytes; - btrfs_delayed_refs_rsv_release(fs_info, csum_leaves); + btrfs_delayed_refs_rsv_release(fs_info, 0, csum_leaves); } if (existing->total_ref_mod < 0 && old_ref_mod >= 0) { delayed_refs->pending_csums += existing->num_bytes; - trans->delayed_ref_updates += csum_leaves; + trans->delayed_ref_csum_deletions += csum_leaves; } } @@ -834,7 +841,7 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans, */ if (head_ref->is_data && head_ref->ref_mod < 0) { delayed_refs->pending_csums += head_ref->num_bytes; - trans->delayed_ref_updates += + trans->delayed_ref_csum_deletions += btrfs_csum_bytes_to_leaves(trans->fs_info, head_ref->num_bytes); } diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h index 46a1421cd99d..1bbabfebb329 100644 --- a/fs/btrfs/delayed-ref.h +++ b/fs/btrfs/delayed-ref.h @@ -277,6 +277,17 @@ static inline u64 btrfs_calc_delayed_ref_bytes(const struct btrfs_fs_info *fs_in return num_bytes; } +static inline u64 btrfs_calc_delayed_ref_csum_bytes(const struct btrfs_fs_info *fs_info, + int num_csum_items) +{ + /* + * Deleting csum items does not result in new nodes/leaves and does not + * require changing the free space tree, only the csum tree, so this is + * all we need. + */ + return btrfs_calc_metadata_size(fs_info, num_csum_items); +} + static inline void btrfs_init_generic_ref(struct btrfs_ref *generic_ref, int action, u64 bytenr, u64 len, u64 parent) { @@ -401,7 +412,7 @@ struct btrfs_delayed_ref_head *btrfs_select_ref_head( int btrfs_check_delayed_seq(struct btrfs_fs_info *fs_info, u64 seq); -void btrfs_delayed_refs_rsv_release(struct btrfs_fs_info *fs_info, int nr); +void btrfs_delayed_refs_rsv_release(struct btrfs_fs_info *fs_info, int nr_refs, int nr_csums); void btrfs_update_delayed_refs_rsv(struct btrfs_trans_handle *trans); int btrfs_delayed_refs_rsv_refill(struct btrfs_fs_info *fs_info, enum btrfs_reserve_flush_enum flush); diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index f4fcdbf312f4..bf8dc572b9d2 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -4609,7 +4609,7 @@ static void btrfs_destroy_delayed_refs(struct btrfs_transaction *trans, list_del(&ref->add_list); atomic_dec(&delayed_refs->num_entries); btrfs_put_delayed_ref(ref); - btrfs_delayed_refs_rsv_release(fs_info, 1); + btrfs_delayed_refs_rsv_release(fs_info, 1, 0); } if (head->must_insert_reserved) pin_bytes = true; @@ -4807,7 +4807,7 @@ void btrfs_cleanup_dirty_bgs(struct btrfs_transaction *cur_trans, spin_unlock(&cur_trans->dirty_bgs_lock); btrfs_put_block_group(cache); - btrfs_delayed_refs_rsv_release(fs_info, 1); + btrfs_delayed_refs_rsv_release(fs_info, 1, 0); spin_lock(&cur_trans->dirty_bgs_lock); } spin_unlock(&cur_trans->dirty_bgs_lock); diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 79aa3e68fd34..caef12198cf6 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -1824,16 +1824,16 @@ u64 btrfs_cleanup_ref_head_accounting(struct btrfs_fs_info *fs_info, * to drop the csum leaves for this update from our delayed_refs_rsv. */ if (head->total_ref_mod < 0 && head->is_data) { - int nr_items; + int nr_csums; spin_lock(&delayed_refs->lock); delayed_refs->pending_csums -= head->num_bytes; spin_unlock(&delayed_refs->lock); - nr_items = btrfs_csum_bytes_to_leaves(fs_info, head->num_bytes); + nr_csums = btrfs_csum_bytes_to_leaves(fs_info, head->num_bytes); - btrfs_delayed_refs_rsv_release(fs_info, nr_items); + btrfs_delayed_refs_rsv_release(fs_info, 0, nr_csums); - return btrfs_calc_delayed_ref_bytes(fs_info, nr_items); + return btrfs_calc_delayed_ref_csum_bytes(fs_info, nr_csums); } return 0; @@ -1985,7 +1985,7 @@ static int btrfs_run_delayed_refs_for_head(struct btrfs_trans_handle *trans, ret = run_one_delayed_ref(trans, ref, extent_op, must_insert_reserved); - btrfs_delayed_refs_rsv_release(fs_info, 1); + btrfs_delayed_refs_rsv_release(fs_info, 1, 0); *bytes_released += btrfs_calc_delayed_ref_bytes(fs_info, 1); btrfs_free_delayed_extent_op(extent_op); if (ret) { diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 06050aa520dc..4453d8113b37 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -2081,7 +2081,7 @@ static void btrfs_cleanup_pending_block_groups(struct btrfs_trans_handle *trans) struct btrfs_block_group *block_group, *tmp; list_for_each_entry_safe(block_group, tmp, &trans->new_bgs, bg_list) { - btrfs_delayed_refs_rsv_release(fs_info, 1); + btrfs_delayed_refs_rsv_release(fs_info, 1, 0); list_del_init(&block_group->bg_list); } } diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h index 8e9fa23bd7fe..474ce34ed02e 100644 --- a/fs/btrfs/transaction.h +++ b/fs/btrfs/transaction.h @@ -119,6 +119,7 @@ struct btrfs_trans_handle { u64 bytes_reserved; u64 chunk_bytes_reserved; unsigned long delayed_ref_updates; + unsigned long delayed_ref_csum_deletions; struct btrfs_transaction *transaction; struct btrfs_block_rsv *block_rsv; struct btrfs_block_rsv *orig_rsv; From patchwork Fri Sep 8 12:09:23 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13377414 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C6070EE7FE3 for ; Fri, 8 Sep 2023 12:09:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243215AbjIHMJx (ORCPT ); Fri, 8 Sep 2023 08:09:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34850 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230029AbjIHMJx (ORCPT ); Fri, 8 Sep 2023 08:09:53 -0400 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 071D61BE7 for ; Fri, 8 Sep 2023 05:09:48 -0700 (PDT) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 29375C433C8 for ; Fri, 8 Sep 2023 12:09:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1694174987; bh=pDTP2iPr5gR33yWE4YAJE8dFuQCek84F1Oc53ICDzi8=; h=From:To:Subject:Date:In-Reply-To:References:From; b=ty4lm+3tpSbDxDOaA6qGcCGNSi2DoFcXjt1p9vJLuaos4GxQqxJDSnUUEr0ygfWlr wegG+zhJra0/gsnkatRU0UMImmxSZ1mxA3SX3cHrtcm3GMdSVnEvaL/YJbVGYcDewm sONqPDwuWPl0oH2iKgvUgcXGXvo/exLKNqcUrmY8yBsqc426/56A3O9wUehw9He8jV QulBnyTkZqjQs/krG2dn004Z8NfFwCEF57Pb14WdmdbWzolOV9uMns/rE9+02MGRD9 IZfE5N8B3kqu1vc6uXV59AdCIZQGNHOsMJks7GIizELSgF1JFsgCiR4alhpRzpjMP4 bkkc5ti7XYz1w== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 21/21] btrfs: always reserve space for delayed refs when starting transaction Date: Fri, 8 Sep 2023 13:09:23 +0100 Message-Id: X-Mailer: git-send-email 2.34.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana When starting a transaction (or joining an existing one with btrfs_start_transaction()), we reserve space for the number of items we want to insert in a btree, but we don't do it for the delayed refs we will generate while using the transaction to modify (COW) extent buffers in a btree or allocate new extent buffers. Basically how it works: 1) When we start a transaction we reserve space for the number of items the caller wants to be inserted/modified/deleted in a btree. This space goes to the transaction block reserve; 2) If the delayed refs block reserve is not full, its size is greater than the amount of its reserved space, and the flush method is BTRFS_RESERVE_FLUSH_ALL, then we attempt to reserve more space for it corresponding to the number of items the caller wants to insert/modify/delete in a btree; 3) The size of the delayed refs block reserve is increased when a task creates delayed refs after COWing an extent buffer, allocating a new one or deleting (freeing) an extent buffer. This happens after the the task started or joined a transaction, whenever it calls btrfs_update_delayed_refs_rsv(); 4) The delayed refs block reserve is then refilled by anyone calling btrfs_delayed_refs_rsv_refill(), either during unlink/truncate operations or when someone else calls btrfs_start_transaction() with a 0 number of items and flush method BTRFS_RESERVE_FLUSH_ALL; 5) As a task COWs or allocates extent buffers, it consumes space from the transaction block reserve. When the task releases its transaction handle (btrfs_end_transaction()) or it attempts to commit the transaction, it releases any remaining space in the transaction block reserve that it did not use, as not all space may have been used (due to pessimistic space calculation) by calling btrfs_block_rsv_release() which will try to add that unused space to the delayed refs block reserve (if its current size is greater than its reserved space). That transferred space may not be enough to completely fulfill the delayed refs block reserve. Plus we have some tasks that will attempt do modify as many leaves as they can before getting -ENOSPC (and then reserving more space and retrying), such as hole punching and extent cloning which call btrfs_replace_file_extents(). Such tasks can generate therefore a high number of delayed refs, for both metadata and data (we can't know in advance how many file extent items we will find in a range and therefore how many delayed refs for dropping references on data extents we will generate); 6) If a transaction starts its commit before the delayeds refs block reserve is refilled, for example by the transaction kthread or by someone who called btrfs_join_transaction() before starting the commit, then when running delayed references if we don't have enough reserved space in the delayed refs block reserve, we will consume space from the global block reserve. Now this doesn't make a lot of sense because: 1) We should reserve space for delayed references when starting the transaction, since we have no guarantees the delayed refs block reserve will be refilled; 2) If no refill happens then we will consume from the global block reserve when running delayed refs during the transaction commit; 3) If we have a bunch of tasks calling btrfs_start_transaction() with a number of items greater than zero and at the time the delayed refs reserve is full, then we don't reserve any space at btrfs_start_transaction() for the delayed refs that will be generated by a task, and we can therefore end up using a lot of space from the global reserve when running the delayed refs during a transaction commit; 4) There are also other operations that result in bumping the size of the delayed refs reserve, such as creating and deleting block groups, as well as the need to update a block group item because we allocated or freed an extent from the respective block group; 5) If we have a significant gap betweent the delayed refs reserve's size and its reserved space, two very bad things may happen: 1) The reserved space of the global reserve may not be enough and we fail the transaction commit with -ENOSPC when running delayed refs; 2) If the available space in the global reserve is enough it may result in nearly exhausting it. If the fs has no more unallocated device space for allocating a new block group and all the available space in existing metadata block groups is not far from the global reserve's size before we started the transaction commit, we may end up in a situation where after the transaction commit we have too little available metadata space, and any future transaction commit will fail with -ENOSPC, because although we were able to reserve space to start the transaction, we were not able to commit it, as running delayed refs generates some more delayed refs (to update the extent tree for example) - this includes not even being able to commit a transaction that was started with the goal of unlinking a file, removing an empty data block group or doing reclaim/balance, so there's no way to release metadata space. In the worst case the next time we mount the filesystem we may also fail with -ENOSPC due to failure to commit a transaction to cleanup orphan inodes. This later case was reported and hit by someone running a SLE (SUSE Linux Enterprise) distribution for example - where the fs had no more unallocated space that could be used to allocate a new metadata block group, and the available metadata space was about 1.5M, not enough to commit a transaction to cleanup an orphan inode (or do relocation of data block groups that were far from being full). So improve on this situation by always reserving space for delayed refs when calling start_transaction(), and if the flush method is BTRFS_RESERVE_FLUSH_ALL, also try to refill the delayeds refs block reserve if it's not full. The space reserved for the delayed refs is added to a local block reserve that is part of the transaction handle, and when a task updates the delayed refs block reserve size, after creating a delayed ref, the space is transferred from that local reserve to the global delayed refs reserve (fs_info->delayed_refs_rsv). In case the local reserve does not have enough space, which may happen for tasks that generate a variable and potentially large number of delayed refs (such as the hole punching and extent cloning cases mentioned before), we transfer any available space and then rely on the current behaviour of hoping some other task refills the delayed refs reserve or fallback to the global block reserve. Signed-off-by: Filipe Manana Reviewed-by: Josef Bacik --- fs/btrfs/block-rsv.c | 6 +- fs/btrfs/delayed-ref.c | 21 ++++++- fs/btrfs/transaction.c | 135 ++++++++++++++++++++++++++++++++--------- fs/btrfs/transaction.h | 2 + 4 files changed, 132 insertions(+), 32 deletions(-) diff --git a/fs/btrfs/block-rsv.c b/fs/btrfs/block-rsv.c index 6ccd91bbff3e..6a8f9629bbbd 100644 --- a/fs/btrfs/block-rsv.c +++ b/fs/btrfs/block-rsv.c @@ -281,10 +281,10 @@ u64 btrfs_block_rsv_release(struct btrfs_fs_info *fs_info, struct btrfs_block_rsv *target = NULL; /* - * If we are the delayed_rsv then push to the global rsv, otherwise dump - * into the delayed rsv if it is not full. + * If we are a delayed block reserve then push to the global rsv, + * otherwise dump into the global delayed reserve if it is not full. */ - if (block_rsv == delayed_rsv) + if (block_rsv->type == BTRFS_BLOCK_RSV_DELOPS) target = global_rsv; else if (block_rsv != global_rsv && !btrfs_block_rsv_full(delayed_rsv)) target = delayed_rsv; diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c index a6680e8756a1..383d5be22941 100644 --- a/fs/btrfs/delayed-ref.c +++ b/fs/btrfs/delayed-ref.c @@ -89,7 +89,9 @@ void btrfs_update_delayed_refs_rsv(struct btrfs_trans_handle *trans) { struct btrfs_fs_info *fs_info = trans->fs_info; struct btrfs_block_rsv *delayed_rsv = &fs_info->delayed_refs_rsv; + struct btrfs_block_rsv *local_rsv = &trans->delayed_rsv; u64 num_bytes; + u64 reserved_bytes; num_bytes = btrfs_calc_delayed_ref_bytes(fs_info, trans->delayed_ref_updates); num_bytes += btrfs_calc_delayed_ref_csum_bytes(fs_info, @@ -98,9 +100,26 @@ void btrfs_update_delayed_refs_rsv(struct btrfs_trans_handle *trans) if (num_bytes == 0) return; + /* + * Try to take num_bytes from the transaction's local delayed reserve. + * If not possible, try to take as much as it's available. If the local + * reserve doesn't have enough reserved space, the delayed refs reserve + * will be refilled next time btrfs_delayed_refs_rsv_refill() is called + * by someone or if a transaction commit is triggered before that, the + * global block reserve will be used. We want to minimize using the + * global block reserve for cases we can account for in advance, to + * avoid exhausting it and reach -ENOSPC during a transaction commit. + */ + spin_lock(&local_rsv->lock); + reserved_bytes = min(num_bytes, local_rsv->reserved); + local_rsv->reserved -= reserved_bytes; + local_rsv->full = (local_rsv->reserved >= local_rsv->size); + spin_unlock(&local_rsv->lock); + spin_lock(&delayed_rsv->lock); delayed_rsv->size += num_bytes; - delayed_rsv->full = false; + delayed_rsv->reserved += reserved_bytes; + delayed_rsv->full = (delayed_rsv->reserved >= delayed_rsv->size); spin_unlock(&delayed_rsv->lock); trans->delayed_ref_updates = 0; trans->delayed_ref_csum_deletions = 0; diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 4453d8113b37..ac347de1ffb8 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -555,6 +555,69 @@ static inline bool need_reserve_reloc_root(struct btrfs_root *root) return true; } +static int btrfs_reserve_trans_metadata(struct btrfs_fs_info *fs_info, + enum btrfs_reserve_flush_enum flush, + u64 num_bytes, + u64 *delayed_refs_bytes) +{ + struct btrfs_block_rsv *delayed_refs_rsv = &fs_info->delayed_refs_rsv; + struct btrfs_space_info *si = fs_info->trans_block_rsv.space_info; + u64 extra_delayed_refs_bytes = 0; + u64 bytes; + int ret; + + /* + * If there's a gap between the size of the delayed refs reserve and + * its reserved space, than some tasks have added delayed refs or bumped + * its size otherwise (due to block group creation or removal, or block + * group item update). Also try to allocate that gap in order to prevent + * using (and possibly abusing) the global reserve when committing the + * transaction. + */ + if (flush == BTRFS_RESERVE_FLUSH_ALL && + !btrfs_block_rsv_full(delayed_refs_rsv)) { + spin_lock(&delayed_refs_rsv->lock); + if (delayed_refs_rsv->size > delayed_refs_rsv->reserved) + extra_delayed_refs_bytes = delayed_refs_rsv->size - + delayed_refs_rsv->reserved; + spin_unlock(&delayed_refs_rsv->lock); + } + + bytes = num_bytes + *delayed_refs_bytes + extra_delayed_refs_bytes; + + /* + * We want to reserve all the bytes we may need all at once, so we only + * do 1 enospc flushing cycle per transaction start. + */ + ret = btrfs_reserve_metadata_bytes(fs_info, si, bytes, flush); + if (ret == 0) { + if (extra_delayed_refs_bytes > 0) + btrfs_migrate_to_delayed_refs_rsv(fs_info, + extra_delayed_refs_bytes); + return 0; + } + + if (extra_delayed_refs_bytes > 0) { + bytes -= extra_delayed_refs_bytes; + ret = btrfs_reserve_metadata_bytes(fs_info, si, bytes, flush); + if (ret == 0) + return 0; + } + + /* + * If we are an emergency flush, which can steal from the global block + * reserve, then attempt to not reserve space for the delayed refs, as + * we will consume space for them from the global block reserve. + */ + if (flush == BTRFS_RESERVE_FLUSH_ALL_STEAL) { + bytes -= *delayed_refs_bytes; + *delayed_refs_bytes = 0; + ret = btrfs_reserve_metadata_bytes(fs_info, si, bytes, flush); + } + + return ret; +} + static struct btrfs_trans_handle * start_transaction(struct btrfs_root *root, unsigned int num_items, unsigned int type, enum btrfs_reserve_flush_enum flush, @@ -562,10 +625,12 @@ start_transaction(struct btrfs_root *root, unsigned int num_items, { struct btrfs_fs_info *fs_info = root->fs_info; struct btrfs_block_rsv *delayed_refs_rsv = &fs_info->delayed_refs_rsv; + struct btrfs_block_rsv *trans_rsv = &fs_info->trans_block_rsv; struct btrfs_trans_handle *h; struct btrfs_transaction *cur_trans; u64 num_bytes = 0; u64 qgroup_reserved = 0; + u64 delayed_refs_bytes = 0; bool reloc_reserved = false; bool do_chunk_alloc = false; int ret; @@ -588,9 +653,6 @@ start_transaction(struct btrfs_root *root, unsigned int num_items, * the appropriate flushing if need be. */ if (num_items && root != fs_info->chunk_root) { - struct btrfs_block_rsv *rsv = &fs_info->trans_block_rsv; - u64 delayed_refs_bytes = 0; - qgroup_reserved = num_items * fs_info->nodesize; /* * Use prealloc for now, as there might be a currently running @@ -602,20 +664,16 @@ start_transaction(struct btrfs_root *root, unsigned int num_items, if (ret) return ERR_PTR(ret); + num_bytes = btrfs_calc_insert_metadata_size(fs_info, num_items); /* - * We want to reserve all the bytes we may need all at once, so - * we only do 1 enospc flushing cycle per transaction start. We - * accomplish this by simply assuming we'll do num_items worth - * of delayed refs updates in this trans handle, and refill that - * amount for whatever is missing in the reserve. + * If we plan to insert/update/delete "num_items" from a btree, + * we will also generate delayed refs for extent buffers in the + * respective btree paths, so reserve space for the delayed refs + * that will be generated by the caller as it modifies btrees. + * Try to reserve them to avoid excessive use of the global + * block reserve. */ - num_bytes = btrfs_calc_insert_metadata_size(fs_info, num_items); - if (flush == BTRFS_RESERVE_FLUSH_ALL && - !btrfs_block_rsv_full(delayed_refs_rsv)) { - delayed_refs_bytes = btrfs_calc_delayed_ref_bytes(fs_info, - num_items); - num_bytes += delayed_refs_bytes; - } + delayed_refs_bytes = btrfs_calc_delayed_ref_bytes(fs_info, num_items); /* * Do the reservation for the relocation root creation @@ -625,18 +683,14 @@ start_transaction(struct btrfs_root *root, unsigned int num_items, reloc_reserved = true; } - ret = btrfs_reserve_metadata_bytes(fs_info, rsv->space_info, - num_bytes, flush); + ret = btrfs_reserve_trans_metadata(fs_info, flush, num_bytes, + &delayed_refs_bytes); if (ret) goto reserve_fail; - if (delayed_refs_bytes) { - btrfs_migrate_to_delayed_refs_rsv(fs_info, - delayed_refs_bytes); - num_bytes -= delayed_refs_bytes; - } - btrfs_block_rsv_add_bytes(rsv, num_bytes, true); - if (rsv->space_info->force_alloc) + btrfs_block_rsv_add_bytes(trans_rsv, num_bytes, true); + + if (trans_rsv->space_info->force_alloc) do_chunk_alloc = true; } else if (num_items == 0 && flush == BTRFS_RESERVE_FLUSH_ALL && !btrfs_block_rsv_full(delayed_refs_rsv)) { @@ -696,6 +750,7 @@ start_transaction(struct btrfs_root *root, unsigned int num_items, h->type = type; INIT_LIST_HEAD(&h->new_bgs); + btrfs_init_metadata_block_rsv(fs_info, &h->delayed_rsv, BTRFS_BLOCK_RSV_DELOPS); smp_mb(); if (cur_trans->state >= TRANS_STATE_COMMIT_START && @@ -708,8 +763,17 @@ start_transaction(struct btrfs_root *root, unsigned int num_items, if (num_bytes) { trace_btrfs_space_reservation(fs_info, "transaction", h->transid, num_bytes, 1); - h->block_rsv = &fs_info->trans_block_rsv; + h->block_rsv = trans_rsv; h->bytes_reserved = num_bytes; + if (delayed_refs_bytes > 0) { + trace_btrfs_space_reservation(fs_info, + "local_delayed_refs_rsv", + h->transid, + delayed_refs_bytes, 1); + h->delayed_refs_bytes_reserved = delayed_refs_bytes; + btrfs_block_rsv_add_bytes(&h->delayed_rsv, delayed_refs_bytes, true); + delayed_refs_bytes = 0; + } h->reloc_reserved = reloc_reserved; } @@ -765,8 +829,10 @@ start_transaction(struct btrfs_root *root, unsigned int num_items, kmem_cache_free(btrfs_trans_handle_cachep, h); alloc_fail: if (num_bytes) - btrfs_block_rsv_release(fs_info, &fs_info->trans_block_rsv, - num_bytes, NULL); + btrfs_block_rsv_release(fs_info, trans_rsv, num_bytes, NULL); + if (delayed_refs_bytes) + btrfs_space_info_free_bytes_may_use(fs_info, trans_rsv->space_info, + delayed_refs_bytes); reserve_fail: btrfs_qgroup_free_meta_prealloc(root, qgroup_reserved); return ERR_PTR(ret); @@ -987,11 +1053,14 @@ static void btrfs_trans_release_metadata(struct btrfs_trans_handle *trans) if (!trans->block_rsv) { ASSERT(!trans->bytes_reserved); + ASSERT(!trans->delayed_refs_bytes_reserved); return; } - if (!trans->bytes_reserved) + if (!trans->bytes_reserved) { + ASSERT(!trans->delayed_refs_bytes_reserved); return; + } ASSERT(trans->block_rsv == &fs_info->trans_block_rsv); trace_btrfs_space_reservation(fs_info, "transaction", @@ -999,6 +1068,16 @@ static void btrfs_trans_release_metadata(struct btrfs_trans_handle *trans) btrfs_block_rsv_release(fs_info, trans->block_rsv, trans->bytes_reserved, NULL); trans->bytes_reserved = 0; + + if (!trans->delayed_refs_bytes_reserved) + return; + + trace_btrfs_space_reservation(fs_info, "local_delayed_refs_rsv", + trans->transid, + trans->delayed_refs_bytes_reserved, 0); + btrfs_block_rsv_release(fs_info, &trans->delayed_rsv, + trans->delayed_refs_bytes_reserved, NULL); + trans->delayed_refs_bytes_reserved = 0; } static int __btrfs_end_transaction(struct btrfs_trans_handle *trans, diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h index 474ce34ed02e..4dc68d7ec955 100644 --- a/fs/btrfs/transaction.h +++ b/fs/btrfs/transaction.h @@ -117,6 +117,7 @@ enum { struct btrfs_trans_handle { u64 transid; u64 bytes_reserved; + u64 delayed_refs_bytes_reserved; u64 chunk_bytes_reserved; unsigned long delayed_ref_updates; unsigned long delayed_ref_csum_deletions; @@ -139,6 +140,7 @@ struct btrfs_trans_handle { bool in_fsync; struct btrfs_fs_info *fs_info; struct list_head new_bgs; + struct btrfs_block_rsv delayed_rsv; }; /*