From patchwork Fri May 10 17:32:49 2024
X-Patchwork-Submitter: Filipe Manana
X-Patchwork-Id: 13661802
From: fdmanana@kernel.org
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v2 01/10] btrfs: use an xarray to track open inodes in a root
Date: Fri, 10 May 2024 18:32:49 +0100
Message-Id: <3657cd06e219b9f1c0e423fde8a8a7f53bc3e228.1715362104.git.fdmanana@suse.com>

From: Filipe Manana

Currently we use a red black tree (rb-tree) to track the currently open
inodes of a root (in struct btrfs_root::inode_tree). This however is not
very efficient when the number of inodes is large since rb-trees are
binary trees. For example, for 100K open inodes, the tree has a depth of
17. Besides that, inserting into the tree requires navigating through it
and pulling useless cache lines in the process since the red black tree
nodes are embedded within the btrfs inode - on the other hand, by being
embedded, it requires no extra memory allocations.

We can improve this by using an xarray instead, which is efficient when
indices are densely clustered (such as inode numbers), is more cache
friendly and behaves like a resizable array, with a much better search
and insertion complexity than a red black tree.
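The depth figures can be checked with a little arithmetic. The sketch below is illustrative only: it compares the minimum depth of a binary tree holding n keys with the number of levels a radix tree with 64-slot nodes needs for the same dense key range. It assumes 64 slots per node (a chunk shift of 6) matches the xarray default, the function names are hypothetical, and it ignores xarray details such as single-entry special cases.

```c
#include <stdint.h>

/* Depth of a perfectly balanced binary search tree holding n keys, i.e.
 * floor(log2(n)) + 1. A red-black tree is never shallower than this and,
 * by its invariants, at most about twice as deep. */
static unsigned int balanced_binary_depth(uint64_t n)
{
	unsigned int depth = 0;

	while (n > 0) { /* counts the significant bits of n */
		depth++;
		n >>= 1;
	}
	return depth;
}

/* Levels a radix tree with 2^chunk_shift slots per node needs to reach
 * indices 0..max_index. Assumption: chunk_shift = 6 (64 slots per node),
 * believed to be the usual xarray default; the real xarray has special
 * cases that are not modelled here. */
static unsigned int radix_levels(uint64_t max_index, unsigned int chunk_shift)
{
	unsigned int levels = 1;

	/* Guard keeps the shift amount below 64 to avoid undefined behavior. */
	while (chunk_shift * levels < 64 &&
	       (max_index >> (chunk_shift * levels)) != 0)
		levels++;
	return levels;
}
```

For 100000 densely packed keys, balanced_binary_depth(100000) returns 17, the figure quoted above, while radix_levels(99999, 6) returns 3.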
This only has one small disadvantage: insertion will sometimes require
allocating memory for the xarray, which may fail (though not often, since
it uses a kmem_cache). On the other hand, we can reduce the btrfs inode
structure size by 24 bytes (from 1080 down to 1056 bytes) after removing
the embedded red black tree node, which, after the next patches, will
allow us to reduce the size of the structure to 1024 bytes, meaning we
will be able to store 4 inodes per 4K page instead of 3 inodes.

This patch does a straightforward conversion to an xarray, and results in
a transaction abort if we can't allocate memory for the xarray when
creating an inode - but the next patch changes things so that we don't
need to abort.

Running the following fs_mark test showed some improvements:

  $ cat test.sh
  #!/bin/bash

  DEV=/dev/nullb0
  MNT=/mnt/nullb0
  MOUNT_OPTIONS="-o ssd"
  FILES=100000
  THREADS=$(nproc --all)

  echo "performance" | \
      tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

  mkfs.btrfs -f $DEV
  mount $MOUNT_OPTIONS $DEV $MNT

  OPTS="-S 0 -L 5 -n $FILES -s 0 -t $THREADS -k"
  for ((i = 1; i <= $THREADS; i++)); do
      OPTS="$OPTS -d $MNT/d$i"
  done

  fs_mark $OPTS

  umount $MNT

Before this patch:

FSUse%        Count         Size    Files/sec     App Overhead
    10      1200000            0      92081.6         12505547
    16      2400000            0     138222.6         13067072
    23      3600000            0     148833.1         13290336
    43      4800000            0      97864.7         13931248
    53      6000000            0      85597.3         14384313

After this patch:

FSUse%        Count         Size    Files/sec     App Overhead
    10      1200000            0      93225.1         12571078
    16      2400000            0     146720.3         12805007
    23      3600000            0     160626.4         13073835
    46      4800000            0     116286.2         13802927
    53      6000000            0      90087.9         14754892

The test was run with a release kernel config (Debian's default config).
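The slab packing numbers given earlier in this message are plain integer division. A minimal sketch, assuming a 4096-byte page and ignoring slab metadata, object alignment and higher-order slabs; struct rb_node_like is a hypothetical userspace stand-in that mirrors the layout of the kernel's struct rb_node (an unsigned long holding the parent pointer and color, plus two child pointers).

```c
#include <stddef.h>

/* Hypothetical stand-in with the same layout as the kernel's rb_node:
 * 24 bytes on a 64-bit build. */
struct rb_node_like {
	unsigned long parent_color;
	struct rb_node_like *right;
	struct rb_node_like *left;
};

/* Whole objects that fit in one page, under the simplifying assumptions
 * stated above (no slab metadata, no extra alignment padding). */
static size_t objs_per_page(size_t page_size, size_t obj_size)
{
	return page_size / obj_size;
}
```

On a 64-bit build the stand-in is 24 bytes, and 4096 / 1080 and 4096 / 1056 both give 3 objects per page, while 4096 / 1024 gives 4.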
Also, capturing the insertion times into the rb tree and into the xarray,
that is, measuring the duration of the old function inode_tree_add() and
of the new btrfs_add_inode_to_root() function, gave the following results
(in nanoseconds):

Before this patch, inode_tree_add() execution times:

Count: 5000000
Range:  0.000 - 5536887.000; Mean: 775.674; Median: 729.000; Stddev: 4820.961
Percentiles:  90th: 1015.000; 95th: 1139.000; 99th: 1397.000
      0.000 -       7.816:      40 |
      7.816 -      37.858:     209 |
     37.858 -     170.278:    6059 |
    170.278 -     753.961: 2754890 #####################################################
    753.961 -    3326.728: 2232312 ###########################################
   3326.728 -   14667.018:    4366 |
  14667.018 -   64652.943:     852 |
  64652.943 -  284981.761:     550 |
 284981.761 - 1256150.914:     221 |
1256150.914 - 5536887.000:       7 |

After this patch, btrfs_add_inode_to_root() execution times:

Count: 5000000
Range:  0.000 - 2900652.000; Mean: 272.148; Median: 241.000; Stddev: 2873.369
Percentiles:  90th: 342.000; 95th: 432.000; 99th: 572.000
      0.000 -       7.264:     104 |
      7.264 -      33.145:     352 |
     33.145 -     140.081:  109606 #
    140.081 -     581.930: 4840090 #####################################################
    581.930 -    2407.590:   43532 |
   2407.590 -    9950.979:    2245 |
   9950.979 -   41119.278:     514 |
  41119.278 -  169902.616:     155 |
 169902.616 -  702018.539:      47 |
 702018.539 - 2900652.000:       9 |

Average, percentiles, standard deviation, etc., are all much better.
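To put "much better" into numbers, the ratio of the old to the new statistic can be computed directly from the figures above. This is only derived arithmetic on the reported values, not an additional measurement; struct latency_stat and speedup() are hypothetical helpers.

```c
#include <stddef.h>

/* Reported latency statistics in nanoseconds, copied from the results above. */
struct latency_stat {
	const char *name;
	double before_ns; /* inode_tree_add() */
	double after_ns;  /* btrfs_add_inode_to_root() */
};

static const struct latency_stat stats[] = {
	{ "mean",    775.674, 272.148 },
	{ "median",  729.000, 241.000 },
	{ "p90",    1015.000, 342.000 },
	{ "p95",    1139.000, 432.000 },
	{ "p99",    1397.000, 572.000 },
};

/* Old/new ratio: a value above 1 means the new function is faster. */
static double speedup(const struct latency_stat *s)
{
	return s->before_ns / s->after_ns;
}
```

The ratios range from roughly 2.4x (99th percentile) to about 3x (median), with the mean about 2.85x lower.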
Reviewed-by: Qu Wenruo Signed-off-by: Filipe Manana --- fs/btrfs/btrfs_inode.h | 3 - fs/btrfs/ctree.h | 7 ++- fs/btrfs/disk-io.c | 6 +- fs/btrfs/inode.c | 128 ++++++++++++++++------------------------- 4 files changed, 58 insertions(+), 86 deletions(-) diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h index 91c994b569f3..e577b9745884 100644 --- a/fs/btrfs/btrfs_inode.h +++ b/fs/btrfs/btrfs_inode.h @@ -155,9 +155,6 @@ struct btrfs_inode { */ struct list_head delalloc_inodes; - /* node for the red-black tree that links inodes in subvolume root */ - struct rb_node rb_node; - unsigned long runtime_flags; /* full 64 bit generation number, struct vfs_inode doesn't have a big diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index c03c58246033..aa2568f86dc9 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -222,8 +222,11 @@ struct btrfs_root { struct list_head root_list; spinlock_t inode_lock; - /* red-black tree that keeps track of in-memory inodes */ - struct rb_root inode_tree; + /* + * Xarray that keeps track of in-memory inodes, protected by the lock + * @inode_lock. 
+ */ + struct xarray inodes; /* * Xarray that keeps track of delayed nodes of every inode, protected diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index a91a8056758a..ed40fe1db53e 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -662,7 +662,7 @@ static void __setup_root(struct btrfs_root *root, struct btrfs_fs_info *fs_info, root->free_objectid = 0; root->nr_delalloc_inodes = 0; root->nr_ordered_extents = 0; - root->inode_tree = RB_ROOT; + xa_init(&root->inodes); xa_init(&root->delayed_nodes); btrfs_init_root_block_rsv(root); @@ -1854,7 +1854,8 @@ void btrfs_put_root(struct btrfs_root *root) return; if (refcount_dec_and_test(&root->refs)) { - WARN_ON(!RB_EMPTY_ROOT(&root->inode_tree)); + if (WARN_ON(!xa_empty(&root->inodes))) + xa_destroy(&root->inodes); WARN_ON(test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state)); if (root->anon_dev) free_anon_bdev(root->anon_dev); @@ -1939,7 +1940,6 @@ static int btrfs_init_btree_inode(struct super_block *sb) inode->i_mapping->a_ops = &btree_aops; mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS); - RB_CLEAR_NODE(&BTRFS_I(inode)->rb_node); extent_io_tree_init(fs_info, &BTRFS_I(inode)->io_tree, IO_TREE_BTREE_INODE_IO); extent_map_tree_init(&BTRFS_I(inode)->extent_tree); diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index d0274324c75a..450fe1582f1d 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -5493,58 +5493,51 @@ static int fixup_tree_root_location(struct btrfs_fs_info *fs_info, return err; } -static void inode_tree_add(struct btrfs_inode *inode) +static int btrfs_add_inode_to_root(struct btrfs_inode *inode) { struct btrfs_root *root = inode->root; - struct btrfs_inode *entry; - struct rb_node **p; - struct rb_node *parent; - struct rb_node *new = &inode->rb_node; - u64 ino = btrfs_ino(inode); + struct btrfs_inode *existing; + const u64 ino = btrfs_ino(inode); + int ret; if (inode_unhashed(&inode->vfs_inode)) - return; - parent = NULL; + return 0; + + ret = xa_reserve(&root->inodes, ino, GFP_NOFS); + 
if (ret) + return ret; + spin_lock(&root->inode_lock); - p = &root->inode_tree.rb_node; - while (*p) { - parent = *p; - entry = rb_entry(parent, struct btrfs_inode, rb_node); + existing = xa_store(&root->inodes, ino, inode, GFP_ATOMIC); + spin_unlock(&root->inode_lock); - if (ino < btrfs_ino(entry)) - p = &parent->rb_left; - else if (ino > btrfs_ino(entry)) - p = &parent->rb_right; - else { - WARN_ON(!(entry->vfs_inode.i_state & - (I_WILL_FREE | I_FREEING))); - rb_replace_node(parent, new, &root->inode_tree); - RB_CLEAR_NODE(parent); - spin_unlock(&root->inode_lock); - return; - } + if (xa_is_err(existing)) { + ret = xa_err(existing); + ASSERT(ret != -EINVAL); + ASSERT(ret != -ENOMEM); + return ret; + } else if (existing) { + WARN_ON(!(existing->vfs_inode.i_state & (I_WILL_FREE | I_FREEING))); } - rb_link_node(new, parent, p); - rb_insert_color(new, &root->inode_tree); - spin_unlock(&root->inode_lock); + + return 0; } -static void inode_tree_del(struct btrfs_inode *inode) +static void btrfs_del_inode_from_root(struct btrfs_inode *inode) { struct btrfs_root *root = inode->root; - int empty = 0; + struct btrfs_inode *entry; + bool empty = false; spin_lock(&root->inode_lock); - if (!RB_EMPTY_NODE(&inode->rb_node)) { - rb_erase(&inode->rb_node, &root->inode_tree); - RB_CLEAR_NODE(&inode->rb_node); - empty = RB_EMPTY_ROOT(&root->inode_tree); - } + entry = xa_erase(&root->inodes, btrfs_ino(inode)); + if (entry == inode) + empty = xa_empty(&root->inodes); spin_unlock(&root->inode_lock); if (empty && btrfs_root_refs(&root->root_item) == 0) { spin_lock(&root->inode_lock); - empty = RB_EMPTY_ROOT(&root->inode_tree); + empty = xa_empty(&root->inodes); spin_unlock(&root->inode_lock); if (empty) btrfs_add_dead_root(root); @@ -5613,8 +5606,13 @@ struct inode *btrfs_iget_path(struct super_block *s, u64 ino, ret = btrfs_read_locked_inode(inode, path); if (!ret) { - inode_tree_add(BTRFS_I(inode)); - unlock_new_inode(inode); + ret = btrfs_add_inode_to_root(BTRFS_I(inode)); + if 
(ret) { + iget_failed(inode); + inode = ERR_PTR(ret); + } else { + unlock_new_inode(inode); + } } else { iget_failed(inode); /* @@ -6426,7 +6424,11 @@ int btrfs_create_new_inode(struct btrfs_trans_handle *trans, } } - inode_tree_add(BTRFS_I(inode)); + ret = btrfs_add_inode_to_root(BTRFS_I(inode)); + if (ret) { + btrfs_abort_transaction(trans, ret); + goto discard; + } trace_btrfs_inode_new(inode); btrfs_set_inode_last_trans(trans, BTRFS_I(inode)); @@ -8466,7 +8468,6 @@ struct inode *btrfs_alloc_inode(struct super_block *sb) ei->ordered_tree_last = NULL; INIT_LIST_HEAD(&ei->delalloc_inodes); INIT_LIST_HEAD(&ei->delayed_iput); - RB_CLEAR_NODE(&ei->rb_node); init_rwsem(&ei->i_mmap_lock); return inode; @@ -8538,7 +8539,7 @@ void btrfs_destroy_inode(struct inode *vfs_inode) } } btrfs_qgroup_check_reserved_leak(inode); - inode_tree_del(inode); + btrfs_del_inode_from_root(inode); btrfs_drop_extent_map_range(inode, 0, (u64)-1, false); btrfs_inode_clear_file_extent_range(inode, 0, (u64)-1); btrfs_put_root(inode->root); @@ -10857,52 +10858,23 @@ void btrfs_assert_inode_range_clean(struct btrfs_inode *inode, u64 start, u64 en */ struct btrfs_inode *btrfs_find_first_inode(struct btrfs_root *root, u64 min_ino) { - struct rb_node *node; - struct rb_node *prev; struct btrfs_inode *inode; + unsigned long from = min_ino; spin_lock(&root->inode_lock); -again: - node = root->inode_tree.rb_node; - prev = NULL; - while (node) { - prev = node; - inode = rb_entry(node, struct btrfs_inode, rb_node); - if (min_ino < btrfs_ino(inode)) - node = node->rb_left; - else if (min_ino > btrfs_ino(inode)) - node = node->rb_right; - else + while (true) { + inode = xa_find(&root->inodes, &from, ULONG_MAX, XA_PRESENT); + if (!inode) + break; + if (igrab(&inode->vfs_inode)) break; - } - - if (!node) { - while (prev) { - inode = rb_entry(prev, struct btrfs_inode, rb_node); - if (min_ino <= btrfs_ino(inode)) { - node = prev; - break; - } - prev = rb_next(prev); - } - } - - while (node) { - inode = 
rb_entry(prev, struct btrfs_inode, rb_node); - if (igrab(&inode->vfs_inode)) { - spin_unlock(&root->inode_lock); - return inode; - } - - min_ino = btrfs_ino(inode) + 1; - if (cond_resched_lock(&root->inode_lock)) - goto again; - node = rb_next(node); + from = btrfs_ino(inode) + 1; + cond_resched_lock(&root->inode_lock); } spin_unlock(&root->inode_lock); - return NULL; + return inode; } static const struct inode_operations btrfs_dir_inode_operations = {

From patchwork Fri May 10 17:32:50 2024
X-Patchwork-Submitter: Filipe Manana
X-Patchwork-Id: 13661803
From: fdmanana@kernel.org
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v2 02/10] btrfs: preallocate inodes xarray entry to avoid transaction abort
Date: Fri, 10 May 2024 18:32:50 +0100
Message-Id: <9cf8f37a2b6c61c0532ae060a917f7d232acd5e1.1715362104.git.fdmanana@suse.com>

From: Filipe Manana

When creating a new inode, at btrfs_create_new_inode(), one of the very
last steps is to add the inode to the root's inodes xarray. This often
requires allocating memory, which may fail (even though xarrays have a
dedicated kmem_cache, which makes failure less likely), and at that point
we are forced to abort the current transaction (as some, but not all, of
the inode metadata was already added to its subvolume btree).

To avoid a transaction abort, preallocate memory for the xarray early in
btrfs_create_new_inode(), so that if that allocation fails we don't need
to abort the transaction, and the later insertion into the xarray is
guaranteed to succeed.
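The reserve-then-store pattern can be illustrated in userspace. This sketch is not the xarray API: struct slot_table, table_reserve(), table_store(), table_release() and create_object() are hypothetical, single-threaded stand-ins for xa_reserve(), xa_store() and xa_release(). It shows why a store that follows a successful reservation cannot fail for lack of memory, and why releasing a reservation that has since been filled is harmless.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

#define CHUNK_SHIFT 6
#define CHUNK_SIZE (1UL << CHUNK_SHIFT) /* slots per leaf chunk */
#define NR_CHUNKS 64                    /* fixed top level: indices 0..4095 */

/* Hypothetical marker for "reserved but not yet filled". */
static char reserved_marker;
#define RESERVED ((void *)&reserved_marker)

/* Hypothetical two-level sparse table; leaf chunks are allocated lazily. */
struct slot_table {
	void **chunks[NR_CHUNKS];
};

static void **slot_of(const struct slot_table *t, unsigned long index)
{
	unsigned long c = index >> CHUNK_SHIFT;

	if (c >= NR_CHUNKS || !t->chunks[c])
		return NULL;
	return &t->chunks[c][index & (CHUNK_SIZE - 1)];
}

/* The only step that allocates, so the only one that can hit -ENOMEM;
 * call it early, before irreversible work is done. */
static int table_reserve(struct slot_table *t, unsigned long index)
{
	unsigned long c = index >> CHUNK_SHIFT;
	void **slot;

	if (c >= NR_CHUNKS)
		return -EINVAL;
	if (!t->chunks[c]) {
		t->chunks[c] = calloc(CHUNK_SIZE, sizeof(void *));
		if (!t->chunks[c])
			return -ENOMEM;
	}
	slot = slot_of(t, index);
	if (!*slot)
		*slot = RESERVED;
	return 0;
}

/* Never allocates: after table_reserve() for the same index it cannot
 * fail; without a reservation it fails as an atomic allocation could. */
static int table_store(struct slot_table *t, unsigned long index, void *entry)
{
	void **slot = slot_of(t, index);

	if (!slot)
		return -ENOMEM;
	*slot = entry;
	return 0;
}

/* Drops the reservation only if the slot was never filled. */
static void table_release(struct slot_table *t, unsigned long index)
{
	void **slot = slot_of(t, index);

	if (slot && *slot == RESERVED)
		*slot = NULL;
}

static void *table_load(const struct slot_table *t, unsigned long index)
{
	void **slot = slot_of(t, index);

	return (!slot || *slot == RESERVED) ? NULL : *slot;
}

static void table_destroy(struct slot_table *t)
{
	for (int i = 0; i < NR_CHUNKS; i++) {
		free(t->chunks[i]);
		t->chunks[i] = NULL;
	}
}

/* Same shape as the patched creation path: reserve early, do work that may
 * fail, store (cannot fail), release the reservation on every exit. */
static int create_object(struct slot_table *t, unsigned long id, void *obj,
			 bool fail_midway)
{
	int ret = table_reserve(t, id);

	if (ret)
		return ret;
	if (fail_midway) {
		ret = -EIO; /* stand-in for an error before the insertion */
		goto out;
	}
	ret = table_store(t, id, obj);
out:
	table_release(t, id); /* no-op once the slot holds obj */
	return ret;
}
```

In the kernel the store additionally runs under a spinlock, which is why it must not need to allocate; this sketch has no locking at all.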
Reviewed-by: Qu Wenruo Signed-off-by: Filipe Manana --- fs/btrfs/inode.c | 26 +++++++++++++++++++------- 1 file changed, 19 insertions(+), 7 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 450fe1582f1d..85dbc19c2f6f 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -5493,7 +5493,7 @@ static int fixup_tree_root_location(struct btrfs_fs_info *fs_info, return err; } -static int btrfs_add_inode_to_root(struct btrfs_inode *inode) +static int btrfs_add_inode_to_root(struct btrfs_inode *inode, bool prealloc) { struct btrfs_root *root = inode->root; struct btrfs_inode *existing; @@ -5503,9 +5503,11 @@ static int btrfs_add_inode_to_root(struct btrfs_inode *inode) if (inode_unhashed(&inode->vfs_inode)) return 0; - ret = xa_reserve(&root->inodes, ino, GFP_NOFS); - if (ret) - return ret; + if (prealloc) { + ret = xa_reserve(&root->inodes, ino, GFP_NOFS); + if (ret) + return ret; + } spin_lock(&root->inode_lock); existing = xa_store(&root->inodes, ino, inode, GFP_ATOMIC); @@ -5606,7 +5608,7 @@ struct inode *btrfs_iget_path(struct super_block *s, u64 ino, ret = btrfs_read_locked_inode(inode, path); if (!ret) { - ret = btrfs_add_inode_to_root(BTRFS_I(inode)); + ret = btrfs_add_inode_to_root(BTRFS_I(inode), true); if (ret) { iget_failed(inode); inode = ERR_PTR(ret); @@ -6237,6 +6239,7 @@ int btrfs_create_new_inode(struct btrfs_trans_handle *trans, struct btrfs_item_batch batch; unsigned long ptr; int ret; + bool xa_reserved = false; path = btrfs_alloc_path(); if (!path) @@ -6251,6 +6254,11 @@ int btrfs_create_new_inode(struct btrfs_trans_handle *trans, goto out; inode->i_ino = objectid; + ret = xa_reserve(&root->inodes, objectid, GFP_NOFS); + if (ret) + goto out; + xa_reserved = true; + if (args->orphan) { /* * O_TMPFILE, set link count to 0, so that after this point, we @@ -6424,8 +6432,9 @@ int btrfs_create_new_inode(struct btrfs_trans_handle *trans, } } - ret = btrfs_add_inode_to_root(BTRFS_I(inode)); - if (ret) { + ret = 
btrfs_add_inode_to_root(BTRFS_I(inode), false); + if (WARN_ON(ret)) { + /* Shouldn't happen, we used xa_reserve() before. */ btrfs_abort_transaction(trans, ret); goto discard; } @@ -6456,6 +6465,9 @@ int btrfs_create_new_inode(struct btrfs_trans_handle *trans, ihold(inode); discard_new_inode(inode); out: + if (xa_reserved) + xa_release(&root->inodes, objectid); + btrfs_free_path(path); return ret; }

From patchwork Fri May 10 17:32:51 2024
X-Patchwork-Submitter: Filipe Manana
X-Patchwork-Id: 13661804
From: fdmanana@kernel.org
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v2 03/10] btrfs: reduce nesting and deduplicate error handling at btrfs_iget_path()
Date: Fri, 10 May 2024 18:32:51 +0100

From: Filipe Manana

Make btrfs_iget_path() simpler and easier to read by avoiding nested
if-then-else statements and by having a single error label do all the
error handling, instead of repeating it a couple of times.
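The flattened control flow this patch moves to is the common kernel "goto error" idiom. A userspace sketch of the same shape, where struct thing, read_thing(), register_thing() and get_thing() are hypothetical stand-ins for the inode, btrfs_read_locked_inode(), btrfs_add_inode_to_root() and btrfs_iget_path(), and an int return with an out parameter replaces ERR_PTR():

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical object and helpers. read_thing() can return > 0 for "not
 * found", as the text above says btrfs_read_locked_inode() can. */
struct thing {
	int loaded;
	int registered;
};

static int read_thing(struct thing *t, int simulated_ret)
{
	if (simulated_ret)
		return simulated_ret; /* > 0: not found, < 0: error */
	t->loaded = 1;
	return 0;
}

static int register_thing(struct thing *t, int fail)
{
	if (fail)
		return -ENOMEM;
	t->registered = 1;
	return 0;
}

/* Early return for the already-initialized case, then linear steps that
 * all jump to one error label. Returns 0 with *out set, or -errno. */
static int get_thing(struct thing **out, struct thing *existing,
		     int read_ret, int register_fails)
{
	struct thing *t;
	int ret;

	if (existing) {
		*out = existing;
		return 0;
	}

	t = calloc(1, sizeof(*t));
	if (!t)
		return -ENOMEM;

	ret = read_thing(t, read_ret);
	/* A positive value means "not found": report it as -ENOENT. */
	if (ret > 0)
		ret = -ENOENT;
	if (ret < 0)
		goto error;

	ret = register_thing(t, register_fails);
	if (ret < 0)
		goto error;

	*out = t;
	return 0;
error:
	free(t); /* single cleanup path, the role iget_failed() plays */
	return ret;
}
```

Each new failure point only needs an "if (ret < 0) goto error;" line, so the cleanup logic cannot drift between copies.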
Reviewed-by: Qu Wenruo Signed-off-by: Filipe Manana --- fs/btrfs/inode.c | 44 +++++++++++++++++++++----------------------- 1 file changed, 21 insertions(+), 23 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 85dbc19c2f6f..8ea9fd4c2b66 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -5598,37 +5598,35 @@ struct inode *btrfs_iget_path(struct super_block *s, u64 ino, struct btrfs_root *root, struct btrfs_path *path) { struct inode *inode; + int ret; inode = btrfs_iget_locked(s, ino, root); if (!inode) return ERR_PTR(-ENOMEM); - if (inode->i_state & I_NEW) { - int ret; + if (!(inode->i_state & I_NEW)) + return inode; - ret = btrfs_read_locked_inode(inode, path); - if (!ret) { - ret = btrfs_add_inode_to_root(BTRFS_I(inode), true); - if (ret) { - iget_failed(inode); - inode = ERR_PTR(ret); - } else { - unlock_new_inode(inode); - } - } else { - iget_failed(inode); - /* - * ret > 0 can come from btrfs_search_slot called by - * btrfs_read_locked_inode, this means the inode item - * was not found. - */ - if (ret > 0) - ret = -ENOENT; - inode = ERR_PTR(ret); - } - } + ret = btrfs_read_locked_inode(inode, path); + /* + * ret > 0 can come from btrfs_search_slot called by + * btrfs_read_locked_inode(), this means the inode item was not found. 
+ */ + if (ret > 0) + ret = -ENOENT; + if (ret < 0) + goto error; + + ret = btrfs_add_inode_to_root(BTRFS_I(inode), true); + if (ret < 0) + goto error; + + unlock_new_inode(inode); return inode; +error: + iget_failed(inode); + return ERR_PTR(ret); } struct inode *btrfs_iget(struct super_block *s, u64 ino, struct btrfs_root *root)

From patchwork Fri May 10 17:32:52 2024
X-Patchwork-Submitter: Filipe Manana
X-Patchwork-Id: 13661805
From: fdmanana@kernel.org
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v2 04/10] btrfs: remove inode_lock from struct btrfs_root and use xarray locks
Date: Fri, 10 May 2024 18:32:52 +0100

From: Filipe Manana

Currently we use the spinlock inode_lock from struct btrfs_root to
serialize access to two different data structures:

1) The delayed inodes xarray (struct btrfs_root::delayed_nodes);
2) The inodes xarray (struct btrfs_root::inodes).

Instead of using our own lock, we can use the spinlock that is part of the
xarray implementation, by using the xa_lock() and xa_unlock() APIs together
with the double-underscore-prefixed xarray APIs, which don't take the
xarray lock and assume the caller already holds it via xa_lock().

So remove the spinlock inode_lock from struct btrfs_root and use the
corresponding xarray locks. This brings 2 benefits:

1) We reduce the size of struct btrfs_root, from 1336 bytes down to
   1328 bytes on a 64 bits release kernel config;

2) We reduce lock contention by no longer using the same lock for
   changing two different and unrelated xarrays.
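The idea of letting a container's embedded lock replace a separate one, together with lock-already-held variants of its operations, can be sketched in userspace with a C11 atomic_flag spinlock. Everything here is hypothetical (struct locked_array and the la_* functions); it mirrors the xa_lock()/xa_unlock() plus "__"-prefixed API pattern only in spirit and is not the xarray implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define LA_SLOTS 16

/* Hypothetical container that embeds its own spinlock, so users need no
 * separate lock of their own. */
struct locked_array {
	atomic_flag lock;
	void *slots[LA_SLOTS];
	size_t count;
};

static void la_lock(struct locked_array *a)
{
	while (atomic_flag_test_and_set_explicit(&a->lock, memory_order_acquire))
		; /* spin until the holder clears the flag */
}

static void la_unlock(struct locked_array *a)
{
	atomic_flag_clear_explicit(&a->lock, memory_order_release);
}

/* Caller must hold a->lock. The kernel marks such variants with a "__"
 * prefix (e.g. __xa_erase()); userspace C reserves that prefix, hence the
 * _locked suffix here. */
static void *la_erase_locked(struct locked_array *a, size_t index)
{
	void *old = a->slots[index];

	if (old) {
		a->slots[index] = NULL;
		a->count--;
	}
	return old;
}

static bool la_empty_locked(const struct locked_array *a)
{
	return a->count == 0;
}

/* Self-locking variants: take and drop the embedded lock internally. */
static void *la_erase(struct locked_array *a, size_t index)
{
	void *old;

	la_lock(a);
	old = la_erase_locked(a, index);
	la_unlock(a);
	return old;
}

static void la_store(struct locked_array *a, size_t index, void *entry)
{
	la_lock(a);
	if (!a->slots[index] && entry)
		a->count++;
	else if (a->slots[index] && !entry)
		a->count--;
	a->slots[index] = entry;
	la_unlock(a);
}

/* Erase plus emptiness check under one hold of the embedded lock, the same
 * shape btrfs_del_inode_from_root() takes in the diff below. */
static bool la_erase_and_check_empty(struct locked_array *a, size_t index,
				     void *expected)
{
	bool empty = false;

	la_lock(a);
	if (la_erase_locked(a, index) == expected)
		empty = la_empty_locked(a);
	la_unlock(a);
	return empty;
}
```

Because each container carries its own lock, operations on two unrelated containers never contend with each other, which is the second benefit listed above.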
Reviewed-by: Qu Wenruo Signed-off-by: Filipe Manana --- fs/btrfs/ctree.h | 1 - fs/btrfs/delayed-inode.c | 26 ++++++++++++-------------- fs/btrfs/disk-io.c | 1 - fs/btrfs/inode.c | 18 ++++++++---------- 4 files changed, 20 insertions(+), 26 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index aa2568f86dc9..1004cb934b4a 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -221,7 +221,6 @@ struct btrfs_root { struct list_head root_list; - spinlock_t inode_lock; /* * Xarray that keeps track of in-memory inodes, protected by the lock * @inode_lock. diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c index 95a0497fa866..40e617c7e8a1 100644 --- a/fs/btrfs/delayed-inode.c +++ b/fs/btrfs/delayed-inode.c @@ -77,14 +77,14 @@ static struct btrfs_delayed_node *btrfs_get_delayed_node( return node; } - spin_lock(&root->inode_lock); + xa_lock(&root->delayed_nodes); node = xa_load(&root->delayed_nodes, ino); if (node) { if (btrfs_inode->delayed_node) { refcount_inc(&node->refs); /* can be accessed */ BUG_ON(btrfs_inode->delayed_node != node); - spin_unlock(&root->inode_lock); + xa_unlock(&root->delayed_nodes); return node; } @@ -111,10 +111,10 @@ static struct btrfs_delayed_node *btrfs_get_delayed_node( node = NULL; } - spin_unlock(&root->inode_lock); + xa_unlock(&root->delayed_nodes); return node; } - spin_unlock(&root->inode_lock); + xa_unlock(&root->delayed_nodes); return NULL; } @@ -148,21 +148,21 @@ static struct btrfs_delayed_node *btrfs_get_or_create_delayed_node( kmem_cache_free(delayed_node_cache, node); return ERR_PTR(-ENOMEM); } - spin_lock(&root->inode_lock); + xa_lock(&root->delayed_nodes); ptr = xa_load(&root->delayed_nodes, ino); if (ptr) { /* Somebody inserted it, go back and read it. 
*/ - spin_unlock(&root->inode_lock); + xa_unlock(&root->delayed_nodes); kmem_cache_free(delayed_node_cache, node); node = NULL; goto again; } - ptr = xa_store(&root->delayed_nodes, ino, node, GFP_ATOMIC); + ptr = __xa_store(&root->delayed_nodes, ino, node, GFP_ATOMIC); ASSERT(xa_err(ptr) != -EINVAL); ASSERT(xa_err(ptr) != -ENOMEM); ASSERT(ptr == NULL); btrfs_inode->delayed_node = node; - spin_unlock(&root->inode_lock); + xa_unlock(&root->delayed_nodes); return node; } @@ -275,14 +275,12 @@ static void __btrfs_release_delayed_node( if (refcount_dec_and_test(&delayed_node->refs)) { struct btrfs_root *root = delayed_node->root; - spin_lock(&root->inode_lock); + xa_erase(&root->delayed_nodes, delayed_node->inode_id); /* * Once our refcount goes to zero, nobody is allowed to bump it * back up. We can delete it now. */ ASSERT(refcount_read(&delayed_node->refs) == 0); - xa_erase(&root->delayed_nodes, delayed_node->inode_id); - spin_unlock(&root->inode_lock); kmem_cache_free(delayed_node_cache, delayed_node); } } @@ -2057,9 +2055,9 @@ void btrfs_kill_all_delayed_nodes(struct btrfs_root *root) struct btrfs_delayed_node *node; int count; - spin_lock(&root->inode_lock); + xa_lock(&root->delayed_nodes); if (xa_empty(&root->delayed_nodes)) { - spin_unlock(&root->inode_lock); + xa_unlock(&root->delayed_nodes); return; } @@ -2076,7 +2074,7 @@ void btrfs_kill_all_delayed_nodes(struct btrfs_root *root) if (count >= ARRAY_SIZE(delayed_nodes)) break; } - spin_unlock(&root->inode_lock); + xa_unlock(&root->delayed_nodes); index++; for (int i = 0; i < count; i++) { diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index ed40fe1db53e..d20e400a9ce3 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -674,7 +674,6 @@ static void __setup_root(struct btrfs_root *root, struct btrfs_fs_info *fs_info, INIT_LIST_HEAD(&root->ordered_extents); INIT_LIST_HEAD(&root->ordered_root); INIT_LIST_HEAD(&root->reloc_dirty_list); - spin_lock_init(&root->inode_lock); 
spin_lock_init(&root->delalloc_lock); spin_lock_init(&root->ordered_extent_lock); spin_lock_init(&root->accounting_lock); diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 8ea9fd4c2b66..4fd41d6b377f 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -5509,9 +5509,7 @@ static int btrfs_add_inode_to_root(struct btrfs_inode *inode, bool prealloc) return ret; } - spin_lock(&root->inode_lock); existing = xa_store(&root->inodes, ino, inode, GFP_ATOMIC); - spin_unlock(&root->inode_lock); if (xa_is_err(existing)) { ret = xa_err(existing); @@ -5531,16 +5529,16 @@ static void btrfs_del_inode_from_root(struct btrfs_inode *inode) struct btrfs_inode *entry; bool empty = false; - spin_lock(&root->inode_lock); - entry = xa_erase(&root->inodes, btrfs_ino(inode)); + xa_lock(&root->inodes); + entry = __xa_erase(&root->inodes, btrfs_ino(inode)); if (entry == inode) empty = xa_empty(&root->inodes); - spin_unlock(&root->inode_lock); + xa_unlock(&root->inodes); if (empty && btrfs_root_refs(&root->root_item) == 0) { - spin_lock(&root->inode_lock); + xa_lock(&root->inodes); empty = xa_empty(&root->inodes); - spin_unlock(&root->inode_lock); + xa_unlock(&root->inodes); if (empty) btrfs_add_dead_root(root); } @@ -10871,7 +10869,7 @@ struct btrfs_inode *btrfs_find_first_inode(struct btrfs_root *root, u64 min_ino) struct btrfs_inode *inode; unsigned long from = min_ino; - spin_lock(&root->inode_lock); + xa_lock(&root->inodes); while (true) { inode = xa_find(&root->inodes, &from, ULONG_MAX, XA_PRESENT); if (!inode) @@ -10880,9 +10878,9 @@ struct btrfs_inode *btrfs_find_first_inode(struct btrfs_root *root, u64 min_ino) break; from = btrfs_ino(inode) + 1; - cond_resched_lock(&root->inode_lock); + cond_resched_lock(&root->inodes.xa_lock); } - spin_unlock(&root->inode_lock); + xa_unlock(&root->inodes); return inode; }

From patchwork Fri May 10 17:32:53 2024
X-Patchwork-Submitter: Filipe Manana
X-Patchwork-Id: 13661806 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E7EF4224FD for ; Fri, 10 May 2024 17:33:06 +0000 (UTC)
From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH v2 05/10] btrfs: unify index_cnt and csum_bytes from struct btrfs_inode Date: Fri, 10 May 2024 18:32:53 +0100 From: Filipe Manana The index_cnt field of struct btrfs_inode is used for only two purposes: 1) To store the index for the next entry added to a directory; 2) For the data relocation inode, to track the logical start address of the block group currently being relocated. For the relocation case we use index_cnt because it's not used for anything else in the relocation use case - we could have used other fields that are not used by relocation, such as defrag_bytes, last_unlink_trans or last_reflink_trans (among others). Since the csum_bytes field is not used for directories, make the following changes: 1) Put index_cnt and csum_bytes in a union, and initialize index_cnt only when the inode is a directory. csum_bytes is only accessed in IO paths for regular files, so this is safe; 2) Use the defrag_bytes field for relocation, since the data relocation inode is never used for defrag purposes. To make the naming clearer, alias it to reloc_block_group_start by using a union. This reduces the size of struct btrfs_inode by 8 bytes in a release kernel, from 1056 bytes down to 1048 bytes.
Reviewed-by: Qu Wenruo Signed-off-by: Filipe Manana --- fs/btrfs/btrfs_inode.h | 46 +++++++++++++++++++++++++--------------- fs/btrfs/delayed-inode.c | 3 ++- fs/btrfs/inode.c | 21 ++++++++++++------ fs/btrfs/relocation.c | 12 +++++------ fs/btrfs/tree-log.c | 3 ++- 5 files changed, 54 insertions(+), 31 deletions(-) diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h index e577b9745884..19bb3d057414 100644 --- a/fs/btrfs/btrfs_inode.h +++ b/fs/btrfs/btrfs_inode.h @@ -215,11 +215,20 @@ struct btrfs_inode { u64 last_dir_index_offset; }; - /* - * Total number of bytes pending defrag, used by stat to check whether - * it needs COW. Protected by 'lock'. - */ - u64 defrag_bytes; + union { + /* + * Total number of bytes pending defrag, used by stat to check whether + * it needs COW. Protected by 'lock'. + * Used by inodes other than the data relocation inode. + */ + u64 defrag_bytes; + + /* + * Logical address of the block group being relocated. + * Used only by the data relocation inode. + */ + u64 reloc_block_group_start; + }; /* * The size of the file stored in the metadata on disk. data=ordered @@ -228,12 +237,21 @@ struct btrfs_inode { */ u64 disk_i_size; - /* - * If this is a directory then index_cnt is the counter for the index - * number for new files that are created. For an empty directory, this - * must be initialized to BTRFS_DIR_START_INDEX. - */ - u64 index_cnt; + union { + /* + * If this is a directory then index_cnt is the counter for the + * index number for new files that are created. For an empty + * directory, this must be initialized to BTRFS_DIR_START_INDEX. + */ + u64 index_cnt; + + /* + * If this is not a directory, this is the number of bytes + * outstanding that are going to need csums. This is used in + * ENOSPC accounting. Protected by 'lock'. 
+ */ + u64 csum_bytes; + }; /* Cache the directory index number to speed the dir/file remove */ u64 dir_index; @@ -256,12 +274,6 @@ struct btrfs_inode { */ u64 last_reflink_trans; - /* - * Number of bytes outstanding that are going to need csums. This is - * used in ENOSPC accounting. Protected by 'lock'. - */ - u64 csum_bytes; - /* Backwards incompatible flags, lower half of inode_item::flags */ u32 flags; /* Read-only compatibility flags, upper half of inode_item::flags */ diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c index 40e617c7e8a1..483c141dc488 100644 --- a/fs/btrfs/delayed-inode.c +++ b/fs/btrfs/delayed-inode.c @@ -1914,7 +1914,8 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev) BTRFS_I(inode)->i_otime_nsec = btrfs_stack_timespec_nsec(&inode_item->otime); inode->i_generation = BTRFS_I(inode)->generation; - BTRFS_I(inode)->index_cnt = (u64)-1; + if (S_ISDIR(inode->i_mode)) + BTRFS_I(inode)->index_cnt = (u64)-1; mutex_unlock(&delayed_node->mutex); btrfs_release_delayed_node(delayed_node); diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 4fd41d6b377f..9b98aa65cc63 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -3856,7 +3856,9 @@ static int btrfs_read_locked_inode(struct inode *inode, inode->i_rdev = 0; rdev = btrfs_inode_rdev(leaf, inode_item); - BTRFS_I(inode)->index_cnt = (u64)-1; + if (S_ISDIR(inode->i_mode)) + BTRFS_I(inode)->index_cnt = (u64)-1; + btrfs_inode_split_flags(btrfs_inode_flags(leaf, inode_item), &BTRFS_I(inode)->flags, &BTRFS_I(inode)->ro_flags); @@ -6268,8 +6270,10 @@ int btrfs_create_new_inode(struct btrfs_trans_handle *trans, if (ret) goto out; } - /* index_cnt is ignored for everything but a dir. 
*/ - BTRFS_I(inode)->index_cnt = BTRFS_DIR_START_INDEX; + + if (S_ISDIR(inode->i_mode)) + BTRFS_I(inode)->index_cnt = BTRFS_DIR_START_INDEX; + BTRFS_I(inode)->generation = trans->transid; inode->i_generation = BTRFS_I(inode)->generation; @@ -8435,8 +8439,12 @@ struct inode *btrfs_alloc_inode(struct super_block *sb) ei->disk_i_size = 0; ei->flags = 0; ei->ro_flags = 0; + /* + * ->index_cnt will be properly initialized later when creating a new + * inode (btrfs_create_new_inode()) or when reading an existing inode + * from disk (btrfs_read_locked_inode()). + */ ei->csum_bytes = 0; - ei->index_cnt = (u64)-1; ei->dir_index = 0; ei->last_unlink_trans = 0; ei->last_reflink_trans = 0; @@ -8511,9 +8519,10 @@ void btrfs_destroy_inode(struct inode *vfs_inode) if (!S_ISDIR(vfs_inode->i_mode)) { WARN_ON(inode->delalloc_bytes); WARN_ON(inode->new_delalloc_bytes); + WARN_ON(inode->csum_bytes); } - WARN_ON(inode->csum_bytes); - WARN_ON(inode->defrag_bytes); + if (!root || !btrfs_is_data_reloc_root(root)) + WARN_ON(inode->defrag_bytes); /* * This can happen where we create an inode, but somebody else also diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 8b24bb5a0aa1..9f35524b6664 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -962,7 +962,7 @@ static int get_new_location(struct inode *reloc_inode, u64 *new_bytenr, if (!path) return -ENOMEM; - bytenr -= BTRFS_I(reloc_inode)->index_cnt; + bytenr -= BTRFS_I(reloc_inode)->reloc_block_group_start; ret = btrfs_lookup_file_extent(NULL, root, path, btrfs_ino(BTRFS_I(reloc_inode)), bytenr, 0); if (ret < 0) @@ -2797,7 +2797,7 @@ static noinline_for_stack int prealloc_file_extent_cluster( u64 alloc_hint = 0; u64 start; u64 end; - u64 offset = inode->index_cnt; + u64 offset = inode->reloc_block_group_start; u64 num_bytes; int nr; int ret = 0; @@ -2951,7 +2951,7 @@ static int relocate_one_folio(struct inode *inode, struct file_ra_state *ra, int *cluster_nr, unsigned long index) { struct btrfs_fs_info *fs_info
= inode_to_fs_info(inode); - u64 offset = BTRFS_I(inode)->index_cnt; + u64 offset = BTRFS_I(inode)->reloc_block_group_start; const unsigned long last_index = (cluster->end - offset) >> PAGE_SHIFT; gfp_t mask = btrfs_alloc_write_mask(inode->i_mapping); struct folio *folio; @@ -3086,7 +3086,7 @@ static int relocate_one_folio(struct inode *inode, struct file_ra_state *ra, static int relocate_file_extent_cluster(struct inode *inode, const struct file_extent_cluster *cluster) { - u64 offset = BTRFS_I(inode)->index_cnt; + u64 offset = BTRFS_I(inode)->reloc_block_group_start; unsigned long index; unsigned long last_index; struct file_ra_state *ra; @@ -3915,7 +3915,7 @@ static noinline_for_stack struct inode *create_reloc_inode( inode = NULL; goto out; } - BTRFS_I(inode)->index_cnt = group->start; + BTRFS_I(inode)->reloc_block_group_start = group->start; ret = btrfs_orphan_add(trans, BTRFS_I(inode)); out: @@ -4395,7 +4395,7 @@ int btrfs_reloc_clone_csums(struct btrfs_ordered_extent *ordered) { struct btrfs_inode *inode = BTRFS_I(ordered->inode); struct btrfs_fs_info *fs_info = inode->root->fs_info; - u64 disk_bytenr = ordered->file_offset + inode->index_cnt; + u64 disk_bytenr = ordered->file_offset + inode->reloc_block_group_start; struct btrfs_root *csum_root = btrfs_csum_root(fs_info, disk_bytenr); LIST_HEAD(list); int ret; diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 5146387b416b..0aee43466c52 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -1625,7 +1625,8 @@ static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans, if (ret) goto out; } - BTRFS_I(inode)->index_cnt = (u64)-1; + if (S_ISDIR(inode->i_mode)) + BTRFS_I(inode)->index_cnt = (u64)-1; if (inode->i_nlink == 0) { if (S_ISDIR(inode->i_mode)) { From patchwork Fri May 10 17:32:54 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13661807 Received: from smtp.kernel.org 
(aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 99D4E28DCB for ; Fri, 10 May 2024 17:33:07 +0000 (UTC)
From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH v2 06/10] btrfs: don't allocate file extent tree for non regular files Date: Fri, 10 May 2024 18:32:54 +0100 From: Filipe Manana When not using the NO_HOLES feature, we always allocate an io tree for an inode's file_extent_tree. This is wasteful because that io tree is only used for regular files, so we allocate more memory than needed for inodes that represent directories or symlinks, for example, or for free space inodes. So improve on this by allocating the io tree only for inodes of regular files that are not free space inodes. Reviewed-by: Qu Wenruo Signed-off-by: Filipe Manana --- fs/btrfs/file-item.c | 13 ++++++----- fs/btrfs/inode.c | 53 +++++++++++++++++++++++++++++--------------- 2 files changed, 42 insertions(+), 24 deletions(-) diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c index bce95f871750..f3ed78e21fa4 100644 --- a/fs/btrfs/file-item.c +++ b/fs/btrfs/file-item.c @@ -45,13 +45,12 @@ */ void btrfs_inode_safe_disk_i_size_write(struct btrfs_inode *inode, u64 new_i_size) { - struct btrfs_fs_info *fs_info = inode->root->fs_info; u64 start, end, i_size; int ret; spin_lock(&inode->lock); i_size = new_i_size ?: i_size_read(&inode->vfs_inode); - if (btrfs_fs_incompat(fs_info, NO_HOLES)) { + if (!inode->file_extent_tree) { inode->disk_i_size = i_size; goto out_unlock; } @@ -84,13 +83,14 @@ void btrfs_inode_safe_disk_i_size_write(struct btrfs_inode *inode, u64 new_i_siz int btrfs_inode_set_file_extent_range(struct btrfs_inode *inode, u64 start, u64 len) { + if (!inode->file_extent_tree) + return 0; + if (len == 0) return 0; ASSERT(IS_ALIGNED(start + len, inode->root->fs_info->sectorsize)); - if (btrfs_fs_incompat(inode->root->fs_info, NO_HOLES)) -
return 0; return set_extent_bit(inode->file_extent_tree, start, start + len - 1, EXTENT_DIRTY, NULL); } @@ -112,14 +112,15 @@ int btrfs_inode_set_file_extent_range(struct btrfs_inode *inode, u64 start, int btrfs_inode_clear_file_extent_range(struct btrfs_inode *inode, u64 start, u64 len) { + if (!inode->file_extent_tree) + return 0; + if (len == 0) return 0; ASSERT(IS_ALIGNED(start + len, inode->root->fs_info->sectorsize) || len == (u64)-1); - if (btrfs_fs_incompat(inode->root->fs_info, NO_HOLES)) - return 0; return clear_extent_bit(inode->file_extent_tree, start, start + len - 1, EXTENT_DIRTY, NULL); } diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 9b98aa65cc63..175fd007f0ef 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -3781,6 +3781,30 @@ static noinline int acls_after_inode_item(struct extent_buffer *leaf, return 1; } +static int btrfs_init_file_extent_tree(struct btrfs_inode *inode) +{ + struct btrfs_fs_info *fs_info = inode->root->fs_info; + + if (WARN_ON_ONCE(inode->file_extent_tree)) + return 0; + if (btrfs_fs_incompat(fs_info, NO_HOLES)) + return 0; + if (!S_ISREG(inode->vfs_inode.i_mode)) + return 0; + if (btrfs_is_free_space_inode(inode)) + return 0; + + inode->file_extent_tree = kmalloc(sizeof(struct extent_io_tree), GFP_KERNEL); + if (!inode->file_extent_tree) + return -ENOMEM; + + extent_io_tree_init(fs_info, inode->file_extent_tree, IO_TREE_INODE_FILE_EXTENT); + /* Lockdep class is set only for the file extent tree. 
*/ + lockdep_set_class(&inode->file_extent_tree->lock, &file_extent_tree_class); + + return 0; +} + /* * read an inode from the btree into the in-memory inode */ @@ -3800,6 +3824,10 @@ static int btrfs_read_locked_inode(struct inode *inode, bool filled = false; int first_xattr_slot; + ret = btrfs_init_file_extent_tree(BTRFS_I(inode)); + if (ret) + return ret; + ret = btrfs_fill_inode(inode, &rdev); if (!ret) filled = true; @@ -6247,6 +6275,10 @@ int btrfs_create_new_inode(struct btrfs_trans_handle *trans, BTRFS_I(inode)->root = btrfs_grab_root(BTRFS_I(dir)->root); root = BTRFS_I(inode)->root; + ret = btrfs_init_file_extent_tree(BTRFS_I(inode)); + if (ret) + goto out; + ret = btrfs_get_free_objectid(root, &objectid); if (ret) goto out; @@ -8413,20 +8445,10 @@ struct inode *btrfs_alloc_inode(struct super_block *sb) struct btrfs_fs_info *fs_info = btrfs_sb(sb); struct btrfs_inode *ei; struct inode *inode; - struct extent_io_tree *file_extent_tree = NULL; - - /* Self tests may pass a NULL fs_info. */ - if (fs_info && !btrfs_fs_incompat(fs_info, NO_HOLES)) { - file_extent_tree = kmalloc(sizeof(struct extent_io_tree), GFP_KERNEL); - if (!file_extent_tree) - return NULL; - } ei = alloc_inode_sb(sb, btrfs_inode_cachep, GFP_KERNEL); - if (!ei) { - kfree(file_extent_tree); + if (!ei) return NULL; - } ei->root = NULL; ei->generation = 0; @@ -8471,13 +8493,8 @@ struct inode *btrfs_alloc_inode(struct super_block *sb) extent_io_tree_init(fs_info, &ei->io_tree, IO_TREE_INODE_IO); ei->io_tree.inode = ei; - ei->file_extent_tree = file_extent_tree; - if (file_extent_tree) { - extent_io_tree_init(fs_info, ei->file_extent_tree, - IO_TREE_INODE_FILE_EXTENT); - /* Lockdep class is set only for the file extent tree. 
*/ - lockdep_set_class(&ei->file_extent_tree->lock, &file_extent_tree_class); - } + ei->file_extent_tree = NULL; + mutex_init(&ei->log_mutex); spin_lock_init(&ei->ordered_tree_lock); ei->ordered_tree = RB_ROOT; From patchwork Fri May 10 17:32:55 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13661808 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E67BF3A29A for ; Fri, 10 May 2024 17:33:08 +0000 (UTC)
From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH v2 07/10] btrfs: remove location key from struct btrfs_inode Date: Fri, 10 May 2024 18:32:55 +0100 Message-Id: <856d3b985011d3dc5dfe9045fb687e995b596e81.1715362104.git.fdmanana@suse.com> From: Filipe Manana Currently struct btrfs_inode has a key member, named "location", that is either: 1) The key of the inode's item. In this case the objectid is the number of the inode; 2) A key stored in a dir entry with a type of BTRFS_ROOT_ITEM_KEY, for the case where we have a root that is a snapshot of a subvolume that points to other subvolumes. In this case the objectid is the ID of a subvolume inside the snapshotted parent subvolume. The key is only used to look up the inode item in the first case. In the second case it's never used for that, since it corresponds to directory stubs created with new_simple_dir(), which are marked as dummy, so there's no actual inode item to ever update. In the second case we only check the key type in btrfs_ino() on 32-bit platforms, and its objectid is only needed for unlink. Instead of a key we can do fine with just the objectid, since the key can be generated from the objectid whenever we need it: in all use cases the type is always BTRFS_INODE_ITEM_KEY and the offset is always 0. So use only an objectid instead of a full key.
This reduces the size of struct btrfs_inode from 1048 bytes down to 1040 bytes on a release kernel. Signed-off-by: Filipe Manana --- fs/btrfs/btrfs_inode.h | 47 +++++++++++++++++++++++++++++++----- fs/btrfs/disk-io.c | 4 +-- fs/btrfs/export.c | 2 +- fs/btrfs/inode.c | 25 +++++++++---------- fs/btrfs/ioctl.c | 8 +++--- fs/btrfs/tests/btrfs-tests.c | 4 +-- fs/btrfs/tree-log.c | 6 +++-- 7 files changed, 63 insertions(+), 33 deletions(-) diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h index 19bb3d057414..fa2f91396ae0 100644 --- a/fs/btrfs/btrfs_inode.h +++ b/fs/btrfs/btrfs_inode.h @@ -89,6 +89,29 @@ enum { BTRFS_INODE_FREE_SPACE_INODE, /* Set when there are no capabilities in XATTs for the inode. */ BTRFS_INODE_NO_CAP_XATTR, + /* + * Indicate this is a directory that points to a subvolume for which + * there is no root reference item. That's a case like the following: + * + * $ btrfs subvolume create /mnt/parent + * $ btrfs subvolume create /mnt/parent/child + * $ btrfs subvolume snapshot /mnt/parent /mnt/snap + * + * If subvolume "parent" is root 256, subvolume "child" is root 257 and + * snapshot "snap" is root 258, then there's no root reference item (key + * BTRFS_ROOT_REF_KEY in the root tree) for the subvolume "child" + * associated to root 258 (the snapshot) - there's only one for the root + * of the "parent" subvolume (root 256). In the root tree we have a + * (256 BTRFS_ROOT_REF_KEY 257) key but we don't have a + * (258 BTRFS_ROOT_REF_KEY 257) key - the same goes for backrefs, we + * have a (257 BTRFS_ROOT_BACKREF_KEY 256) but we don't have a + * (257 BTRFS_ROOT_BACKREF_KEY 258) key. + * + * So when opening the "child" dentry from the snapshot's directory, + * we don't find a root ref item and we create a stub inode. This is + * done at new_simple_dir(), called from btrfs_lookup_dentry().
+ */ + BTRFS_INODE_ROOT_STUB, }; /* in memory btrfs inode */ @@ -96,10 +119,15 @@ struct btrfs_inode { /* which subvolume this inode belongs to */ struct btrfs_root *root; - /* key used to find this inode on disk. This is used by the code - * to read in roots of subvolumes + /* + * This is either: + * + * 1) The objectid of the corresponding BTRFS_INODE_ITEM_KEY; + * + * 2) In case this is a root stub inode (BTRFS_INODE_ROOT_STUB flag set), + * the ID of that root. */ - struct btrfs_key location; + u64 objectid; /* Cached value of inode property 'compression'. */ u8 prop_compress; @@ -330,10 +358,9 @@ static inline unsigned long btrfs_inode_hash(u64 objectid, */ static inline u64 btrfs_ino(const struct btrfs_inode *inode) { - u64 ino = inode->location.objectid; + u64 ino = inode->objectid; - /* type == BTRFS_ROOT_ITEM_KEY: subvol dir */ - if (inode->location.type == BTRFS_ROOT_ITEM_KEY) + if (test_bit(BTRFS_INODE_ROOT_STUB, &inode->runtime_flags)) ino = inode->vfs_inode.i_ino; return ino; } @@ -347,6 +374,14 @@ static inline u64 btrfs_ino(const struct btrfs_inode *inode) #endif +static inline void btrfs_get_inode_key(const struct btrfs_inode *inode, + struct btrfs_key *key) +{ + key->objectid = inode->objectid; + key->type = BTRFS_INODE_ITEM_KEY; + key->offset = 0; +} + static inline void btrfs_i_size_write(struct btrfs_inode *inode, u64 size) { i_size_write(&inode->vfs_inode, size); diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index d20e400a9ce3..e3edaf510108 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1944,9 +1944,7 @@ static int btrfs_init_btree_inode(struct super_block *sb) extent_map_tree_init(&BTRFS_I(inode)->extent_tree); BTRFS_I(inode)->root = btrfs_grab_root(fs_info->tree_root); - BTRFS_I(inode)->location.objectid = BTRFS_BTREE_INODE_OBJECTID; - BTRFS_I(inode)->location.type = 0; - BTRFS_I(inode)->location.offset = 0; + BTRFS_I(inode)->objectid = BTRFS_BTREE_INODE_OBJECTID; set_bit(BTRFS_INODE_DUMMY, 
&BTRFS_I(inode)->runtime_flags);
__insert_inode_hash(inode, hash); fs_info->btree_inode = inode; diff --git a/fs/btrfs/export.c b/fs/btrfs/export.c index 9e81f89e76d8..5526e25ebb3f 100644 --- a/fs/btrfs/export.c +++ b/fs/btrfs/export.c @@ -40,7 +40,7 @@ static int btrfs_encode_fh(struct inode *inode, u32 *fh, int *max_len, if (parent) { u64 parent_root_id; - fid->parent_objectid = BTRFS_I(parent)->location.objectid; + fid->parent_objectid = BTRFS_I(parent)->objectid; fid->parent_gen = parent->i_generation; parent_root_id = btrfs_root_id(BTRFS_I(parent)->root); diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 175fd007f0ef..44dc82ff96db 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -3838,7 +3838,7 @@ static int btrfs_read_locked_inode(struct inode *inode, return -ENOMEM; } - memcpy(&location, &BTRFS_I(inode)->location, sizeof(location)); + btrfs_get_inode_key(BTRFS_I(inode), &location); ret = btrfs_lookup_inode(NULL, root, path, &location, 0); if (ret) { @@ -4068,13 +4068,15 @@ static noinline int btrfs_update_inode_item(struct btrfs_trans_handle *trans, struct btrfs_inode_item *inode_item; struct btrfs_path *path; struct extent_buffer *leaf; + struct btrfs_key key; int ret; path = btrfs_alloc_path(); if (!path) return -ENOMEM; - ret = btrfs_lookup_inode(trans, inode->root, path, &inode->location, 1); + btrfs_get_inode_key(inode, &key); + ret = btrfs_lookup_inode(trans, inode->root, path, &key, 1); if (ret) { if (ret > 0) ret = -ENOENT; @@ -4338,7 +4340,7 @@ static int btrfs_unlink_subvol(struct btrfs_trans_handle *trans, if (btrfs_ino(inode) == BTRFS_FIRST_FREE_OBJECTID) { objectid = btrfs_root_id(inode->root); } else if (btrfs_ino(inode) == BTRFS_EMPTY_SUBVOL_DIR_OBJECTID) { - objectid = inode->location.objectid; + objectid = inode->objectid; } else { WARN_ON(1); fscrypt_free_filename(&fname); @@ -5580,9 +5582,7 @@ static int btrfs_init_locked_inode(struct inode *inode, void *p) struct btrfs_iget_args *args = p; inode->i_ino = args->ino; - BTRFS_I(inode)->location.objectid = 
args->ino; - BTRFS_I(inode)->location.type = BTRFS_INODE_ITEM_KEY; - BTRFS_I(inode)->location.offset = 0; + BTRFS_I(inode)->objectid = args->ino; BTRFS_I(inode)->root = btrfs_grab_root(args->root); if (args->root && args->root == args->root->fs_info->tree_root && @@ -5596,7 +5596,7 @@ static int btrfs_find_actor(struct inode *inode, void *opaque) { struct btrfs_iget_args *args = opaque; - return args->ino == BTRFS_I(inode)->location.objectid && + return args->ino == BTRFS_I(inode)->objectid && args->root == BTRFS_I(inode)->root; } @@ -5673,7 +5673,8 @@ static struct inode *new_simple_dir(struct inode *dir, return ERR_PTR(-ENOMEM); BTRFS_I(inode)->root = btrfs_grab_root(root); - memcpy(&BTRFS_I(inode)->location, key, sizeof(*key)); + BTRFS_I(inode)->objectid = key->objectid; + set_bit(BTRFS_INODE_ROOT_STUB, &BTRFS_I(inode)->runtime_flags); set_bit(BTRFS_INODE_DUMMY, &BTRFS_I(inode)->runtime_flags); inode->i_ino = BTRFS_EMPTY_SUBVOL_DIR_OBJECTID; @@ -6149,7 +6150,7 @@ static int btrfs_insert_inode_locked(struct inode *inode) { struct btrfs_iget_args args; - args.ino = BTRFS_I(inode)->location.objectid; + args.ino = BTRFS_I(inode)->objectid; args.root = BTRFS_I(inode)->root; return insert_inode_locked4(inode, @@ -6256,7 +6257,6 @@ int btrfs_create_new_inode(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info = inode_to_fs_info(dir); struct btrfs_root *root; struct btrfs_inode_item *inode_item; - struct btrfs_key *location; struct btrfs_path *path; u64 objectid; struct btrfs_inode_ref *ref; @@ -6332,10 +6332,7 @@ int btrfs_create_new_inode(struct btrfs_trans_handle *trans, BTRFS_INODE_NODATASUM; } - location = &BTRFS_I(inode)->location; - location->objectid = objectid; - location->offset = 0; - location->type = BTRFS_INODE_ITEM_KEY; + BTRFS_I(inode)->objectid = objectid; ret = btrfs_insert_inode_locked(inode); if (ret < 0) { diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 28df28e50ad9..79a5ccb27b92 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c 
@@ -1918,7 +1918,7 @@ static int btrfs_search_path_in_tree_user(struct mnt_idmap *idmap, { struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info; struct super_block *sb = inode->i_sb; - struct btrfs_key upper_limit = BTRFS_I(inode)->location; + u64 upper_limit = BTRFS_I(inode)->objectid; u64 treeid = btrfs_root_id(BTRFS_I(inode)->root); u64 dirid = args->dirid; unsigned long item_off; @@ -1944,7 +1944,7 @@ static int btrfs_search_path_in_tree_user(struct mnt_idmap *idmap, * If the bottom subvolume does not exist directly under upper_limit, * construct the path in from the bottom up. */ - if (dirid != upper_limit.objectid) { + if (dirid != upper_limit) { ptr = &args->path[BTRFS_INO_LOOKUP_USER_PATH_MAX - 1]; root = btrfs_get_fs_root(fs_info, treeid, true); @@ -2019,7 +2019,7 @@ static int btrfs_search_path_in_tree_user(struct mnt_idmap *idmap, goto out_put; } - if (key.offset == upper_limit.objectid) + if (key.offset == upper_limit) break; if (key.objectid == BTRFS_FIRST_FREE_OBJECTID) { ret = -EACCES; @@ -2140,7 +2140,7 @@ static int btrfs_ioctl_ino_lookup_user(struct file *file, void __user *argp) inode = file_inode(file); if (args->dirid == BTRFS_FIRST_FREE_OBJECTID && - BTRFS_I(inode)->location.objectid != BTRFS_FIRST_FREE_OBJECTID) { + BTRFS_I(inode)->objectid != BTRFS_FIRST_FREE_OBJECTID) { /* * The subvolume does not exist under fd with which this is * called diff --git a/fs/btrfs/tests/btrfs-tests.c b/fs/btrfs/tests/btrfs-tests.c index dce0387ef155..b28a79935d8e 100644 --- a/fs/btrfs/tests/btrfs-tests.c +++ b/fs/btrfs/tests/btrfs-tests.c @@ -62,9 +62,7 @@ struct inode *btrfs_new_test_inode(void) inode->i_mode = S_IFREG; inode->i_ino = BTRFS_FIRST_FREE_OBJECTID; - BTRFS_I(inode)->location.type = BTRFS_INODE_ITEM_KEY; - BTRFS_I(inode)->location.objectid = BTRFS_FIRST_FREE_OBJECTID; - BTRFS_I(inode)->location.offset = 0; + BTRFS_I(inode)->objectid = BTRFS_FIRST_FREE_OBJECTID; inode_init_owner(&nop_mnt_idmap, inode, NULL, S_IFREG); return inode; diff 
--git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 0aee43466c52..2e762b89d4a2 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -4235,8 +4235,10 @@ static int log_inode_item(struct btrfs_trans_handle *trans, struct btrfs_inode *inode, bool inode_item_dropped) { struct btrfs_inode_item *inode_item; + struct btrfs_key key; int ret; + btrfs_get_inode_key(inode, &key); /* * If we are doing a fast fsync and the inode was logged before in the * current transaction, then we know the inode was previously logged and @@ -4248,7 +4250,7 @@ static int log_inode_item(struct btrfs_trans_handle *trans, * already exists can also result in unnecessarily splitting a leaf. */ if (!inode_item_dropped && inode->logged_trans == trans->transid) { - ret = btrfs_search_slot(trans, log, &inode->location, path, 0, 1); + ret = btrfs_search_slot(trans, log, &key, path, 0, 1); ASSERT(ret <= 0); if (ret > 0) ret = -ENOENT; @@ -4262,7 +4264,7 @@ static int log_inode_item(struct btrfs_trans_handle *trans, * the inode, we set BTRFS_INODE_NEEDS_FULL_SYNC on its runtime * flags and set ->logged_trans to 0. 
*/ - ret = btrfs_insert_empty_item(trans, log, path, &inode->location, + ret = btrfs_insert_empty_item(trans, log, path, &key, sizeof(*inode_item)); ASSERT(ret != -EEXIST); }

From patchwork Fri May 10 17:32:56 2024
X-Patchwork-Submitter: Filipe Manana
X-Patchwork-Id: 13661809
From: fdmanana@kernel.org
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v2 08/10] btrfs: remove objectid from struct btrfs_inode on 64 bits platforms
Date: Fri, 10 May 2024 18:32:56 +0100
Message-Id: <9ca25d445d45b9fc3466d86fe49f9fee61f74f26.1715362104.git.fdmanana@suse.com>

From: Filipe Manana

On 64-bit platforms we don't really need a dedicated member (the objectid field) for the inode's number, since we also store it in the vfs inode's i_ino member, which is an unsigned long, a type that is 64 bits wide on 64-bit platforms. We only need that field on 32-bit platforms, where unsigned long is 32 bits wide. See commit 33345d01522f ("Btrfs: Always use 64bit inode number") regarding this 64/32 bits detail.

The objectid field of struct btrfs_inode is also used to store the ID of a root for directories that are stubs for unreferenced roots. In such cases the inode is a directory and has the BTRFS_INODE_ROOT_STUB runtime flag set.

So in order to reduce the size of the btrfs_inode structure on 64-bit platforms, we can remove the objectid member and use the vfs inode's i_ino member instead whenever we need the inode number.
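As a rough illustration of the 64/32-bit split described above, the following userspace model may help. Everything in it is hypothetical: `SKETCH_BITS_PER_LONG` stands in for the kernel's `BITS_PER_LONG`, the `sketch_*` types and helpers are simplified stand-ins for `struct btrfs_inode`, `btrfs_set_inode_number()` and `btrfs_ino()`, and the root stub special case of the real `btrfs_ino()` is left out. The union mirrors the aliasing of `last_reflink_trans` with `ref_root_id` that the next paragraph describes.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's BITS_PER_LONG. */
#if ULONG_MAX == 0xffffffffUL
#define SKETCH_BITS_PER_LONG 32
#else
#define SKETCH_BITS_PER_LONG 64
/* The premise of the patch: i_ino can hold a full u64 inode number. */
static_assert(sizeof(unsigned long) >= sizeof(uint64_t),
	      "unsigned long must be at least 64 bits wide here");
#endif

/* Simplified stand-in for struct inode. */
struct sketch_vfs_inode {
	unsigned long i_ino;
};

/* Simplified stand-in for struct btrfs_inode (only the relevant fields). */
struct sketch_btrfs_inode {
	struct sketch_vfs_inode vfs_inode;
#if SKETCH_BITS_PER_LONG == 32
	/* Only needed where i_ino is too narrow for a u64 inode number. */
	uint64_t objectid;
#endif
	union {
		/* Regular files: last transaction that cloned/deduped them. */
		uint64_t last_reflink_trans;
		/* Root stub directories: ID of the unreferenced root. */
		uint64_t ref_root_id;
	};
};

/* Analogue of btrfs_set_inode_number(): one place that sets both fields. */
static void sketch_set_inode_number(struct sketch_btrfs_inode *inode,
				    uint64_t ino)
{
#if SKETCH_BITS_PER_LONG == 32
	inode->objectid = ino;
#endif
	inode->vfs_inode.i_ino = ino; /* truncated on 32-bit, exact on 64-bit */
}

/* Simplified analogue of btrfs_ino() (root stub case omitted). */
static uint64_t sketch_ino(const struct sketch_btrfs_inode *inode)
{
#if SKETCH_BITS_PER_LONG == 32
	return inode->objectid;
#else
	return inode->vfs_inode.i_ino;
#endif
}
```

On a 64-bit build the objectid member simply does not exist, and the union costs no more than the single u64 it replaces, which is consistent with the 8-byte reduction reported for this patch.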
In case the inode is a root stub (BTRFS_INODE_ROOT_STUB set), we can use the member last_reflink_trans to store the ID of the unreferenced root, since such an inode is a directory and reflinks can't be done against directories.

So remove the objectid field on 64-bit platforms and alias the last_reflink_trans field, in a union, with a new member named ref_root_id.

On a release kernel config, this reduces the size of struct btrfs_inode from 1040 bytes down to 1032 bytes.

Signed-off-by: Filipe Manana
--- fs/btrfs/btrfs_inode.h | 50 ++++++++++++++++++++++++------------ fs/btrfs/disk-io.c | 3 +-- fs/btrfs/export.c | 2 +- fs/btrfs/inode.c | 17 +++++------- fs/btrfs/ioctl.c | 4 +-- fs/btrfs/tests/btrfs-tests.c | 3 +-- 6 files changed, 45 insertions(+), 34 deletions(-) diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h index fa2f91396ae0..4d9299789a03 100644 --- a/fs/btrfs/btrfs_inode.h +++ b/fs/btrfs/btrfs_inode.h @@ -119,15 +119,14 @@ struct btrfs_inode { /* which subvolume this inode belongs to */ struct btrfs_root *root; +#if BITS_PER_LONG == 32 /* - * This is either: - * - * 1) The objectid of the corresponding BTRFS_INODE_ITEM_KEY; - * - * 2) In case this a root stub inode (BTRFS_INODE_ROOT_STUB flag set), - * the ID of that root. + * The objectid of the corresponding BTRFS_INODE_ITEM_KEY. + * On 64 bits platforms we can get it from vfs_inode.i_ino, which is an + * unsigned long and therefore 64 bits on such platforms. */ u64 objectid; +#endif /* Cached value of inode property 'compression'. */ u8 prop_compress; @@ -291,16 +290,25 @@ struct btrfs_inode { */ u64 last_unlink_trans; - /* - * The id/generation of the last transaction where this inode was - * either the source or the destination of a clone/dedupe operation.
- * Used when logging an inode to know if there are shared extents that - * need special care when logging checksum items, to avoid duplicate - * checksum items in a log (which can lead to a corruption where we end - * up with missing checksum ranges after log replay). - * Protected by the vfs inode lock. - */ - u64 last_reflink_trans; + union { + /* + * The id/generation of the last transaction where this inode + * was either the source or the destination of a clone/dedupe + * operation. Used when logging an inode to know if there are + * shared extents that need special care when logging checksum + * items, to avoid duplicate checksum items in a log (which can + * lead to a corruption where we end up with missing checksum + * ranges after log replay). Protected by the vfs inode lock. + * Used for regular files only. + */ + u64 last_reflink_trans; + + /* + * In case this is a root stub inode (BTRFS_INODE_ROOT_STUB flag set), + * the ID of that root. + */ + u64 ref_root_id; + }; /* Backwards incompatible flags, lower half of inode_item::flags */ u32 flags; @@ -377,11 +385,19 @@ static inline u64 btrfs_ino(const struct btrfs_inode *inode) static inline void btrfs_get_inode_key(const struct btrfs_inode *inode, struct btrfs_key *key) { - key->objectid = inode->objectid; + key->objectid = btrfs_ino(inode); key->type = BTRFS_INODE_ITEM_KEY; key->offset = 0; } +static inline void btrfs_set_inode_number(struct btrfs_inode *inode, u64 ino) +{ +#if BITS_PER_LONG == 32 + inode->objectid = ino; +#endif + inode->vfs_inode.i_ino = ino; +} + static inline void btrfs_i_size_write(struct btrfs_inode *inode, u64 size) { i_size_write(&inode->vfs_inode, size); diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index e3edaf510108..e6bf895b3547 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1928,7 +1928,7 @@ static int btrfs_init_btree_inode(struct super_block *sb) if (!inode) return -ENOMEM; - inode->i_ino = BTRFS_BTREE_INODE_OBJECTID; +
btrfs_set_inode_number(BTRFS_I(inode), BTRFS_BTREE_INODE_OBJECTID); set_nlink(inode, 1); /* * we set the i_size on the btree inode to the max possible int. @@ -1944,7 +1944,6 @@ static int btrfs_init_btree_inode(struct super_block *sb) extent_map_tree_init(&BTRFS_I(inode)->extent_tree); BTRFS_I(inode)->root = btrfs_grab_root(fs_info->tree_root); - BTRFS_I(inode)->objectid = BTRFS_BTREE_INODE_OBJECTID; set_bit(BTRFS_INODE_DUMMY, &BTRFS_I(inode)->runtime_flags); __insert_inode_hash(inode, hash); fs_info->btree_inode = inode; diff --git a/fs/btrfs/export.c b/fs/btrfs/export.c index 5526e25ebb3f..5da56e21ff73 100644 --- a/fs/btrfs/export.c +++ b/fs/btrfs/export.c @@ -40,7 +40,7 @@ static int btrfs_encode_fh(struct inode *inode, u32 *fh, int *max_len, if (parent) { u64 parent_root_id; - fid->parent_objectid = BTRFS_I(parent)->objectid; + fid->parent_objectid = btrfs_ino(BTRFS_I(parent)); fid->parent_gen = parent->i_generation; parent_root_id = btrfs_root_id(BTRFS_I(parent)->root); diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 44dc82ff96db..5a1014122088 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -4340,7 +4340,7 @@ static int btrfs_unlink_subvol(struct btrfs_trans_handle *trans, if (btrfs_ino(inode) == BTRFS_FIRST_FREE_OBJECTID) { objectid = btrfs_root_id(inode->root); } else if (btrfs_ino(inode) == BTRFS_EMPTY_SUBVOL_DIR_OBJECTID) { - objectid = inode->objectid; + objectid = inode->ref_root_id; } else { WARN_ON(1); fscrypt_free_filename(&fname); @@ -5581,8 +5581,7 @@ static int btrfs_init_locked_inode(struct inode *inode, void *p) { struct btrfs_iget_args *args = p; - inode->i_ino = args->ino; - BTRFS_I(inode)->objectid = args->ino; + btrfs_set_inode_number(BTRFS_I(inode), args->ino); BTRFS_I(inode)->root = btrfs_grab_root(args->root); if (args->root && args->root == args->root->fs_info->tree_root && @@ -5596,7 +5595,7 @@ static int btrfs_find_actor(struct inode *inode, void *opaque) { struct btrfs_iget_args *args = opaque; - return args->ino == 
BTRFS_I(inode)->objectid && + return args->ino == btrfs_ino(BTRFS_I(inode)) && args->root == BTRFS_I(inode)->root; } @@ -5673,11 +5672,11 @@ static struct inode *new_simple_dir(struct inode *dir, return ERR_PTR(-ENOMEM); BTRFS_I(inode)->root = btrfs_grab_root(root); - BTRFS_I(inode)->objectid = key->objectid; + BTRFS_I(inode)->ref_root_id = key->objectid; set_bit(BTRFS_INODE_ROOT_STUB, &BTRFS_I(inode)->runtime_flags); set_bit(BTRFS_INODE_DUMMY, &BTRFS_I(inode)->runtime_flags); - inode->i_ino = BTRFS_EMPTY_SUBVOL_DIR_OBJECTID; + btrfs_set_inode_number(BTRFS_I(inode), BTRFS_EMPTY_SUBVOL_DIR_OBJECTID); /* * We only need lookup, the rest is read-only and there's no inode * associated with the dentry @@ -6150,7 +6149,7 @@ static int btrfs_insert_inode_locked(struct inode *inode) { struct btrfs_iget_args args; - args.ino = BTRFS_I(inode)->objectid; + args.ino = btrfs_ino(BTRFS_I(inode)); args.root = BTRFS_I(inode)->root; return insert_inode_locked4(inode, @@ -6282,7 +6281,7 @@ int btrfs_create_new_inode(struct btrfs_trans_handle *trans, ret = btrfs_get_free_objectid(root, &objectid); if (ret) goto out; - inode->i_ino = objectid; + btrfs_set_inode_number(BTRFS_I(inode), objectid); ret = xa_reserve(&root->inodes, objectid, GFP_NOFS); if (ret) @@ -6332,8 +6331,6 @@ int btrfs_create_new_inode(struct btrfs_trans_handle *trans, BTRFS_INODE_NODATASUM; } - BTRFS_I(inode)->objectid = objectid; - ret = btrfs_insert_inode_locked(inode); if (ret < 0) { if (!args->orphan) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 79a5ccb27b92..968e256003af 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -1918,7 +1918,7 @@ static int btrfs_search_path_in_tree_user(struct mnt_idmap *idmap, { struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info; struct super_block *sb = inode->i_sb; - u64 upper_limit = BTRFS_I(inode)->objectid; + u64 upper_limit = btrfs_ino(BTRFS_I(inode)); u64 treeid = btrfs_root_id(BTRFS_I(inode)->root); u64 dirid = args->dirid; unsigned long item_off; @@ 
-2140,7 +2140,7 @@ static int btrfs_ioctl_ino_lookup_user(struct file *file, void __user *argp) inode = file_inode(file); if (args->dirid == BTRFS_FIRST_FREE_OBJECTID && - BTRFS_I(inode)->objectid != BTRFS_FIRST_FREE_OBJECTID) { + btrfs_ino(BTRFS_I(inode)) != BTRFS_FIRST_FREE_OBJECTID) { /* * The subvolume does not exist under fd with which this is * called diff --git a/fs/btrfs/tests/btrfs-tests.c b/fs/btrfs/tests/btrfs-tests.c index b28a79935d8e..ce50847e1e01 100644 --- a/fs/btrfs/tests/btrfs-tests.c +++ b/fs/btrfs/tests/btrfs-tests.c @@ -61,8 +61,7 @@ struct inode *btrfs_new_test_inode(void) return NULL; inode->i_mode = S_IFREG; - inode->i_ino = BTRFS_FIRST_FREE_OBJECTID; - BTRFS_I(inode)->objectid = BTRFS_FIRST_FREE_OBJECTID; + btrfs_set_inode_number(BTRFS_I(inode), BTRFS_FIRST_FREE_OBJECTID); inode_init_owner(&nop_mnt_idmap, inode, NULL, S_IFREG); return inode;

From patchwork Fri May 10 17:32:57 2024
X-Patchwork-Submitter: Filipe Manana
X-Patchwork-Id: 13661810
From: fdmanana@kernel.org
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v2 09/10] btrfs: rename rb_root member of extent_map_tree from map to root
Date: Fri, 10 May 2024 18:32:57 +0100
Message-Id: <8f500216e8bb4020acfa192c7f826cbda922d2a7.1715362104.git.fdmanana@suse.com>

From: Filipe Manana

Currently the rb_root member of struct extent_map_tree is named 'map', which is odd and confusing. Since it's the root of the tree, rename it to 'root'.
Signed-off-by: Filipe Manana --- fs/btrfs/extent_map.c | 22 +++++++++++----------- fs/btrfs/extent_map.h | 2 +- fs/btrfs/tests/extent-map-tests.c | 6 +++--- 3 files changed, 15 insertions(+), 15 deletions(-) diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c index 744e8952abb0..4bc41b0dd701 100644 --- a/fs/btrfs/extent_map.c +++ b/fs/btrfs/extent_map.c @@ -33,7 +33,7 @@ void __cold extent_map_exit(void) */ void extent_map_tree_init(struct extent_map_tree *tree) { - tree->map = RB_ROOT_CACHED; + tree->root = RB_ROOT_CACHED; INIT_LIST_HEAD(&tree->modified_extents); rwlock_init(&tree->lock); } @@ -265,7 +265,7 @@ static void try_merge_map(struct btrfs_inode *inode, struct extent_map *em) em->generation = max(em->generation, merge->generation); em->flags |= EXTENT_FLAG_MERGED; - rb_erase_cached(&merge->rb_node, &tree->map); + rb_erase_cached(&merge->rb_node, &tree->root); RB_CLEAR_NODE(&merge->rb_node); free_extent_map(merge); dec_evictable_extent_maps(inode); @@ -278,7 +278,7 @@ static void try_merge_map(struct btrfs_inode *inode, struct extent_map *em) if (rb && can_merge_extent_map(merge) && mergeable_maps(em, merge)) { em->len += merge->len; em->block_len += merge->block_len; - rb_erase_cached(&merge->rb_node, &tree->map); + rb_erase_cached(&merge->rb_node, &tree->root); RB_CLEAR_NODE(&merge->rb_node); em->generation = max(em->generation, merge->generation); em->flags |= EXTENT_FLAG_MERGED; @@ -389,7 +389,7 @@ static int add_extent_mapping(struct btrfs_inode *inode, lockdep_assert_held_write(&tree->lock); - ret = tree_insert(&tree->map, em); + ret = tree_insert(&tree->root, em); if (ret) return ret; @@ -410,7 +410,7 @@ __lookup_extent_mapping(struct extent_map_tree *tree, struct rb_node *prev_or_next = NULL; u64 end = range_end(start, len); - rb_node = __tree_search(&tree->map.rb_root, start, &prev_or_next); + rb_node = __tree_search(&tree->root.rb_root, start, &prev_or_next); if (!rb_node) { if (prev_or_next) rb_node = prev_or_next; @@ -479,7 +479,7 @@ 
void remove_extent_mapping(struct btrfs_inode *inode, struct extent_map *em) lockdep_assert_held_write(&tree->lock); WARN_ON(em->flags & EXTENT_FLAG_PINNED); - rb_erase_cached(&em->rb_node, &tree->map); + rb_erase_cached(&em->rb_node, &tree->root); if (!(em->flags & EXTENT_FLAG_LOGGING)) list_del_init(&em->list); RB_CLEAR_NODE(&em->rb_node); @@ -500,7 +500,7 @@ static void replace_extent_mapping(struct btrfs_inode *inode, ASSERT(extent_map_in_tree(cur)); if (!(cur->flags & EXTENT_FLAG_LOGGING)) list_del_init(&cur->list); - rb_replace_node_cached(&cur->rb_node, &new->rb_node, &tree->map); + rb_replace_node_cached(&cur->rb_node, &new->rb_node, &tree->root); RB_CLEAR_NODE(&cur->rb_node); setup_extent_mapping(inode, new, modified); @@ -659,11 +659,11 @@ static void drop_all_extent_maps_fast(struct btrfs_inode *inode) struct extent_map_tree *tree = &inode->extent_tree; write_lock(&tree->lock); - while (!RB_EMPTY_ROOT(&tree->map.rb_root)) { + while (!RB_EMPTY_ROOT(&tree->root.rb_root)) { struct extent_map *em; struct rb_node *node; - node = rb_first_cached(&tree->map); + node = rb_first_cached(&tree->root); em = rb_entry(node, struct extent_map, rb_node); em->flags &= ~(EXTENT_FLAG_PINNED | EXTENT_FLAG_LOGGING); remove_extent_mapping(inode, em); @@ -1058,7 +1058,7 @@ static long btrfs_scan_inode(struct btrfs_inode *inode, long *scanned, long nr_t return 0; write_lock(&tree->lock); - node = rb_first_cached(&tree->map); + node = rb_first_cached(&tree->root); while (node) { struct extent_map *em; @@ -1094,7 +1094,7 @@ static long btrfs_scan_inode(struct btrfs_inode *inode, long *scanned, long nr_t * lock and took it again. 
*/ if (cond_resched_rwlock_write(&tree->lock)) - node = rb_first_cached(&tree->map); + node = rb_first_cached(&tree->root); } write_unlock(&tree->lock); up_read(&inode->i_mmap_lock); diff --git a/fs/btrfs/extent_map.h b/fs/btrfs/extent_map.h index 6d587111f73a..9c0793888d13 100644 --- a/fs/btrfs/extent_map.h +++ b/fs/btrfs/extent_map.h @@ -115,7 +115,7 @@ struct extent_map { }; struct extent_map_tree { - struct rb_root_cached map; + struct rb_root_cached root; struct list_head modified_extents; rwlock_t lock; }; diff --git a/fs/btrfs/tests/extent-map-tests.c b/fs/btrfs/tests/extent-map-tests.c index ba36794ba2d5..075e6930acda 100644 --- a/fs/btrfs/tests/extent-map-tests.c +++ b/fs/btrfs/tests/extent-map-tests.c @@ -19,8 +19,8 @@ static int free_extent_map_tree(struct btrfs_inode *inode) int ret = 0; write_lock(&em_tree->lock); - while (!RB_EMPTY_ROOT(&em_tree->map.rb_root)) { - node = rb_first_cached(&em_tree->map); + while (!RB_EMPTY_ROOT(&em_tree->root.rb_root)) { + node = rb_first_cached(&em_tree->root); em = rb_entry(node, struct extent_map, rb_node); remove_extent_mapping(inode, em); @@ -551,7 +551,7 @@ static int validate_range(struct extent_map_tree *em_tree, int index) struct rb_node *n; int i; - for (i = 0, n = rb_first_cached(&em_tree->map); + for (i = 0, n = rb_first_cached(&em_tree->root); valid_ranges[index][i].len && n; i++, n = rb_next(n)) { struct extent_map *entry = rb_entry(n, struct extent_map, rb_node);

From patchwork Fri May 10 17:32:58 2024
X-Patchwork-Submitter: Filipe Manana
X-Patchwork-Id: 13661811
From: fdmanana@kernel.org
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v2 10/10] btrfs: use a regular rb_root instead of cached rb_root for extent_map_tree
Date: Fri, 10 May 2024 18:32:58 +0100
Message-Id: <37b63a8da723a934b72c5fe00b49922fcec5f5c7.1715362104.git.fdmanana@suse.com>

From: Filipe Manana

We are currently using a cached rb_root (struct rb_root_cached) for the rb root of struct extent_map_tree. This doesn't offer much of an advantage here because:

1) Its only advantage over the regular rb_root is that it caches a pointer to the leftmost node (first node), so a call to rb_first_cached() doesn't have to chase pointers until it reaches the leftmost node;

2) We only have two scenarios that access the leftmost node with rb_first_cached():
   - when dropping all extent maps from an inode, during inode eviction;
   - when iterating over extent maps during the extent map shrinker;

3) In both cases we keep removing extent maps, which causes deletion of the leftmost node, so rb_erase_cached() has to call rb_next() to find the next leftmost node and assign it to struct rb_root_cached::rb_leftmost;

4) We can do that ourselves in those two use cases and stop using an rb_root_cached tree, using a regular rb_root tree instead.

This reduces the size of struct extent_map_tree by 8 bytes and, since this structure is embedded in struct btrfs_inode, it also reduces the size of that structure by 8 bytes. So on a 64-bit platform the size of btrfs_inode is reduced from 1032 bytes down to 1024 bytes.

This means we will be able to have 4 inodes per 4K page instead of 3.
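The loop change in this patch (fetch the successor before erasing, rather than calling rb_first_cached() on every iteration) can be modeled in userspace. This is a sketch under stated assumptions: the kernel rb-tree is replaced by a hypothetical, unbalanced binary search tree with parent pointers (all `sk_*` names are invented), and there is no locking, so the restart from the first node that the kernel performs when `cond_resched_rwlock_write()` drops the lock appears only as a comment. `struct sk_tree_cached` mirrors how `struct rb_root_cached` adds a leftmost pointer on top of `struct rb_root`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical tree node; the kernel uses struct rb_node instead. */
struct sk_node {
	unsigned long key;
	struct sk_node *left, *right, *parent;
};

/* Like struct rb_root: just a pointer to the top node. */
struct sk_tree {
	struct sk_node *root;
};

/* Like struct rb_root_cached: the same root plus a cached leftmost node. */
struct sk_tree_cached {
	struct sk_tree tree;
	struct sk_node *leftmost;
};

static void sk_insert(struct sk_tree *t, struct sk_node *n)
{
	struct sk_node **p = &t->root, *parent = NULL;

	while (*p) {
		parent = *p;
		p = n->key < parent->key ? &parent->left : &parent->right;
	}
	n->left = n->right = NULL;
	n->parent = parent;
	*p = n;
}

/* Analogue of rb_first(): walk down the left spine. */
static struct sk_node *sk_first(const struct sk_tree *t)
{
	struct sk_node *n = t->root;

	while (n && n->left)
		n = n->left;
	return n;
}

/* Analogue of rb_next(): in-order successor. */
static struct sk_node *sk_next(struct sk_node *n)
{
	if (n->right) {
		n = n->right;
		while (n->left)
			n = n->left;
		return n;
	}
	while (n->parent && n == n->parent->right)
		n = n->parent;
	return n->parent;
}

/* Unlink a node that has no left child, e.g. the leftmost node. */
static void sk_erase_leftmost(struct sk_tree *t, struct sk_node *n)
{
	struct sk_node *child = n->right;

	assert(n->left == NULL);
	if (child)
		child->parent = n->parent;
	if (!n->parent)
		t->root = child;
	else if (n->parent->left == n)
		n->parent->left = child;
	else
		n->parent->right = child;
}

/*
 * Free every node in key order, in the style of drop_all_extent_maps_fast()
 * after this patch: grab the successor before erasing the current node. The
 * kernel also restarts from rb_first() if cond_resched_rwlock_write() dropped
 * the lock, since the saved successor may be stale by then. Keys visited are
 * stored in @order if it is not NULL. Returns the number of nodes freed.
 */
static size_t sk_drain(struct sk_tree *t, unsigned long *order)
{
	struct sk_node *node = sk_first(t);
	size_t freed = 0;

	while (node) {
		struct sk_node *next = sk_next(node);

		if (order)
			order[freed] = node->key;
		sk_erase_leftmost(t, node);
		free(node);
		freed++;
		node = next;
	}
	return freed;
}
```

With 8-byte pointers the cached variant is exactly one pointer (8 bytes) larger, matching the saving described above; and 4096 / 1032 truncates to 3 while 4096 / 1024 is exactly 4, which is the inodes-per-page arithmetic behind the last sentence of the message.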
Signed-off-by: Filipe Manana --- fs/btrfs/extent_map.c | 48 +++++++++++++++++-------------- fs/btrfs/extent_map.h | 2 +- fs/btrfs/tests/extent-map-tests.c | 6 ++-- 3 files changed, 30 insertions(+), 26 deletions(-) diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c index 4bc41b0dd701..35e163152dbc 100644 --- a/fs/btrfs/extent_map.c +++ b/fs/btrfs/extent_map.c @@ -33,7 +33,7 @@ void __cold extent_map_exit(void) */ void extent_map_tree_init(struct extent_map_tree *tree) { - tree->root = RB_ROOT_CACHED; + tree->root = RB_ROOT; INIT_LIST_HEAD(&tree->modified_extents); rwlock_init(&tree->lock); } @@ -85,27 +85,24 @@ static void dec_evictable_extent_maps(struct btrfs_inode *inode) percpu_counter_dec(&fs_info->evictable_extent_maps); } -static int tree_insert(struct rb_root_cached *root, struct extent_map *em) +static int tree_insert(struct rb_root *root, struct extent_map *em) { - struct rb_node **p = &root->rb_root.rb_node; + struct rb_node **p = &root->rb_node; struct rb_node *parent = NULL; struct extent_map *entry = NULL; struct rb_node *orig_parent = NULL; u64 end = range_end(em->start, em->len); - bool leftmost = true; while (*p) { parent = *p; entry = rb_entry(parent, struct extent_map, rb_node); - if (em->start < entry->start) { + if (em->start < entry->start) p = &(*p)->rb_left; - } else if (em->start >= extent_map_end(entry)) { + else if (em->start >= extent_map_end(entry)) p = &(*p)->rb_right; - leftmost = false; - } else { + else return -EEXIST; - } } orig_parent = parent; @@ -128,7 +125,7 @@ static int tree_insert(struct rb_root_cached *root, struct extent_map *em) return -EEXIST; rb_link_node(&em->rb_node, orig_parent, p); - rb_insert_color_cached(&em->rb_node, root, leftmost); + rb_insert_color(&em->rb_node, root); return 0; } @@ -265,7 +262,7 @@ static void try_merge_map(struct btrfs_inode *inode, struct extent_map *em) em->generation = max(em->generation, merge->generation); em->flags |= EXTENT_FLAG_MERGED; - rb_erase_cached(&merge->rb_node, 
&tree->root); + rb_erase(&merge->rb_node, &tree->root); RB_CLEAR_NODE(&merge->rb_node); free_extent_map(merge); dec_evictable_extent_maps(inode); @@ -278,7 +275,7 @@ static void try_merge_map(struct btrfs_inode *inode, struct extent_map *em) if (rb && can_merge_extent_map(merge) && mergeable_maps(em, merge)) { em->len += merge->len; em->block_len += merge->block_len; - rb_erase_cached(&merge->rb_node, &tree->root); + rb_erase(&merge->rb_node, &tree->root); RB_CLEAR_NODE(&merge->rb_node); em->generation = max(em->generation, merge->generation); em->flags |= EXTENT_FLAG_MERGED; @@ -410,7 +407,7 @@ __lookup_extent_mapping(struct extent_map_tree *tree, struct rb_node *prev_or_next = NULL; u64 end = range_end(start, len); - rb_node = __tree_search(&tree->root.rb_root, start, &prev_or_next); + rb_node = __tree_search(&tree->root, start, &prev_or_next); if (!rb_node) { if (prev_or_next) rb_node = prev_or_next; @@ -479,7 +476,7 @@ void remove_extent_mapping(struct btrfs_inode *inode, struct extent_map *em) lockdep_assert_held_write(&tree->lock); WARN_ON(em->flags & EXTENT_FLAG_PINNED); - rb_erase_cached(&em->rb_node, &tree->root); + rb_erase(&em->rb_node, &tree->root); if (!(em->flags & EXTENT_FLAG_LOGGING)) list_del_init(&em->list); RB_CLEAR_NODE(&em->rb_node); @@ -500,7 +497,7 @@ static void replace_extent_mapping(struct btrfs_inode *inode, ASSERT(extent_map_in_tree(cur)); if (!(cur->flags & EXTENT_FLAG_LOGGING)) list_del_init(&cur->list); - rb_replace_node_cached(&cur->rb_node, &new->rb_node, &tree->root); + rb_replace_node(&cur->rb_node, &new->rb_node, &tree->root); RB_CLEAR_NODE(&cur->rb_node); setup_extent_mapping(inode, new, modified); @@ -657,18 +654,23 @@ int btrfs_add_extent_mapping(struct btrfs_inode *inode, static void drop_all_extent_maps_fast(struct btrfs_inode *inode) { struct extent_map_tree *tree = &inode->extent_tree; + struct rb_node *node; write_lock(&tree->lock); - while (!RB_EMPTY_ROOT(&tree->root.rb_root)) { + node = rb_first(&tree->root); + while 
(node) { struct extent_map *em; - struct rb_node *node; + struct rb_node *next = rb_next(node); - node = rb_first_cached(&tree->root); em = rb_entry(node, struct extent_map, rb_node); em->flags &= ~(EXTENT_FLAG_PINNED | EXTENT_FLAG_LOGGING); remove_extent_mapping(inode, em); free_extent_map(em); - cond_resched_rwlock_write(&tree->lock); + + if (cond_resched_rwlock_write(&tree->lock)) + node = rb_first(&tree->root); + else + node = next; } write_unlock(&tree->lock); } @@ -1058,12 +1060,12 @@ static long btrfs_scan_inode(struct btrfs_inode *inode, long *scanned, long nr_t return 0; write_lock(&tree->lock); - node = rb_first_cached(&tree->root); + node = rb_first(&tree->root); while (node) { + struct rb_node *next = rb_next(node); struct extent_map *em; em = rb_entry(node, struct extent_map, rb_node); - node = rb_next(node); (*scanned)++; if (em->flags & EXTENT_FLAG_PINNED) @@ -1094,7 +1096,9 @@ static long btrfs_scan_inode(struct btrfs_inode *inode, long *scanned, long nr_t * lock and took it again. 
*/ if (cond_resched_rwlock_write(&tree->lock)) - node = rb_first_cached(&tree->root); + node = rb_first(&tree->root); + else + node = next; } write_unlock(&tree->lock); up_read(&inode->i_mmap_lock); diff --git a/fs/btrfs/extent_map.h b/fs/btrfs/extent_map.h index 9c0793888d13..9144721b88a5 100644 --- a/fs/btrfs/extent_map.h +++ b/fs/btrfs/extent_map.h @@ -115,7 +115,7 @@ struct extent_map { }; struct extent_map_tree { - struct rb_root_cached root; + struct rb_root root; struct list_head modified_extents; rwlock_t lock; }; diff --git a/fs/btrfs/tests/extent-map-tests.c b/fs/btrfs/tests/extent-map-tests.c index 075e6930acda..c511a1297956 100644 --- a/fs/btrfs/tests/extent-map-tests.c +++ b/fs/btrfs/tests/extent-map-tests.c @@ -19,8 +19,8 @@ static int free_extent_map_tree(struct btrfs_inode *inode) int ret = 0; write_lock(&em_tree->lock); - while (!RB_EMPTY_ROOT(&em_tree->root.rb_root)) { - node = rb_first_cached(&em_tree->root); + while (!RB_EMPTY_ROOT(&em_tree->root)) { + node = rb_first(&em_tree->root); em = rb_entry(node, struct extent_map, rb_node); remove_extent_mapping(inode, em); @@ -551,7 +551,7 @@ static int validate_range(struct extent_map_tree *em_tree, int index) struct rb_node *n; int i; - for (i = 0, n = rb_first_cached(&em_tree->root); + for (i = 0, n = rb_first(&em_tree->root); valid_ranges[index][i].len && n; i++, n = rb_next(n)) { struct extent_map *entry = rb_entry(n, struct extent_map, rb_node);