From patchwork Thu Nov 21 01:57:50 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chao Yu
X-Patchwork-Id: 13881584
To: jaegeuk@kernel.org
Date: Thu, 21 Nov 2024 09:57:50 +0800
Message-Id: <20241121015751.2300234-1-chao@kernel.org>
X-Mailer: git-send-email 2.40.1
Subject: [f2fs-dev] [PATCH 1/2] f2fs: fix to shrink read extent node in batches
From: Chao Yu
Reply-To:
 Chao Yu
Cc: Xiuhong Wang, Zhiguo Niu, linux-kernel@vger.kernel.org,
 linux-f2fs-devel@lists.sourceforge.net

We use an rwlock to protect the core structure data of the extent tree
while shrinking it. However, if the extent tree holds a huge number of
extent nodes, the shrink may hold the rwlock for a very long time, which
may trigger a kernel hang.

This patch shrinks read extent nodes in batches, so that the critical
region of the rwlock is shortened and the lock is no longer held for an
extremely long time.

Reported-by: Xiuhong Wang
Closes: https://lore.kernel.org/linux-f2fs-devel/20241112110627.1314632-1-xiuhong.wang@unisoc.com/
Signed-off-by: Xiuhong Wang
Signed-off-by: Zhiguo Niu
Signed-off-by: Chao Yu
---
 fs/f2fs/extent_cache.c | 69 +++++++++++++++++++++++++-----------------
 1 file changed, 41 insertions(+), 28 deletions(-)

diff --git a/fs/f2fs/extent_cache.c b/fs/f2fs/extent_cache.c
index 019c1f7b7fa5..b7a6817b44b0 100644
--- a/fs/f2fs/extent_cache.c
+++ b/fs/f2fs/extent_cache.c
@@ -379,21 +379,22 @@ static struct extent_tree *__grab_extent_tree(struct inode *inode,
 }
 
 static unsigned int __free_extent_tree(struct f2fs_sb_info *sbi,
-					struct extent_tree *et)
+				struct extent_tree *et, unsigned int nr_shrink)
 {
 	struct rb_node *node, *next;
 	struct extent_node *en;
-	unsigned int count = atomic_read(&et->node_cnt);
+	unsigned int count;
 
 	node = rb_first_cached(&et->root);
-	while (node) {
+
+	for (count = 0; node && count < nr_shrink; count++) {
 		next = rb_next(node);
 		en = rb_entry(node, struct extent_node, rb_node);
 		__release_extent_node(sbi, et, en);
 		node = next;
 	}
 
-	return count - atomic_read(&et->node_cnt);
+	return count;
 }
 
 static void __drop_largest_extent(struct extent_tree *et,
@@ -622,6 +623,30 @@ static struct extent_node *__insert_extent_tree(struct f2fs_sb_info *sbi,
 	return en;
 }
 
+static unsigned int __destroy_extent_node(struct inode *inode,
+					enum extent_type type)
+{
+	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+	struct extent_tree *et = F2FS_I(inode)->extent_tree[type];
+	unsigned int nr_shrink = type == EX_READ ?
+				READ_EXTENT_CACHE_SHRINK_NUMBER :
+				AGE_EXTENT_CACHE_SHRINK_NUMBER;
+	unsigned int node_cnt = 0;
+
+	if (!et || !atomic_read(&et->node_cnt))
+		return 0;
+
+	while (atomic_read(&et->node_cnt)) {
+		write_lock(&et->lock);
+		node_cnt += __free_extent_tree(sbi, et, nr_shrink);
+		write_unlock(&et->lock);
+	}
+
+	f2fs_bug_on(sbi, atomic_read(&et->node_cnt));
+
+	return node_cnt;
+}
+
 static void __update_extent_tree_range(struct inode *inode,
 			struct extent_info *tei, enum extent_type type)
 {
@@ -760,9 +785,6 @@ static void __update_extent_tree_range(struct inode *inode,
 		}
 	}
 
-	if (is_inode_flag_set(inode, FI_NO_EXTENT))
-		__free_extent_tree(sbi, et);
-
 	if (et->largest_updated) {
 		et->largest_updated = false;
 		updated = true;
@@ -780,6 +802,9 @@ static void __update_extent_tree_range(struct inode *inode,
 out_read_extent_cache:
 	write_unlock(&et->lock);
 
+	if (is_inode_flag_set(inode, FI_NO_EXTENT))
+		__destroy_extent_node(inode, EX_READ);
+
 	if (updated)
 		f2fs_mark_inode_dirty_sync(inode, true);
 }
@@ -942,10 +967,14 @@ static unsigned int __shrink_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink
 	list_for_each_entry_safe(et, next, &eti->zombie_list, list) {
 		if (atomic_read(&et->node_cnt)) {
 			write_lock(&et->lock);
-			node_cnt += __free_extent_tree(sbi, et);
+			node_cnt += __free_extent_tree(sbi, et,
+					nr_shrink - node_cnt - tree_cnt);
 			write_unlock(&et->lock);
 		}
-		f2fs_bug_on(sbi, atomic_read(&et->node_cnt));
+
+		if (atomic_read(&et->node_cnt))
+			goto unlock_out;
+
 		list_del_init(&et->list);
 		radix_tree_delete(&eti->extent_tree_root, et->ino);
 		kmem_cache_free(extent_tree_slab, et);
@@ -1084,23 +1113,6 @@ unsigned int f2fs_shrink_age_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink
 	return __shrink_extent_tree(sbi, nr_shrink, EX_BLOCK_AGE);
 }
 
-static unsigned int __destroy_extent_node(struct inode *inode,
-					enum extent_type type)
-{
-	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
-	struct extent_tree *et = F2FS_I(inode)->extent_tree[type];
-	unsigned int node_cnt = 0;
-
-	if (!et || !atomic_read(&et->node_cnt))
-		return 0;
-
-	write_lock(&et->lock);
-	node_cnt = __free_extent_tree(sbi, et);
-	write_unlock(&et->lock);
-
-	return node_cnt;
-}
-
 void f2fs_destroy_extent_node(struct inode *inode)
 {
 	__destroy_extent_node(inode, EX_READ);
@@ -1109,7 +1121,6 @@ void f2fs_destroy_extent_node(struct inode *inode)
 
 static void __drop_extent_tree(struct inode *inode, enum extent_type type)
 {
-	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
 	struct extent_tree *et = F2FS_I(inode)->extent_tree[type];
 	bool updated = false;
 
@@ -1117,7 +1128,6 @@ static void __drop_extent_tree(struct inode *inode, enum extent_type type)
 		return;
 
 	write_lock(&et->lock);
-	__free_extent_tree(sbi, et);
 	if (type == EX_READ) {
 		set_inode_flag(inode, FI_NO_EXTENT);
 		if (et->largest.len) {
@@ -1126,6 +1136,9 @@ static void __drop_extent_tree(struct inode *inode, enum extent_type type)
 		}
 	}
 	write_unlock(&et->lock);
+
+	__destroy_extent_node(inode, type);
+
 	if (updated)
 		f2fs_mark_inode_dirty_sync(inode, true);
 }

From patchwork Thu Nov 21 01:57:51 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chao Yu
X-Patchwork-Id: 13881585
To: jaegeuk@kernel.org
Date: Thu, 21 Nov 2024 09:57:51 +0800
Message-Id: <20241121015751.2300234-2-chao@kernel.org>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20241121015751.2300234-1-chao@kernel.org>
References: <20241121015751.2300234-1-chao@kernel.org>
Subject: [f2fs-dev] [PATCH 2/2] f2fs: add a sysfs node to limit max read extent count per-inode
From: Chao Yu
Reply-To: Chao Yu
Cc: Xiuhong Wang, Zhiguo Niu, linux-kernel@vger.kernel.org,
 linux-f2fs-devel@lists.sourceforge.net

Quoted:
"at this time, there are still 1086911 extent nodes in this zombie
extent tree that need to be cleaned up.

crash_arm64_sprd_v8.0.3++> extent_tree.node_cnt ffffff80896cc500
  node_cnt = {
    counter = 1086911
  },
"

As reported by Xiuhong, an extent tree can accumulate a huge number of
extent nodes, which may potentially cause:
- slab memory fragmentation
- extremely long shrink time on the extent tree
- low mapping efficiency

Let's add a sysfs node to limit the max read extent count for each inode.
By default, the threshold is 10240; it can be updated according to the
user's requirements.

Reported-by: Xiuhong Wang
Closes: https://lore.kernel.org/linux-f2fs-devel/20241112110627.1314632-1-xiuhong.wang@unisoc.com/
Signed-off-by: Xiuhong Wang
Signed-off-by: Zhiguo Niu
Signed-off-by: Chao Yu
---
 Documentation/ABI/testing/sysfs-fs-f2fs | 6 ++++++
 fs/f2fs/extent_cache.c                  | 5 ++++-
 fs/f2fs/f2fs.h                          | 4 ++++
 fs/f2fs/sysfs.c                         | 7 +++++++
 4 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs
index 513296bb6f29..3e1630c70d8a 100644
--- a/Documentation/ABI/testing/sysfs-fs-f2fs
+++ b/Documentation/ABI/testing/sysfs-fs-f2fs
@@ -822,3 +822,9 @@ Description:	It controls the valid block ratio threshold not to trigger excessiv
 		for zoned deivces. The initial value of it is 95(%). F2FS will stop the
 		background GC thread from intiating GC for sections having valid blocks
 		exceeding the ratio.
+
+What:		/sys/fs/f2fs//max_read_extent_count
+Date:		November 2024
+Contact:	"Chao Yu"
+Description:	It controls max read extent count for per-inode, the value of threshold
+		is 10240 by default.
diff --git a/fs/f2fs/extent_cache.c b/fs/f2fs/extent_cache.c
index b7a6817b44b0..347b3b647834 100644
--- a/fs/f2fs/extent_cache.c
+++ b/fs/f2fs/extent_cache.c
@@ -717,7 +717,9 @@ static void __update_extent_tree_range(struct inode *inode,
 		}
 
 		if (end < org_end && (type != EX_READ ||
-				org_end - end >= F2FS_MIN_EXTENT_LEN)) {
+			(org_end - end >= F2FS_MIN_EXTENT_LEN &&
+			atomic_read(&et->node_cnt) <
+					sbi->max_read_extent_count))) {
 			if (parts) {
 				__set_extent_info(&ei,
 					end, org_end - end,
@@ -1212,6 +1214,7 @@ void f2fs_init_extent_cache_info(struct f2fs_sb_info *sbi)
 	sbi->hot_data_age_threshold = DEF_HOT_DATA_AGE_THRESHOLD;
 	sbi->warm_data_age_threshold = DEF_WARM_DATA_AGE_THRESHOLD;
 	sbi->last_age_weight = LAST_AGE_WEIGHT;
+	sbi->max_read_extent_count = DEF_MAX_READ_EXTENT_COUNT;
 }
 
 int __init f2fs_create_extent_cache(void)
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index b65b023a588a..6f2cbf4c5740 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -635,6 +635,9 @@ enum {
 #define DEF_HOT_DATA_AGE_THRESHOLD	262144
 #define DEF_WARM_DATA_AGE_THRESHOLD	2621440
 
+/* default max read extent count per inode */
+#define DEF_MAX_READ_EXTENT_COUNT	10240
+
 /* extent cache type */
 enum extent_type {
 	EX_READ,
@@ -1619,6 +1622,7 @@ struct f2fs_sb_info {
 	/* for extent tree cache */
 	struct extent_tree_info extent_tree[NR_EXTENT_CACHES];
 	atomic64_t allocated_data_blocks;	/* for block age extent_cache */
+	unsigned int max_read_extent_count;	/* max read extent count per inode */
 
 	/* The threshold used for hot and warm data seperation*/
 	unsigned int hot_data_age_threshold;
diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
index bdbf24db667b..d1356c656cac 100644
--- a/fs/f2fs/sysfs.c
+++ b/fs/f2fs/sysfs.c
@@ -787,6 +787,13 @@ static ssize_t __sbi_store(struct f2fs_attr *a,
 		return count;
 	}
 
+	if (!strcmp(a->attr.name, "max_read_extent_count")) {
+		if (t > UINT_MAX)
+			return -EINVAL;
+		*ui = (unsigned int)t;
+		return count;
+	}
+
 	if (!strcmp(a->attr.name, "ipu_policy")) {
 		if (t >= BIT(F2FS_IPU_MAX))
 			return -EINVAL;