From patchwork Thu Oct 24 13:22:23 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yu Kuai
X-Patchwork-Id: 13849018
From: Yu Kuai
To: stable@vger.kernel.org, gregkh@linuxfoundation.org,
 harry.wentland@amd.com, sunpeng.li@amd.com, Rodrigo.Siqueira@amd.com,
 alexander.deucher@amd.com, christian.koenig@amd.com, Xinhui.Pan@amd.com,
 airlied@gmail.com, daniel@ffwll.ch, viro@zeniv.linux.org.uk,
 brauner@kernel.org, Liam.Howlett@oracle.com, akpm@linux-foundation.org,
 hughd@google.com, willy@infradead.org, sashal@kernel.org,
 srinivasan.shanmugam@amd.com, chiahsuan.chung@amd.com, mingo@kernel.org,
 mgorman@techsingularity.net, yukuai3@huawei.com, chengming.zhou@linux.dev,
 zhangpeng.00@bytedance.com, chuck.lever@oracle.com
Cc: amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
 linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 maple-tree@lists.infradead.org, linux-mm@kvack.org,
 yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com
Subject: [PATCH 6.6 26/28] libfs: Convert simple directory offsets to use a
 Maple Tree
Date: Thu, 24 Oct 2024 21:22:23 +0800
Message-Id: <20241024132225.2271667-11-yukuai1@huaweicloud.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20241024132225.2271667-1-yukuai1@huaweicloud.com>
References: <20241024132009.2267260-1-yukuai1@huaweicloud.com>
 <20241024132225.2271667-1-yukuai1@huaweicloud.com>

From: Chuck Lever

commit 0e4a862174f2a8d1653a8a9cf0815020e1d3af24 upstream.

Test robot reports:

> kernel test robot noticed a -19.0% regression of aim9.disk_src.ops_per_sec on:
>
> commit: a2e459555c5f9da3e619b7e47a63f98574dc75f1 ("shmem: stable directory offsets")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

Feng Tang further clarifies that:

> ... the new simple_offset_add()
> called by shmem_mknod() brings extra cost related with slab,
> specifically the 'radix_tree_node', which cause the regression.

Willy's analysis is that, over time, the test workload causes
xa_alloc_cyclic() to fragment the underlying SLAB cache.

This patch replaces the offset_ctx's xarray with a Maple Tree in the
hope that Maple Tree's dense node mode will handle this scenario
more scalably.

In addition, we can widen the simple directory offset maximum to
signed long (as loff_t is also signed).

Suggested-by: Matthew Wilcox
Reported-by: kernel test robot
Closes: https://lore.kernel.org/oe-lkp/202309081306.3ecb3734-oliver.sang@intel.com
Signed-off-by: Chuck Lever
Link: https://lore.kernel.org/r/170820145616.6328.12620992971699079156.stgit@91.116.238.104.host.secureserver.net
Reviewed-by: Jan Kara
Signed-off-by: Christian Brauner
Signed-off-by: Yu Kuai
---
 fs/libfs.c         | 47 +++++++++++++++++++++++-----------------------
 include/linux/fs.h |  5 +++--
 2 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index d7b901cb9af4..98731178a3c1 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -244,17 +244,17 @@ enum {
 	DIR_OFFSET_MIN	= 2,
 };
 
-static void offset_set(struct dentry *dentry, u32 offset)
+static void offset_set(struct dentry *dentry, long offset)
 {
-	dentry->d_fsdata = (void *)((uintptr_t)(offset));
+	dentry->d_fsdata = (void *)offset;
 }
 
-static u32 dentry2offset(struct dentry *dentry)
+static long dentry2offset(struct dentry *dentry)
 {
-	return (u32)((uintptr_t)(dentry->d_fsdata));
+	return (long)dentry->d_fsdata;
 }
 
-static struct lock_class_key simple_offset_xa_lock;
+static struct lock_class_key simple_offset_lock_class;
 
 /**
  * simple_offset_init - initialize an offset_ctx
@@ -263,8 +263,8 @@ static struct lock_class_key simple_offset_xa_lock;
  */
 void simple_offset_init(struct offset_ctx *octx)
 {
-	xa_init_flags(&octx->xa, XA_FLAGS_ALLOC1);
-	lockdep_set_class(&octx->xa.xa_lock, &simple_offset_xa_lock);
+	mt_init_flags(&octx->mt, MT_FLAGS_ALLOC_RANGE);
+	lockdep_set_class(&octx->mt.ma_lock, &simple_offset_lock_class);
 	octx->next_offset = DIR_OFFSET_MIN;
 }
 
@@ -273,20 +273,19 @@ void simple_offset_init(struct offset_ctx *octx)
  * @octx: directory offset ctx to be updated
  * @dentry: new dentry being added
  *
- * Returns zero on success. @so_ctx and the dentry offset are updated.
+ * Returns zero on success. @octx and the dentry's offset are updated.
  * Otherwise, a negative errno value is returned.
  */
 int simple_offset_add(struct offset_ctx *octx, struct dentry *dentry)
 {
-	static const struct xa_limit limit = XA_LIMIT(DIR_OFFSET_MIN, U32_MAX);
-	u32 offset;
+	unsigned long offset;
 	int ret;
 
 	if (dentry2offset(dentry) != 0)
 		return -EBUSY;
 
-	ret = xa_alloc_cyclic(&octx->xa, &offset, dentry, limit,
-			      &octx->next_offset, GFP_KERNEL);
+	ret = mtree_alloc_cyclic(&octx->mt, &offset, dentry, DIR_OFFSET_MIN,
+				 LONG_MAX, &octx->next_offset, GFP_KERNEL);
 	if (ret < 0)
 		return ret;
 
@@ -302,13 +301,13 @@ int simple_offset_add(struct offset_ctx *octx, struct dentry *dentry)
  */
 void simple_offset_remove(struct offset_ctx *octx, struct dentry *dentry)
 {
-	u32 offset;
+	long offset;
 
 	offset = dentry2offset(dentry);
 	if (offset == 0)
 		return;
 
-	xa_erase(&octx->xa, offset);
+	mtree_erase(&octx->mt, offset);
 	offset_set(dentry, 0);
 }
 
@@ -331,7 +330,7 @@ int simple_offset_empty(struct dentry *dentry)
 
 	index = DIR_OFFSET_MIN;
 	octx = inode->i_op->get_offset_ctx(inode);
-	xa_for_each(&octx->xa, index, child) {
+	mt_for_each(&octx->mt, child, index, LONG_MAX) {
 		spin_lock(&child->d_lock);
 		if (simple_positive(child)) {
 			spin_unlock(&child->d_lock);
@@ -361,8 +360,8 @@ int simple_offset_rename_exchange(struct inode *old_dir,
 {
 	struct offset_ctx *old_ctx = old_dir->i_op->get_offset_ctx(old_dir);
 	struct offset_ctx *new_ctx = new_dir->i_op->get_offset_ctx(new_dir);
-	u32 old_index = dentry2offset(old_dentry);
-	u32 new_index = dentry2offset(new_dentry);
+	long old_index = dentry2offset(old_dentry);
+	long new_index = dentry2offset(new_dentry);
 	int ret;
 
 	simple_offset_remove(old_ctx, old_dentry);
@@ -388,9 +387,9 @@ int simple_offset_rename_exchange(struct inode *old_dir,
 
 out_restore:
 	offset_set(old_dentry, old_index);
-	xa_store(&old_ctx->xa, old_index, old_dentry, GFP_KERNEL);
+	mtree_store(&old_ctx->mt, old_index, old_dentry, GFP_KERNEL);
 	offset_set(new_dentry, new_index);
-	xa_store(&new_ctx->xa, new_index, new_dentry, GFP_KERNEL);
+	mtree_store(&new_ctx->mt, new_index, new_dentry, GFP_KERNEL);
 	return ret;
 }
 
@@ -403,7 +402,7 @@ int simple_offset_rename_exchange(struct inode *old_dir,
  */
 void simple_offset_destroy(struct offset_ctx *octx)
 {
-	xa_destroy(&octx->xa);
+	mtree_destroy(&octx->mt);
 }
 
 /**
@@ -433,16 +432,16 @@ static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence)
 
 	/* In this case, ->private_data is protected by f_pos_lock */
 	file->private_data = NULL;
-	return vfs_setpos(file, offset, U32_MAX);
+	return vfs_setpos(file, offset, LONG_MAX);
 }
 
 static struct dentry *offset_find_next(struct offset_ctx *octx, loff_t offset)
 {
+	MA_STATE(mas, &octx->mt, offset, offset);
 	struct dentry *child, *found = NULL;
-	XA_STATE(xas, &octx->xa, offset);
 
 	rcu_read_lock();
-	child = xas_next_entry(&xas, U32_MAX);
+	child = mas_find(&mas, LONG_MAX);
 	if (!child)
 		goto out;
 	spin_lock(&child->d_lock);
@@ -456,8 +455,8 @@ static struct dentry *offset_find_next(struct offset_ctx *octx, loff_t offset)
 
 static bool offset_dir_emit(struct dir_context *ctx, struct dentry *dentry)
 {
-	u32 offset = dentry2offset(dentry);
 	struct inode *inode = d_inode(dentry);
+	long offset = dentry2offset(dentry);
 
 	return ctx->actor(ctx, dentry->d_name.name, dentry->d_name.len, offset,
 			  inode->i_ino, fs_umode_to_dtype(inode->i_mode));
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 5104405ce3e6..b9edab0ba46c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -43,6 +43,7 @@
 #include <linux/cred.h>
 #include <linux/mnt_idmapping.h>
 #include <linux/slab.h>
+#include <linux/maple_tree.h>
 
 #include <asm/byteorder.h>
 #include <uapi/linux/fs.h>
@@ -3190,8 +3191,8 @@ extern ssize_t simple_write_to_buffer(void *to, size_t available, loff_t *ppos,
 		const void __user *from, size_t count);
 
 struct offset_ctx {
-	struct xarray		xa;
-	u32			next_offset;
+	struct maple_tree	mt;
+	unsigned long		next_offset;
 };
 
 void simple_offset_init(struct offset_ctx *octx);
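
Illustration only, not part of the patch: the change keeps the cyclic
allocation scheme of simple_offset_add() and swaps only the backing
store. A minimal userspace sketch of that scheme follows, assuming a toy
fixed-size slot array in place of the Maple Tree. All toy_* names,
TOY_MIN, and TOY_MAX are hypothetical, and the return convention (0, 1
after wrapping, negative on failure) is an assumption of this sketch; the
patch itself only relies on negative values meaning failure.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MIN 2	/* mirrors DIR_OFFSET_MIN in the patch */
#define TOY_MAX 7	/* tiny stand-in for LONG_MAX so wrap-around is visible */

/* Hypothetical toy context: a flat slot array, not a real Maple Tree. */
struct toy_ctx {
	const void *slot[TOY_MAX + 1];
	unsigned long next;	/* plays the role of offset_ctx.next_offset */
};

static void toy_init(struct toy_ctx *c)
{
	for (size_t i = 0; i <= TOY_MAX; i++)
		c->slot[i] = NULL;
	c->next = TOY_MIN;
}

/*
 * Store @entry at the first free index at or after c->next, wrapping
 * back to TOY_MIN once the top of the range has been passed.
 * Returns 0 on success, 1 on success after wrapping, -1 if no slot is free.
 */
static int toy_alloc_cyclic(struct toy_ctx *c, unsigned long *out,
			    const void *entry)
{
	const unsigned long span = TOY_MAX - TOY_MIN + 1;
	const unsigned long want = c->next;
	unsigned long start = want;

	if (start < TOY_MIN || start > TOY_MAX)
		start = TOY_MIN;

	for (unsigned long n = 0; n < span; n++) {
		unsigned long idx = TOY_MIN + (start - TOY_MIN + n) % span;

		if (c->slot[idx])
			continue;
		c->slot[idx] = entry;
		*out = idx;
		c->next = idx + 1;	/* may exceed TOY_MAX; clamped on next call */
		return idx < want ? 1 : 0;
	}
	return -1;
}

/* Free a slot; the index is not handed out again until the cursor wraps. */
static void toy_erase(struct toy_ctx *c, unsigned long idx)
{
	assert(idx >= TOY_MIN && idx <= TOY_MAX);
	c->slot[idx] = NULL;
}
```

In this sketch a freed index is not reused until the cursor wraps, so
offsets keep growing while entries are created and removed. That is the
access pattern the commit message describes as fragmenting the slab
cache under the xarray.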