From patchwork Fri Jun 9 08:15:12 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qi Zheng X-Patchwork-Id: 13273415 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id EEF53C7EE2F for ; Fri, 9 Jun 2023 08:17:07 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 85C308E0009; Fri, 9 Jun 2023 04:17:07 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 7E3A88E0003; Fri, 9 Jun 2023 04:17:07 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 65D758E0009; Fri, 9 Jun 2023 04:17:07 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 4F26D8E0003 for ; Fri, 9 Jun 2023 04:17:07 -0400 (EDT) Received: from smtpin15.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 17C7A4018E for ; Fri, 9 Jun 2023 08:17:07 +0000 (UTC) X-FDA: 80882504094.15.0F5B8C8 Received: from out-15.mta0.migadu.com (out-15.mta0.migadu.com [91.218.175.15]) by imf03.hostedemail.com (Postfix) with ESMTP id 0A9942000D for ; Fri, 9 Jun 2023 08:17:04 +0000 (UTC) Authentication-Results: imf03.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b="VJW/sTjl"; spf=pass (imf03.hostedemail.com: domain of qi.zheng@linux.dev designates 91.218.175.15 as permitted sender) smtp.mailfrom=qi.zheng@linux.dev; dmarc=pass (policy=none) header.from=linux.dev ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1686298625; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=50XF484pkGZZlvaX6wjD3S7ullpcs6Au3jWyE87o9lo=; b=hG4Ou7zNALQoZdDr/NlBj9Hd9gCPWH7jZYWQHeaRUB4rkJTJeEmKavMKsbHxJZA1YsHLYq jd54NCG9aLVxtMAQ+Fei488u2XHKoPT+Rurx4BKlDeIxQp2Or2ZE+6lBI362iVJK04ymfz FJUmMM8vsUlpP6tkIAgABp5m/wpZVy0= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1686298625; a=rsa-sha256; cv=none; b=fNnE5JALdJwDXeJMI9aX0wGU5mUNayy6te0ys3NorULWxe2U+WeGqAynlvpl0yvJujGKVK 4JoWOoI1NLwgJai151jcogz8i0N31HG8AmgOhdbHdNiQMRIgWM3CYx4ryZ24jtdb7aubCr rY305Zxfbr+IL+RcKB69EyRlOl0B9MM= ARC-Authentication-Results: i=1; imf03.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b="VJW/sTjl"; spf=pass (imf03.hostedemail.com: domain of qi.zheng@linux.dev designates 91.218.175.15 as permitted sender) smtp.mailfrom=qi.zheng@linux.dev; dmarc=pass (policy=none) header.from=linux.dev X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1686298623; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=50XF484pkGZZlvaX6wjD3S7ullpcs6Au3jWyE87o9lo=; b=VJW/sTjlrsD06VeMs5MWZHOINSVeIuvUVV3XeMcdlaZgU6yHH1KDKeoDlQ0uNJ8ow7jI5i C2f1mQy0VShusDu7ScqyZuVT65khmY5y8dLoApSOfxCjNpnx2aeqmPB13WMTVIUTOMDiTn dvzcgxnTuwP9thnsK4Ff5TbiUeHu3ak= From: Qi Zheng To: akpm@linux-foundation.org Cc: david@fromorbit.com, tkhai@ya.ru, roman.gushchin@linux.dev, vbabka@suse.cz, muchun.song@linux.dev, yujie.liu@intel.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Qi Zheng Subject: [PATCH 1/7] Revert "mm: shrinkers: convert shrinker_rwsem to mutex" Date: Fri, 9 Jun 2023 08:15:12 +0000 Message-Id: <20230609081518.3039120-2-qi.zheng@linux.dev> In-Reply-To: <20230609081518.3039120-1-qi.zheng@linux.dev> References: <20230609081518.3039120-1-qi.zheng@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT X-Rspamd-Queue-Id: 0A9942000D X-Rspam-User: X-Rspamd-Server: rspam11 X-Stat-Signature: xbxckhqnfwbomf8xdkafqtwixhwsef1a X-HE-Tag: 1686298624-61193 X-HE-Meta: U2FsdGVkX18MJVuYb0vrqUB+uyNRPe0UGxU9uYQdNUCk/iDuT2KGVC+NHJUhV/kwnNQf3SFOcv8LvT+jzIgM3wt9M0MqXMMazg/tNyhyWJaM7tuZIvsZDz3aA8VguctRo6tHvDASe8AQpIaUZynoTi90RoWyNpoR2/H1i+7TXWxoPSRuCj2xmQijr6NWyJoumD5HBW2yfZ+oQfz1bAZNf6VC5ZE7mrtnu/NuPIEsBy6mon9L/nrdJ1J6lVzMc7rrnenK+BDMfXtoJFhQIUuxr+qrlwparBMoRvgG2lp1Nh0Z3W8iB7SEU88lwqlQmthT2c2K7Rk4luptdY7zH9Sk9x18J27lTuSjv1TkJ2+LbjJfjG77hGF+mHkPIcWytRXEUsDw4qEZ2ZOy2vmGagbVkPEbTn53OZJDzsjTa5obXYayfRMPKuvr3M9Moz5BAYkz6gAXOp1Z14Pye4MzPPQtg6bdewBwfJgYls2u8rGYFkCRfBbqsfcio1iv+qfCyROJduoeg8M4x45KSMZiMsSvmggXTHvw2soEpTFUbLot/lcddEP++DX6iMPYwSIZCXsTWkjXa8l67NDXdjek4a/k6Hu2GIkbhcf1isAS9U0mu601PrI6ySbPQWzp3XyxpA00/+2drg/iYtm0O0baQ6cJM9fOd/fQedOp/oMgrk48PPb7dDa944KBXNsNxSeAOXju6uy+K9+vYs460quXHOjP9KmDz7LwDpbaefwijp4Pe4zLtXah8x/dr5mHkXqANWzqSZtq5FT1XUIEp1TGrEnhsHlYQoG8oPH8ow2BKZBT2YOYM1V43hNMw4cDPOcpeI0+aBnX+YYWk3k5iXoG6uJ4REbwLFyc92ga4fsfZdarzDGCqs1jOe7cyXWInYKKqsqYq5wdrUtssbA3lVJCqU258X5GjMxjYEzpgTysiXhLfHCEoa/bSYh2lroZMhOwMY1NC2MEcfO9O9h3dKwsBqE bXLPv4Wt OKxD0dQWy9YsJTkA6TKWMygeKuYbaNOMi2OaECfxpS6rdNmYIrdX/K1myyzoRUooEtO895wdyZ9J6rIbIzHUv51yALyYrRPvb97GP/VkF5eaRwv/JpVSAR7n/0TeCy7xxtG4W1PkfvRkaJpODe4xozwtpkwThp+fTkpWqXLL7ibr0OLmAGEONum+KnMDPvc+wPNIRUkEFkeJ64EnYeqFVgnByT5iZB+t7Cw5lfq+nYLJNsVAU6II7i2lzI9SW/+Q+KRqMhH9VnU2/aMo= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Qi Zheng This reverts commit cf2e309ebca7bb0916771839f9b580b06c778530. Kernel test robot reports -88.8% regression in stress-ng.ramfs.ops_per_sec test case [1], which is caused by commit f95bdb700bc6 ("mm: vmscan: make global slab shrink lockless"). The root cause is that SRCU has to be careful to not frequently check for SRCU read-side critical section exits. Therefore, even if no one is currently in the SRCU read-side critical section, synchronize_srcu() cannot return quickly. That's why unregister_shrinker() has become slower. After discussion, we will try to use the refcount+RCU method [2] proposed by Dave Chinner to continue to re-implement the lockless slab shrink. So revert the shrinker_mutex back to shrinker_rwsem first. [1]. https://lore.kernel.org/lkml/202305230837.db2c233f-yujie.liu@intel.com/ [2]. https://lore.kernel.org/lkml/ZIJhou1d55d4H1s0@dread.disaster.area/ Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-lkp/202305230837.db2c233f-yujie.liu@intel.com Signed-off-by: Qi Zheng --- drivers/md/dm-cache-metadata.c | 2 +- drivers/md/dm-thin-metadata.c | 2 +- fs/super.c | 2 +- mm/shrinker_debug.c | 14 +++++++------- mm/vmscan.c | 34 +++++++++++++++++----------------- 5 files changed, 27 insertions(+), 27 deletions(-) diff --git a/drivers/md/dm-cache-metadata.c b/drivers/md/dm-cache-metadata.c index 9e0c69958587..acffed750e3e 100644 --- a/drivers/md/dm-cache-metadata.c +++ b/drivers/md/dm-cache-metadata.c @@ -1828,7 +1828,7 @@ int dm_cache_metadata_abort(struct dm_cache_metadata *cmd) * Replacement block manager (new_bm) is created and old_bm destroyed outside of * cmd root_lock to avoid ABBA deadlock that would result (due to life-cycle of * shrinker associated with the block manager's bufio client vs cmd root_lock). - * - must take shrinker_mutex without holding cmd->root_lock + * - must take shrinker_rwsem without holding cmd->root_lock */ new_bm = dm_block_manager_create(cmd->bdev, DM_CACHE_METADATA_BLOCK_SIZE << SECTOR_SHIFT, CACHE_MAX_CONCURRENT_LOCKS); diff --git a/drivers/md/dm-thin-metadata.c b/drivers/md/dm-thin-metadata.c index 9f5cb52c5763..fd464fb024c3 100644 --- a/drivers/md/dm-thin-metadata.c +++ b/drivers/md/dm-thin-metadata.c @@ -1887,7 +1887,7 @@ int dm_pool_abort_metadata(struct dm_pool_metadata *pmd) * Replacement block manager (new_bm) is created and old_bm destroyed outside of * pmd root_lock to avoid ABBA deadlock that would result (due to life-cycle of * shrinker associated with the block manager's bufio client vs pmd root_lock). - * - must take shrinker_mutex without holding pmd->root_lock + * - must take shrinker_rwsem without holding pmd->root_lock */ new_bm = dm_block_manager_create(pmd->bdev, THIN_METADATA_BLOCK_SIZE << SECTOR_SHIFT, THIN_MAX_CONCURRENT_LOCKS); diff --git a/fs/super.c b/fs/super.c index 34afe411cf2b..04bc62ab7dfe 100644 --- a/fs/super.c +++ b/fs/super.c @@ -54,7 +54,7 @@ static char *sb_writers_name[SB_FREEZE_LEVELS] = { * One thing we have to be careful of with a per-sb shrinker is that we don't * drop the last active reference to the superblock from within the shrinker. * If that happens we could trigger unregistering the shrinker from within the - * shrinker path and that leads to deadlock on the shrinker_mutex. Hence we + * shrinker path and that leads to deadlock on the shrinker_rwsem. Hence we * take a passive reference to the superblock to avoid this from occurring. */ static unsigned long super_cache_scan(struct shrinker *shrink, diff --git a/mm/shrinker_debug.c b/mm/shrinker_debug.c index fe10436d9911..2be15b8a6d0b 100644 --- a/mm/shrinker_debug.c +++ b/mm/shrinker_debug.c @@ -8,7 +8,7 @@ #include /* defined in vmscan.c */ -extern struct mutex shrinker_mutex; +extern struct rw_semaphore shrinker_rwsem; extern struct list_head shrinker_list; extern struct srcu_struct shrinker_srcu; @@ -168,7 +168,7 @@ int shrinker_debugfs_add(struct shrinker *shrinker) char buf[128]; int id; - lockdep_assert_held(&shrinker_mutex); + lockdep_assert_held(&shrinker_rwsem); /* debugfs isn't initialized yet, add debugfs entries later. */ if (!shrinker_debugfs_root) @@ -211,7 +211,7 @@ int shrinker_debugfs_rename(struct shrinker *shrinker, const char *fmt, ...) if (!new) return -ENOMEM; - mutex_lock(&shrinker_mutex); + down_write(&shrinker_rwsem); old = shrinker->name; shrinker->name = new; @@ -229,7 +229,7 @@ int shrinker_debugfs_rename(struct shrinker *shrinker, const char *fmt, ...) shrinker->debugfs_entry = entry; } - mutex_unlock(&shrinker_mutex); + up_write(&shrinker_rwsem); kfree_const(old); @@ -242,7 +242,7 @@ struct dentry *shrinker_debugfs_detach(struct shrinker *shrinker, { struct dentry *entry = shrinker->debugfs_entry; - lockdep_assert_held(&shrinker_mutex); + lockdep_assert_held(&shrinker_rwsem); kfree_const(shrinker->name); shrinker->name = NULL; @@ -271,14 +271,14 @@ static int __init shrinker_debugfs_init(void) shrinker_debugfs_root = dentry; /* Create debugfs entries for shrinkers registered at boot */ - mutex_lock(&shrinker_mutex); + down_write(&shrinker_rwsem); list_for_each_entry(shrinker, &shrinker_list, list) if (!shrinker->debugfs_entry) { ret = shrinker_debugfs_add(shrinker); if (ret) break; } - mutex_unlock(&shrinker_mutex); + up_write(&shrinker_rwsem); return ret; } diff --git a/mm/vmscan.c b/mm/vmscan.c index 6d0cd2840cf0..4730dba253c8 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -35,7 +35,7 @@ #include #include #include -#include +#include #include #include #include @@ -190,7 +190,7 @@ struct scan_control { int vm_swappiness = 60; LIST_HEAD(shrinker_list); -DEFINE_MUTEX(shrinker_mutex); +DECLARE_RWSEM(shrinker_rwsem); DEFINE_SRCU(shrinker_srcu); static atomic_t shrinker_srcu_generation = ATOMIC_INIT(0); @@ -213,7 +213,7 @@ static struct shrinker_info *shrinker_info_protected(struct mem_cgroup *memcg, { return srcu_dereference_check(memcg->nodeinfo[nid]->shrinker_info, &shrinker_srcu, - lockdep_is_held(&shrinker_mutex)); + lockdep_is_held(&shrinker_rwsem)); } static struct shrinker_info *shrinker_info_srcu(struct mem_cgroup *memcg, @@ -292,7 +292,7 @@ int alloc_shrinker_info(struct mem_cgroup *memcg) int nid, size, ret = 0; int map_size, defer_size = 0; - mutex_lock(&shrinker_mutex); + down_write(&shrinker_rwsem); map_size = shrinker_map_size(shrinker_nr_max); defer_size = shrinker_defer_size(shrinker_nr_max); size = map_size + defer_size; @@ -308,7 +308,7 @@ int alloc_shrinker_info(struct mem_cgroup *memcg) info->map_nr_max = shrinker_nr_max; rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info); } - mutex_unlock(&shrinker_mutex); + up_write(&shrinker_rwsem); return ret; } @@ -324,7 +324,7 @@ static int expand_shrinker_info(int new_id) if (!root_mem_cgroup) goto out; - lockdep_assert_held(&shrinker_mutex); + lockdep_assert_held(&shrinker_rwsem); map_size = shrinker_map_size(new_nr_max); defer_size = shrinker_defer_size(new_nr_max); @@ -374,7 +374,7 @@ static int prealloc_memcg_shrinker(struct shrinker *shrinker) if (mem_cgroup_disabled()) return -ENOSYS; - mutex_lock(&shrinker_mutex); + down_write(&shrinker_rwsem); id = idr_alloc(&shrinker_idr, shrinker, 0, 0, GFP_KERNEL); if (id < 0) goto unlock; @@ -388,7 +388,7 @@ static int prealloc_memcg_shrinker(struct shrinker *shrinker) shrinker->id = id; ret = 0; unlock: - mutex_unlock(&shrinker_mutex); + up_write(&shrinker_rwsem); return ret; } @@ -398,7 +398,7 @@ static void unregister_memcg_shrinker(struct shrinker *shrinker) BUG_ON(id < 0); - lockdep_assert_held(&shrinker_mutex); + lockdep_assert_held(&shrinker_rwsem); idr_remove(&shrinker_idr, id); } @@ -433,7 +433,7 @@ void reparent_shrinker_deferred(struct mem_cgroup *memcg) parent = root_mem_cgroup; /* Prevent from concurrent shrinker_info expand */ - mutex_lock(&shrinker_mutex); + down_write(&shrinker_rwsem); for_each_node(nid) { child_info = shrinker_info_protected(memcg, nid); parent_info = shrinker_info_protected(parent, nid); @@ -442,7 +442,7 @@ void reparent_shrinker_deferred(struct mem_cgroup *memcg) atomic_long_add(nr, &parent_info->nr_deferred[i]); } } - mutex_unlock(&shrinker_mutex); + up_write(&shrinker_rwsem); } static bool cgroup_reclaim(struct scan_control *sc) @@ -743,9 +743,9 @@ void free_prealloced_shrinker(struct shrinker *shrinker) shrinker->name = NULL; #endif if (shrinker->flags & SHRINKER_MEMCG_AWARE) { - mutex_lock(&shrinker_mutex); + down_write(&shrinker_rwsem); unregister_memcg_shrinker(shrinker); - mutex_unlock(&shrinker_mutex); + up_write(&shrinker_rwsem); return; } @@ -755,11 +755,11 @@ void free_prealloced_shrinker(struct shrinker *shrinker) void register_shrinker_prepared(struct shrinker *shrinker) { - mutex_lock(&shrinker_mutex); + down_write(&shrinker_rwsem); list_add_tail_rcu(&shrinker->list, &shrinker_list); shrinker->flags |= SHRINKER_REGISTERED; shrinker_debugfs_add(shrinker); - mutex_unlock(&shrinker_mutex); + up_write(&shrinker_rwsem); } static int __register_shrinker(struct shrinker *shrinker) @@ -810,13 +810,13 @@ void unregister_shrinker(struct shrinker *shrinker) if (!(shrinker->flags & SHRINKER_REGISTERED)) return; - mutex_lock(&shrinker_mutex); + down_write(&shrinker_rwsem); list_del_rcu(&shrinker->list); shrinker->flags &= ~SHRINKER_REGISTERED; if (shrinker->flags & SHRINKER_MEMCG_AWARE) unregister_memcg_shrinker(shrinker); debugfs_entry = shrinker_debugfs_detach(shrinker, &debugfs_id); - mutex_unlock(&shrinker_mutex); + up_write(&shrinker_rwsem); atomic_inc(&shrinker_srcu_generation); synchronize_srcu(&shrinker_srcu);