From patchwork Fri Feb 16 08:55:04 2024
X-Patchwork-Submitter: Chengming Zhou
X-Patchwork-Id: 13559702
From: Chengming Zhou
Date: Fri, 16 Feb 2024 08:55:04 +0000
Subject: [PATCH v3 1/2] mm/zswap: global lru and shrinker shared by all zswap_pools
Message-Id: <20240210-zswap-global-lru-v3-1-200495333595@bytedance.com>
References: <20240210-zswap-global-lru-v3-0-200495333595@bytedance.com>
In-Reply-To: <20240210-zswap-global-lru-v3-0-200495333595@bytedance.com>
To: Johannes Weiner, Yosry Ahmed, Nhat Pham, Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Chengming Zhou

Dynamic zswap_pool creation may leave multiple zswap_pools on the list,
of which only the first is the pool currently in use. Each zswap_pool
has its own lru and shrinker, which is unnecessary and causes problems:

1. Under memory pressure, the shrinkers of all zswap_pools try to
   shrink their own lru, with no ordering between them.

2. When the zswap limit is hit, only the last zswap_pool's shrink_work
   shrinks its lru list. The rationale was to empty the old pool first
   so that it could be dropped completely. However, since only
   exclusive loads are supported now, LRU ordering is decided entirely
   by the order of stores, so the oldest entries on the LRU naturally
   come from the oldest pool.

Either way, a single global lru and shrinker shared by all zswap_pools
is simpler and more efficient.
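To make the shared mechanism concrete, below is a minimal,
self-contained sketch of the memcg-aware shrinker/list_lru pairing the
single global LRU builds on. The demo_* names are illustrative only
(not zswap code), and the calls follow the dynamic shrinker API
(shrinker_alloc()/shrinker_register()) this series is written against:

#include <linux/init.h>
#include <linux/list_lru.h>
#include <linux/shrinker.h>

static struct list_lru demo_lru;

/* Reclaim callback: unlink one entry and report it reclaimed. */
static enum lru_status demo_isolate(struct list_head *item,
				    struct list_lru_one *list,
				    spinlock_t *lock, void *cb_arg)
{
	list_lru_isolate(list, item);
	/* ... free (or queue for freeing) the object embedding @item ... */
	return LRU_REMOVED;
}

/* The list_lru already tracks objects per (memcg, node): counting is direct. */
static unsigned long demo_count(struct shrinker *shrinker,
				struct shrink_control *sc)
{
	return list_lru_shrink_count(&demo_lru, sc);
}

/* Walk the same (memcg, node) slice that demo_count() reported. */
static unsigned long demo_scan(struct shrinker *shrinker,
			       struct shrink_control *sc)
{
	return list_lru_shrink_walk(&demo_lru, sc, demo_isolate, NULL);
}

static int __init demo_init(void)
{
	struct shrinker *shrinker;

	shrinker = shrinker_alloc(SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE,
				  "demo-cache");
	if (!shrinker)
		return -ENOMEM;

	/* Pairing the lru with the shrinker enables per-memcg lru lists. */
	if (list_lru_init_memcg(&demo_lru, shrinker)) {
		shrinker_free(shrinker);
		return -ENOMEM;
	}

	shrinker->count_objects = demo_count;
	shrinker->scan_objects = demo_scan;
	shrinker_register(shrinker);
	return 0;
}

With one lru and one shrinker there is a single reclaim ordering, no
matter how many pools exist.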
Acked-by: Yosry Ahmed
Signed-off-by: Chengming Zhou
---
 mm/zswap.c | 171 ++++++++++++++++++++++++-------------------------------------
 1 file changed, 66 insertions(+), 105 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 62fe307521c9..d275eb523fc4 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -176,14 +176,19 @@ struct zswap_pool {
 	struct kref kref;
 	struct list_head list;
 	struct work_struct release_work;
-	struct work_struct shrink_work;
 	struct hlist_node node;
 	char tfm_name[CRYPTO_MAX_ALG_NAME];
+};
+
+static struct {
 	struct list_lru list_lru;
-	struct mem_cgroup *next_shrink;
-	struct shrinker *shrinker;
 	atomic_t nr_stored;
-};
+	struct shrinker *shrinker;
+	struct work_struct shrink_work;
+	struct mem_cgroup *next_shrink;
+	/* The lock protects next_shrink. */
+	spinlock_t shrink_lock;
+} zswap;
 
 /*
  * struct zswap_entry
@@ -301,9 +306,6 @@ static void zswap_update_total_size(void)
 /*********************************
 * pool functions
 **********************************/
-static void zswap_alloc_shrinker(struct zswap_pool *pool);
-static void shrink_worker(struct work_struct *w);
-
 static struct zswap_pool *zswap_pool_create(char *type, char *compressor)
 {
 	int i;
@@ -353,30 +355,16 @@ static struct zswap_pool *zswap_pool_create(char *type, char *compressor)
 	if (ret)
 		goto error;
 
-	zswap_alloc_shrinker(pool);
-	if (!pool->shrinker)
-		goto error;
-
-	pr_debug("using %s compressor\n", pool->tfm_name);
-
 	/* being the current pool takes 1 ref; this func expects the
 	 * caller to always add the new pool as the current pool
 	 */
 	kref_init(&pool->kref);
 	INIT_LIST_HEAD(&pool->list);
-	if (list_lru_init_memcg(&pool->list_lru, pool->shrinker))
-		goto lru_fail;
-	shrinker_register(pool->shrinker);
-	INIT_WORK(&pool->shrink_work, shrink_worker);
-	atomic_set(&pool->nr_stored, 0);
 
 	zswap_pool_debug("created", pool);
 
 	return pool;
 
-lru_fail:
-	list_lru_destroy(&pool->list_lru);
-	shrinker_free(pool->shrinker);
error:
 	if (pool->acomp_ctx)
 		free_percpu(pool->acomp_ctx);
@@ -434,15 +422,8 @@ static void zswap_pool_destroy(struct zswap_pool *pool)
 {
 	zswap_pool_debug("destroying", pool);
 
-	shrinker_free(pool->shrinker);
 	cpuhp_state_remove_instance(CPUHP_MM_ZSWP_POOL_PREPARE, &pool->node);
 	free_percpu(pool->acomp_ctx);
-	list_lru_destroy(&pool->list_lru);
-
-	spin_lock(&zswap_pools_lock);
-	mem_cgroup_iter_break(NULL, pool->next_shrink);
-	pool->next_shrink = NULL;
-	spin_unlock(&zswap_pools_lock);
 
 	for (i = 0; i < ZSWAP_NR_ZPOOLS; i++)
 		zpool_destroy_pool(pool->zpools[i]);
@@ -529,24 +510,6 @@ static struct zswap_pool *zswap_pool_current_get(void)
 	return pool;
 }
 
-static struct zswap_pool *zswap_pool_last_get(void)
-{
-	struct zswap_pool *pool, *last = NULL;
-
-	rcu_read_lock();
-
-	list_for_each_entry_rcu(pool, &zswap_pools, list)
-		last = pool;
-	WARN_ONCE(!last && zswap_has_pool,
-		  "%s: no page storage pool!\n", __func__);
-	if (!zswap_pool_get(last))
-		last = NULL;
-
-	rcu_read_unlock();
-
-	return last;
-}
-
 /* type and compressor must be null-terminated */
 static struct zswap_pool *zswap_pool_find_get(char *type, char *compressor)
 {
@@ -816,15 +779,11 @@ void zswap_folio_swapin(struct folio *folio)
 
 void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg)
 {
-	struct zswap_pool *pool;
-
-	/* lock out zswap pools list modification */
-	spin_lock(&zswap_pools_lock);
-	list_for_each_entry(pool, &zswap_pools, list) {
-		if (pool->next_shrink == memcg)
-			pool->next_shrink = mem_cgroup_iter(NULL, pool->next_shrink, NULL);
-	}
-	spin_unlock(&zswap_pools_lock);
+	/* lock out zswap shrinker walking memcg tree */
+	spin_lock(&zswap.shrink_lock);
+	if (zswap.next_shrink == memcg)
+		zswap.next_shrink = mem_cgroup_iter(NULL, zswap.next_shrink, NULL);
+	spin_unlock(&zswap.shrink_lock);
 }
 
 /*********************************
@@ -923,9 +882,9 @@ static void zswap_entry_free(struct zswap_entry *entry)
 	if (!entry->length)
 		atomic_dec(&zswap_same_filled_pages);
 	else {
-		zswap_lru_del(&entry->pool->list_lru, entry);
+		zswap_lru_del(&zswap.list_lru, entry);
 		zpool_free(zswap_find_zpool(entry), entry->handle);
-		atomic_dec(&entry->pool->nr_stored);
+		atomic_dec(&zswap.nr_stored);
 		zswap_pool_put(entry->pool);
 	}
 	if (entry->objcg) {
@@ -1288,7 +1247,6 @@ static unsigned long zswap_shrinker_scan(struct shrinker *shrinker,
 {
 	struct lruvec *lruvec = mem_cgroup_lruvec(sc->memcg, NODE_DATA(sc->nid));
 	unsigned long shrink_ret, nr_protected, lru_size;
-	struct zswap_pool *pool = shrinker->private_data;
 	bool encountered_page_in_swapcache = false;
 
 	if (!zswap_shrinker_enabled ||
@@ -1299,7 +1257,7 @@ static unsigned long zswap_shrinker_scan(struct shrinker *shrinker,
 
 	nr_protected =
 		atomic_long_read(&lruvec->zswap_lruvec_state.nr_zswap_protected);
-	lru_size = list_lru_shrink_count(&pool->list_lru, sc);
+	lru_size = list_lru_shrink_count(&zswap.list_lru, sc);
 
 	/*
 	 * Abort if we are shrinking into the protected region.
@@ -1316,7 +1274,7 @@ static unsigned long zswap_shrinker_scan(struct shrinker *shrinker,
 		return SHRINK_STOP;
 	}
 
-	shrink_ret = list_lru_shrink_walk(&pool->list_lru, sc, &shrink_memcg_cb,
+	shrink_ret = list_lru_shrink_walk(&zswap.list_lru, sc, &shrink_memcg_cb,
 					  &encountered_page_in_swapcache);
 
 	if (encountered_page_in_swapcache)
@@ -1328,7 +1286,6 @@
 static unsigned long zswap_shrinker_count(struct shrinker *shrinker,
 		struct shrink_control *sc)
 {
-	struct zswap_pool *pool = shrinker->private_data;
 	struct mem_cgroup *memcg = sc->memcg;
 	struct lruvec *lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(sc->nid));
 	unsigned long nr_backing, nr_stored, nr_freeable, nr_protected;
@@ -1342,8 +1299,8 @@ static unsigned long zswap_shrinker_count(struct shrinker *shrinker,
 	nr_stored = memcg_page_state(memcg, MEMCG_ZSWAPPED);
 #else
 	/* use pool stats instead of memcg stats */
-	nr_backing = get_zswap_pool_size(pool) >> PAGE_SHIFT;
-	nr_stored = atomic_read(&pool->nr_stored);
+	nr_backing = zswap_pool_total_size >> PAGE_SHIFT;
+	nr_stored = atomic_read(&zswap.nr_stored);
 #endif
 
 	if (!nr_stored)
@@ -1351,7 +1308,7 @@ static unsigned long zswap_shrinker_count(struct shrinker *shrinker,
 
 	nr_protected =
 		atomic_long_read(&lruvec->zswap_lruvec_state.nr_zswap_protected);
-	nr_freeable = list_lru_shrink_count(&pool->list_lru, sc);
+	nr_freeable = list_lru_shrink_count(&zswap.list_lru, sc);
 	/*
 	 * Subtract the lru size by an estimate of the number of pages
 	 * that should be protected.
@@ -1367,23 +1324,24 @@ static unsigned long zswap_shrinker_count(struct shrinker *shrinker,
 	return mult_frac(nr_freeable, nr_backing, nr_stored);
 }
 
-static void zswap_alloc_shrinker(struct zswap_pool *pool)
+static struct shrinker *zswap_alloc_shrinker(void)
 {
-	pool->shrinker =
+	struct shrinker *shrinker;
+
+	shrinker =
 		shrinker_alloc(SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE, "mm-zswap");
-	if (!pool->shrinker)
-		return;
+	if (!shrinker)
+		return NULL;
 
-	pool->shrinker->private_data = pool;
-	pool->shrinker->scan_objects = zswap_shrinker_scan;
-	pool->shrinker->count_objects = zswap_shrinker_count;
-	pool->shrinker->batch = 0;
-	pool->shrinker->seeks = DEFAULT_SEEKS;
+	shrinker->scan_objects = zswap_shrinker_scan;
+	shrinker->count_objects = zswap_shrinker_count;
+	shrinker->batch = 0;
+	shrinker->seeks = DEFAULT_SEEKS;
+	return shrinker;
 }
 
 static int shrink_memcg(struct mem_cgroup *memcg)
 {
-	struct zswap_pool *pool;
 	int nid, shrunk = 0;
 
 	if (!mem_cgroup_zswap_writeback_enabled(memcg))
@@ -1396,32 +1354,25 @@ static int shrink_memcg(struct mem_cgroup *memcg)
 	if (memcg && !mem_cgroup_online(memcg))
 		return -ENOENT;
 
-	pool = zswap_pool_current_get();
-	if (!pool)
-		return -EINVAL;
-
 	for_each_node_state(nid, N_NORMAL_MEMORY) {
 		unsigned long nr_to_walk = 1;
 
-		shrunk += list_lru_walk_one(&pool->list_lru, nid, memcg,
+		shrunk += list_lru_walk_one(&zswap.list_lru, nid, memcg,
 					    &shrink_memcg_cb, NULL, &nr_to_walk);
 	}
-	zswap_pool_put(pool);
 	return shrunk ? 0 : -EAGAIN;
 }
 
 static void shrink_worker(struct work_struct *w)
 {
-	struct zswap_pool *pool = container_of(w, typeof(*pool),
-						shrink_work);
 	struct mem_cgroup *memcg;
 	int ret, failures = 0;
 
 	/* global reclaim will select cgroup in a round-robin fashion. */
 	do {
-		spin_lock(&zswap_pools_lock);
-		pool->next_shrink = mem_cgroup_iter(NULL, pool->next_shrink, NULL);
-		memcg = pool->next_shrink;
+		spin_lock(&zswap.shrink_lock);
+		zswap.next_shrink = mem_cgroup_iter(NULL, zswap.next_shrink, NULL);
+		memcg = zswap.next_shrink;
 
 		/*
 		 * We need to retry if we have gone through a full round trip, or if we
@@ -1435,7 +1386,7 @@ static void shrink_worker(struct work_struct *w)
 		 * memcg is not killed when we are reclaiming.
 		 */
 		if (!memcg) {
-			spin_unlock(&zswap_pools_lock);
+			spin_unlock(&zswap.shrink_lock);
 			if (++failures == MAX_RECLAIM_RETRIES)
 				break;
 
@@ -1445,15 +1396,15 @@ static void shrink_worker(struct work_struct *w)
 		if (!mem_cgroup_tryget_online(memcg)) {
 			/* drop the reference from mem_cgroup_iter() */
 			mem_cgroup_iter_break(NULL, memcg);
-			pool->next_shrink = NULL;
-			spin_unlock(&zswap_pools_lock);
+			zswap.next_shrink = NULL;
+			spin_unlock(&zswap.shrink_lock);
 
 			if (++failures == MAX_RECLAIM_RETRIES)
 				break;
 
 			goto resched;
 		}
-		spin_unlock(&zswap_pools_lock);
+		spin_unlock(&zswap.shrink_lock);
 
 		ret = shrink_memcg(memcg);
 		/* drop the extra reference */
@@ -1467,7 +1418,6 @@ static void shrink_worker(struct work_struct *w)
 resched:
 		cond_resched();
 	} while (!zswap_can_accept());
-	zswap_pool_put(pool);
 }
 
 static int zswap_is_page_same_filled(void *ptr, unsigned long *value)
@@ -1508,7 +1458,6 @@ bool zswap_store(struct folio *folio)
 	struct zswap_entry *entry, *dupentry;
 	struct obj_cgroup *objcg = NULL;
 	struct mem_cgroup *memcg = NULL;
-	struct zswap_pool *shrink_pool;
 
 	VM_WARN_ON_ONCE(!folio_test_locked(folio));
 	VM_WARN_ON_ONCE(!folio_test_swapcache(folio));
@@ -1576,7 +1525,7 @@ bool zswap_store(struct folio *folio)
 
 	if (objcg) {
 		memcg = get_mem_cgroup_from_objcg(objcg);
-		if (memcg_list_lru_alloc(memcg, &entry->pool->list_lru, GFP_KERNEL)) {
+		if (memcg_list_lru_alloc(memcg, &zswap.list_lru, GFP_KERNEL)) {
 			mem_cgroup_put(memcg);
 			goto put_pool;
 		}
@@ -1607,8 +1556,8 @@ bool zswap_store(struct folio *folio)
 	}
 	if (entry->length) {
 		INIT_LIST_HEAD(&entry->lru);
-		zswap_lru_add(&entry->pool->list_lru, entry);
-		atomic_inc(&entry->pool->nr_stored);
+		zswap_lru_add(&zswap.list_lru, entry);
+		atomic_inc(&zswap.nr_stored);
 	}
 	spin_unlock(&tree->lock);
 
@@ -1640,9 +1589,7 @@ bool zswap_store(struct folio *folio)
 	return false;
 
 shrink:
-	shrink_pool = zswap_pool_last_get();
-	if (shrink_pool && !queue_work(shrink_wq, &shrink_pool->shrink_work))
-		zswap_pool_put(shrink_pool);
+	queue_work(shrink_wq, &zswap.shrink_work);
 	goto reject;
 }
 
@@ -1804,6 +1751,22 @@ static int zswap_setup(void)
 	if (ret)
 		goto hp_fail;
 
+	shrink_wq = alloc_workqueue("zswap-shrink",
+			WQ_UNBOUND|WQ_MEM_RECLAIM, 1);
+	if (!shrink_wq)
+		goto shrink_wq_fail;
+
+	zswap.shrinker = zswap_alloc_shrinker();
+	if (!zswap.shrinker)
+		goto shrinker_fail;
+	if (list_lru_init_memcg(&zswap.list_lru, zswap.shrinker))
+		goto lru_fail;
+	shrinker_register(zswap.shrinker);
+
+	INIT_WORK(&zswap.shrink_work, shrink_worker);
+	atomic_set(&zswap.nr_stored, 0);
+	spin_lock_init(&zswap.shrink_lock);
+
 	pool = __zswap_pool_create_fallback();
 	if (pool) {
 		pr_info("loaded using pool %s/%s\n", pool->tfm_name,
@@ -1815,19 +1778,17 @@
 		zswap_enabled = false;
 	}
 
-	shrink_wq = alloc_workqueue("zswap-shrink",
-			WQ_UNBOUND|WQ_MEM_RECLAIM, 1);
-	if (!shrink_wq)
-		goto fallback_fail;
-
 	if (zswap_debugfs_init())
 		pr_warn("debugfs initialization failed\n");
 	zswap_init_state = ZSWAP_INIT_SUCCEED;
 	return 0;
 
-fallback_fail:
-	if (pool)
-		zswap_pool_destroy(pool);
+lru_fail:
+	shrinker_free(zswap.shrinker);
+shrinker_fail:
+	destroy_workqueue(shrink_wq);
+shrink_wq_fail:
+	cpuhp_remove_multi_state(CPUHP_MM_ZSWP_POOL_PREPARE);
 hp_fail:
 	kmem_cache_destroy(zswap_entry_cache);
 cache_fail:

From patchwork Fri Feb 16 08:55:05 2024
X-Patchwork-Submitter: Chengming Zhou
X-Patchwork-Id: 13559703
From: Chengming Zhou
Date: Fri, 16 Feb 2024 08:55:05 +0000
Subject: [PATCH v3 2/2] mm/zswap: change zswap_pool kref to percpu_ref
Message-Id: <20240210-zswap-global-lru-v3-2-200495333595@bytedance.com>
References: <20240210-zswap-global-lru-v3-0-200495333595@bytedance.com>
In-Reply-To: <20240210-zswap-global-lru-v3-0-200495333595@bytedance.com>
To: Johannes Weiner, Yosry Ahmed, Nhat Pham, Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Chengming Zhou

All zswap entries take a reference on the zswap_pool in zswap_store()
and drop it when they are freed. Changing the kref to a percpu_ref
scales better. percpu_ref does use a bit more memory, but that is
acceptable for this use case, since there is almost always only one
zswap_pool in use. The gain is in the zswap_store/load hot path.

Testing: kernel build (32 threads) in tmpfs with memory.max=2GB
(zswap shrinker and writeback enabled with one 50GB swapfile, on a
128-CPU x86-64 machine; numbers are the average of 5 runs)

        mm-unstable  zswap-global-lru
real    63.20        63.12
user    1061.75      1062.95
sys     268.74       264.44

Signed-off-by: Chengming Zhou
Reviewed-by: Nhat Pham
---
 mm/zswap.c | 36 +++++++++++++++++++++++++++---------
 1 file changed, 27 insertions(+), 9 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index d275eb523fc4..961349162997 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -173,7 +173,7 @@ struct crypto_acomp_ctx {
 struct zswap_pool {
 	struct zpool *zpools[ZSWAP_NR_ZPOOLS];
 	struct crypto_acomp_ctx __percpu *acomp_ctx;
-	struct kref kref;
+	struct percpu_ref ref;
 	struct list_head list;
 	struct work_struct release_work;
 	struct hlist_node node;
@@ -305,6 +305,7 @@ static void zswap_update_total_size(void)
 /*********************************
 * pool functions
 **********************************/
+static void __zswap_pool_empty(struct percpu_ref *ref);
 
 static struct zswap_pool *zswap_pool_create(char *type, char *compressor)
 {
@@ -358,13 +359,18 @@ static struct zswap_pool *zswap_pool_create(char *type, char *compressor)
 	/* being the current pool takes 1 ref; this func expects the
 	 * caller to always add the new pool as the current pool
 	 */
-	kref_init(&pool->kref);
+	ret = percpu_ref_init(&pool->ref, __zswap_pool_empty,
+			      PERCPU_REF_ALLOW_REINIT, GFP_KERNEL);
+	if (ret)
+		goto ref_fail;
 	INIT_LIST_HEAD(&pool->list);
 
 	zswap_pool_debug("created", pool);
 
 	return pool;
 
+ref_fail:
+	cpuhp_state_remove_instance(CPUHP_MM_ZSWP_POOL_PREPARE, &pool->node);
 error:
 	if (pool->acomp_ctx)
 		free_percpu(pool->acomp_ctx);
@@ -437,8 +443,9 @@ static void __zswap_pool_release(struct work_struct *work)
 
 	synchronize_rcu();
 
-	/* nobody should have been able to get a kref... */
-	WARN_ON(kref_get_unless_zero(&pool->kref));
+	/* nobody should have been able to get a ref... */
+	WARN_ON(!percpu_ref_is_zero(&pool->ref));
+	percpu_ref_exit(&pool->ref);
 
 	/* pool is now off zswap_pools list and has no references. */
 	zswap_pool_destroy(pool);
@@ -446,11 +453,11 @@ static void __zswap_pool_release(struct work_struct *work)
 
 static struct zswap_pool *zswap_pool_current(void);
 
-static void __zswap_pool_empty(struct kref *kref)
+static void __zswap_pool_empty(struct percpu_ref *ref)
 {
 	struct zswap_pool *pool;
 
-	pool = container_of(kref, typeof(*pool), kref);
+	pool = container_of(ref, typeof(*pool), ref);
 
 	spin_lock(&zswap_pools_lock);
 
@@ -469,12 +476,12 @@ static int __must_check zswap_pool_get(struct zswap_pool *pool)
 	if (!pool)
 		return 0;
 
-	return kref_get_unless_zero(&pool->kref);
+	return percpu_ref_tryget(&pool->ref);
 }
 
 static void zswap_pool_put(struct zswap_pool *pool)
 {
-	kref_put(&pool->kref, __zswap_pool_empty);
+	percpu_ref_put(&pool->ref);
 }
 
 static struct zswap_pool *__zswap_pool_current(void)
@@ -604,6 +611,17 @@ static int __zswap_param_set(const char *val, const struct kernel_param *kp,
 
 	if (!pool)
 		pool = zswap_pool_create(type, compressor);
+	else {
+		/*
+		 * Restore the initial ref dropped by percpu_ref_kill()
+		 * when the pool was decommissioned and switch it again
+		 * to percpu mode.
+		 */
+		percpu_ref_resurrect(&pool->ref);
+
+		/* Drop the ref from zswap_pool_find_get(). */
+		zswap_pool_put(pool);
+	}
 
 	if (pool)
 		ret = param_set_charp(s, kp);
@@ -642,7 +660,7 @@ static int __zswap_param_set(const char *val, const struct kernel_param *kp,
 	 * or the new pool we failed to add
 	 */
 	if (put_pool)
-		zswap_pool_put(put_pool);
+		percpu_ref_kill(&put_pool->ref);
 
 	return ret;
 }
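For reference, here is a minimal, self-contained sketch of the
percpu_ref lifecycle this conversion relies on (my_pool and the
my_pool_* helpers are illustrative names, not zswap code): init with
PERCPU_REF_ALLOW_REINIT so a killed ref can later be resurrected,
tryget/put on the hot path, kill on decommission, resurrect on reuse.

#include <linux/percpu-refcount.h>

struct my_pool {
	struct percpu_ref ref;
};

/* Called once the last reference is dropped after percpu_ref_kill(). */
static void my_pool_release(struct percpu_ref *ref)
{
	struct my_pool *pool = container_of(ref, struct my_pool, ref);
	/* ... schedule teardown of @pool ... */
}

static int my_pool_init(struct my_pool *pool)
{
	/* ALLOW_REINIT keeps the ref resurrectable after a kill */
	return percpu_ref_init(&pool->ref, my_pool_release,
			       PERCPU_REF_ALLOW_REINIT, GFP_KERNEL);
}

/* Hot path: per-CPU counter updates, no shared atomic cacheline. */
static bool my_pool_tryget(struct my_pool *pool)
{
	return percpu_ref_tryget(&pool->ref);
}

static void my_pool_put(struct my_pool *pool)
{
	percpu_ref_put(&pool->ref);
}

/* Decommission: switch to atomic mode and drop the initial ref. */
static void my_pool_kill(struct my_pool *pool)
{
	percpu_ref_kill(&pool->ref);
}

/* Reuse a killed ref before percpu_ref_exit(): restore the initial
 * ref and switch back to percpu mode. */
static void my_pool_reuse(struct my_pool *pool)
{
	percpu_ref_resurrect(&pool->ref);
}

This is why zswap_pool_get()/zswap_pool_put() stop contending on a
single shared atomic once the kref becomes a percpu_ref.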