From patchwork Tue Jun 25 04:39:04 2024
X-Patchwork-Submitter: Chengming Zhou
X-Patchwork-Id: 13710593
From: Chengming Zhou
Date: Tue, 25 Jun 2024 12:39:04 +0800
Subject: [PATCH v3 1/2] mm/zsmalloc: change back to per-size_class lock
Message-Id: <20240625-zsmalloc-lock-mm-everything-v3-1-ad941699cb61@linux.dev>
References: <20240625-zsmalloc-lock-mm-everything-v3-0-ad941699cb61@linux.dev>
In-Reply-To: <20240625-zsmalloc-lock-mm-everything-v3-0-ad941699cb61@linux.dev>
To: Minchan Kim, Sergey Senozhatsky, Andrew Morton, Johannes Weiner,
    Yosry Ahmed, Nhat Pham
Cc: Yu Zhao, Takero Funaki, Chengming Zhou, Dan Carpenter,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, Chengming Zhou

This patch is almost a revert of commit c0547d0b6a4b ("zsmalloc:
consolidate zs_pool's migrate_lock and size_class's locks"), which
switched to a single global pool->lock in place of the per-size_class
locks and pool->migrate_lock, as preparation for supporting reclaim in
zsmalloc. That zsmalloc reclaim has since been dropped in favor of LRU
reclaim in zswap.
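To make the intended nesting easy to see, here is a purely illustrative
sketch (not part of the patch; the lookup helpers are elided and the
function name is made up) of how the restored locks are used on the
zs_free() path, using the fields added by the diff below:

/*
 * Illustrative only, not part of the patch: lock nesting on the
 * zs_free() path with the restored locks. The field names match the
 * diff below; lookup helpers are elided as comments and the function
 * name is hypothetical.
 */
static void reader_side_lock_sketch(struct zs_pool *pool, struct size_class *class)
{
	/* Pin the handle -> zspage mapping against page migration. */
	read_lock(&pool->migrate_lock);

	/* ... resolve the handle to its zspage and size_class ... */

	/* Once class->lock is held, migration may proceed again. */
	spin_lock(&class->lock);
	read_unlock(&pool->migrate_lock);

	/* ... per-class work: obj_free(), fullness update, etc. ... */
	spin_unlock(&class->lock);
}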
In theory, the per-size_class lock is more fine-grained than the
pool->lock, since a pool can have many size_classes. As for the
additional pool->migrate_lock, only free() and map() need to take it,
and only in read mode, to get a stable zspage from the handle.

Reviewed-by: Sergey Senozhatsky
Signed-off-by: Chengming Zhou
---
 mm/zsmalloc.c | 85 +++++++++++++++++++++++++++++++++++------------------------
 1 file changed, 50 insertions(+), 35 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 44e0171d6003..fec1a39e5bbe 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -34,7 +34,8 @@
 /*
  * lock ordering:
  *	page_lock
- *	pool->lock
+ *	pool->migrate_lock
+ *	class->lock
  *	zspage->lock
  */

@@ -183,6 +184,7 @@ static struct dentry *zs_stat_root;
 static size_t huge_class_size;

 struct size_class {
+	spinlock_t lock;
 	struct list_head fullness_list[NR_FULLNESS_GROUPS];
 	/*
 	 * Size of objects stored in this class. Must be multiple
@@ -237,7 +239,8 @@ struct zs_pool {
 #ifdef CONFIG_COMPACTION
 	struct work_struct free_work;
 #endif
-	spinlock_t lock;
+	/* protect page/zspage migration */
+	rwlock_t migrate_lock;
 	atomic_t compaction_in_progress;
 };

@@ -336,7 +339,7 @@ static void cache_free_zspage(struct zs_pool *pool, struct zspage *zspage)
 	kmem_cache_free(pool->zspage_cachep, zspage);
 }

-/* pool->lock(which owns the handle) synchronizes races */
+/* class->lock(which owns the handle) synchronizes races */
 static void record_obj(unsigned long handle, unsigned long obj)
 {
 	*(unsigned long *)handle = obj;
@@ -431,7 +434,7 @@ static __maybe_unused int is_first_page(struct page *page)
 	return PagePrivate(page);
 }

-/* Protected by pool->lock */
+/* Protected by class->lock */
 static inline int get_zspage_inuse(struct zspage *zspage)
 {
 	return zspage->inuse;
@@ -569,7 +572,7 @@ static int zs_stats_size_show(struct seq_file *s, void *v)
 		if (class->index != i)
 			continue;

-		spin_lock(&pool->lock);
+		spin_lock(&class->lock);

 		seq_printf(s, " %5u %5u ", i, class->size);
 		for (fg = ZS_INUSE_RATIO_10; fg < NR_FULLNESS_GROUPS; fg++) {
@@ -580,7 +583,7 @@ static int zs_stats_size_show(struct seq_file *s, void *v)
 		obj_allocated = zs_stat_get(class, ZS_OBJS_ALLOCATED);
 		obj_used = zs_stat_get(class, ZS_OBJS_INUSE);
 		freeable = zs_can_compact(class);
-		spin_unlock(&pool->lock);
+		spin_unlock(&class->lock);

 		objs_per_zspage = class->objs_per_zspage;
 		pages_used = obj_allocated / objs_per_zspage *
@@ -837,7 +840,7 @@ static void __free_zspage(struct zs_pool *pool, struct size_class *class,
 {
 	struct page *page, *next;

-	assert_spin_locked(&pool->lock);
+	assert_spin_locked(&class->lock);

 	VM_BUG_ON(get_zspage_inuse(zspage));
 	VM_BUG_ON(zspage->fullness != ZS_INUSE_RATIO_0);
@@ -1196,19 +1199,19 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 	BUG_ON(in_interrupt());

 	/* It guarantees it can get zspage from handle safely */
-	spin_lock(&pool->lock);
+	read_lock(&pool->migrate_lock);
 	obj = handle_to_obj(handle);
 	obj_to_location(obj, &page, &obj_idx);
 	zspage = get_zspage(page);

 	/*
-	 * migration cannot move any zpages in this zspage. Here, pool->lock
+	 * migration cannot move any zpages in this zspage. Here, class->lock
 	 * is too heavy since callers would take some time until they calls
 	 * zs_unmap_object API so delegate the locking from class to zspage
 	 * which is smaller granularity.
 	 */
 	migrate_read_lock(zspage);
-	spin_unlock(&pool->lock);
+	read_unlock(&pool->migrate_lock);

 	class = zspage_class(pool, zspage);
 	off = offset_in_page(class->size * obj_idx);
@@ -1364,8 +1367,8 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp)
 	size += ZS_HANDLE_SIZE;
 	class = pool->size_class[get_size_class_index(size)];

-	/* pool->lock effectively protects the zpage migration */
-	spin_lock(&pool->lock);
+	/* class->lock effectively protects the zpage migration */
+	spin_lock(&class->lock);
 	zspage = find_get_zspage(class);
 	if (likely(zspage)) {
 		obj = obj_malloc(pool, zspage, handle);
@@ -1377,7 +1380,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp)
 		goto out;
 	}

-	spin_unlock(&pool->lock);
+	spin_unlock(&class->lock);

 	zspage = alloc_zspage(pool, class, gfp);
 	if (!zspage) {
@@ -1385,7 +1388,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp)
 		return (unsigned long)ERR_PTR(-ENOMEM);
 	}

-	spin_lock(&pool->lock);
+	spin_lock(&class->lock);
 	obj = obj_malloc(pool, zspage, handle);
 	newfg = get_fullness_group(class, zspage);
 	insert_zspage(class, zspage, newfg);
@@ -1397,7 +1400,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp)
 	/* We completely set up zspage so mark them as movable */
 	SetZsPageMovable(pool, zspage);
 out:
-	spin_unlock(&pool->lock);
+	spin_unlock(&class->lock);

 	return handle;
 }
@@ -1442,14 +1445,16 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
 		return;

 	/*
-	 * The pool->lock protects the race with zpage's migration
+	 * The pool->migrate_lock protects the race with zpage's migration
 	 * so it's safe to get the page from handle.
 	 */
-	spin_lock(&pool->lock);
+	read_lock(&pool->migrate_lock);
 	obj = handle_to_obj(handle);
 	obj_to_page(obj, &f_page);
 	zspage = get_zspage(f_page);
 	class = zspage_class(pool, zspage);
+	spin_lock(&class->lock);
+	read_unlock(&pool->migrate_lock);

 	class_stat_dec(class, ZS_OBJS_INUSE, 1);
 	obj_free(class->size, obj);
@@ -1458,7 +1463,7 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
 	if (fullness == ZS_INUSE_RATIO_0)
 		free_zspage(pool, class, zspage);

-	spin_unlock(&pool->lock);
+	spin_unlock(&class->lock);
 	cache_free_handle(pool, handle);
 }
 EXPORT_SYMBOL_GPL(zs_free);
@@ -1780,12 +1785,16 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
 	pool = zspage->pool;

 	/*
-	 * The pool's lock protects the race between zpage migration
+	 * The pool migrate_lock protects the race between zpage migration
 	 * and zs_free.
 	 */
-	spin_lock(&pool->lock);
+	write_lock(&pool->migrate_lock);
 	class = zspage_class(pool, zspage);

+	/*
+	 * the class lock protects zpage alloc/free in the zspage.
+	 */
+	spin_lock(&class->lock);
 	/* the migrate_write_lock protects zpage access via zs_map_object */
 	migrate_write_lock(zspage);

@@ -1815,9 +1824,10 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
 	replace_sub_page(class, zspage, newpage, page);
 	/*
 	 * Since we complete the data copy and set up new zspage structure,
-	 * it's okay to release the pool's lock.
+	 * it's okay to release migration_lock.
 	 */
-	spin_unlock(&pool->lock);
+	write_unlock(&pool->migrate_lock);
+	spin_unlock(&class->lock);
 	migrate_write_unlock(zspage);

 	get_page(newpage);
@@ -1861,20 +1871,20 @@ static void async_free_zspage(struct work_struct *work)
 		if (class->index != i)
 			continue;

-		spin_lock(&pool->lock);
+		spin_lock(&class->lock);
 		list_splice_init(&class->fullness_list[ZS_INUSE_RATIO_0],
 				 &free_pages);
-		spin_unlock(&pool->lock);
+		spin_unlock(&class->lock);
 	}

 	list_for_each_entry_safe(zspage, tmp, &free_pages, list) {
 		list_del(&zspage->list);
 		lock_zspage(zspage);

-		spin_lock(&pool->lock);
 		class = zspage_class(pool, zspage);
+		spin_lock(&class->lock);
 		__free_zspage(pool, class, zspage);
-		spin_unlock(&pool->lock);
+		spin_unlock(&class->lock);
 	}
 };

@@ -1938,7 +1948,8 @@ static unsigned long __zs_compact(struct zs_pool *pool,
 	 * protect the race between zpage migration and zs_free
 	 * as well as zpage allocation/free
 	 */
-	spin_lock(&pool->lock);
+	write_lock(&pool->migrate_lock);
+	spin_lock(&class->lock);
 	while (zs_can_compact(class)) {
 		int fg;

@@ -1964,13 +1975,15 @@ static unsigned long __zs_compact(struct zs_pool *pool,
 		src_zspage = NULL;

 		if (get_fullness_group(class, dst_zspage) == ZS_INUSE_RATIO_100
-		    || spin_is_contended(&pool->lock)) {
+		    || rwlock_is_contended(&pool->migrate_lock)) {
 			putback_zspage(class, dst_zspage);
 			dst_zspage = NULL;

-			spin_unlock(&pool->lock);
+			spin_unlock(&class->lock);
+			write_unlock(&pool->migrate_lock);
 			cond_resched();
-			spin_lock(&pool->lock);
+			write_lock(&pool->migrate_lock);
+			spin_lock(&class->lock);
 		}
 	}

@@ -1980,7 +1993,8 @@ static unsigned long __zs_compact(struct zs_pool *pool,
 	if (dst_zspage)
 		putback_zspage(class, dst_zspage);

-	spin_unlock(&pool->lock);
+	spin_unlock(&class->lock);
+	write_unlock(&pool->migrate_lock);

 	return pages_freed;
 }
@@ -1992,10 +2006,10 @@ unsigned long zs_compact(struct zs_pool *pool)
 	unsigned long pages_freed = 0;

 	/*
-	 * Pool compaction is performed under pool->lock so it is basically
+	 * Pool compaction is performed under pool->migrate_lock so it is basically
 	 * single-threaded. Having more than one thread in __zs_compact()
-	 * will increase pool->lock contention, which will impact other
-	 * zsmalloc operations that need pool->lock.
+	 * will increase pool->migrate_lock contention, which will impact other
+	 * zsmalloc operations that need pool->migrate_lock.
 	 */
 	if (atomic_xchg(&pool->compaction_in_progress, 1))
 		return 0;
@@ -2117,7 +2131,7 @@ struct zs_pool *zs_create_pool(const char *name)
 		return NULL;

 	init_deferred_free(pool);
-	spin_lock_init(&pool->lock);
+	rwlock_init(&pool->migrate_lock);
 	atomic_set(&pool->compaction_in_progress, 0);

 	pool->name = kstrdup(name, GFP_KERNEL);
@@ -2189,6 +2203,7 @@ struct zs_pool *zs_create_pool(const char *name)
 		class->index = i;
 		class->pages_per_zspage = pages_per_zspage;
 		class->objs_per_zspage = objs_per_zspage;
+		spin_lock_init(&class->lock);
 		pool->size_class[i] = class;

 		fullness = ZS_INUSE_RATIO_0;
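For reviewers skimming the diff, the writer side can be condensed into
the following purely illustrative sketch (not part of the patch; the
function name is made up), mirroring zs_page_migrate() above:
pool->migrate_lock is outermost, class->lock nests inside it, and the
per-zspage migrate lock is innermost.

/*
 * Illustrative only, not part of the patch: writer-side nesting as in
 * zs_page_migrate() above. The function name is hypothetical.
 */
static void writer_side_lock_sketch(struct zs_pool *pool,
				    struct size_class *class,
				    struct zspage *zspage)
{
	write_lock(&pool->migrate_lock);	/* block handle -> zspage lookups */
	spin_lock(&class->lock);		/* block alloc/free in this size class */
	migrate_write_lock(zspage);		/* block zs_map_object() on this zspage */

	/* ... migrate or compact objects ... */

	write_unlock(&pool->migrate_lock);	/* data copied; lookups may resume */
	spin_unlock(&class->lock);
	migrate_write_unlock(zspage);
}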