From patchwork Fri Jan 19 11:22:23 2024
X-Patchwork-Submitter: Chengming Zhou
X-Patchwork-Id: 13523673
From: Chengming Zhou
Date: Fri, 19 Jan 2024 11:22:23 +0000
Subject: [PATCH v2 2/2] mm/zswap: split zswap rb-tree
MIME-Version: 1.0
Message-Id: <20240117-b4-zswap-lock-optimize-v2-2-b5cc55479090@bytedance.com>
References: <20240117-b4-zswap-lock-optimize-v2-0-b5cc55479090@bytedance.com>
In-Reply-To: <20240117-b4-zswap-lock-optimize-v2-0-b5cc55479090@bytedance.com>
To: Andrew Morton
Cc: Nhat Pham, Yosry Ahmed, Chris Li, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, Johannes Weiner, Chengming Zhou
Each swapfile has one rb-tree to search the mapping of swp_entry_t to
zswap_entry, protected by a spinlock, which can cause heavy lock
contention when multiple tasks zswap_store/load concurrently.
Optimize this scalability problem by splitting the zswap rb-tree into
multiple rb-trees, each corresponding to SWAP_ADDRESS_SPACE_PAGES (64M),
just as we did when splitting the swap cache address_space. Although this
method can't eliminate the spinlock contention completely, it can
mitigate much of that contention.

Below are the results of a kernel build in tmpfs with the zswap shrinker
enabled:

        linux-next  zswap-lock-optimize
real    1m9.181s    1m3.820s
user    17m44.036s  17m40.100s
sys     7m37.297s   4m54.622s

So there are clear improvements.

Acked-by: Johannes Weiner
Acked-by: Nhat Pham
Signed-off-by: Chengming Zhou
Acked-by: Yosry Ahmed
---
 include/linux/zswap.h |  4 +--
 mm/swapfile.c         |  2 +-
 mm/zswap.c            | 71 +++++++++++++++++++++++++++++++++------------------
 3 files changed, 49 insertions(+), 28 deletions(-)

diff --git a/include/linux/zswap.h b/include/linux/zswap.h
index eca388229d9a..91895ce1fdbc 100644
--- a/include/linux/zswap.h
+++ b/include/linux/zswap.h
@@ -30,7 +30,7 @@ struct zswap_lruvec_state {
 bool zswap_store(struct folio *folio);
 bool zswap_load(struct folio *folio);
 void zswap_invalidate(int type, pgoff_t offset);
-int zswap_swapon(int type);
+int zswap_swapon(int type, unsigned long nr_pages);
 void zswap_swapoff(int type);
 void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg);
 void zswap_lruvec_state_init(struct lruvec *lruvec);
@@ -51,7 +51,7 @@ static inline bool zswap_load(struct folio *folio)
 }
 
 static inline void zswap_invalidate(int type, pgoff_t offset) {}
-static inline int zswap_swapon(int type)
+static inline int zswap_swapon(int type, unsigned long nr_pages)
 {
 	return 0;
 }
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 6c53ea06626b..35aa17b2a2fa 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3164,7 +3164,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 	if (error)
 		goto bad_swap_unlock_inode;
 
-	error = zswap_swapon(p->type);
+	error = zswap_swapon(p->type, maxpages);
 	if (error)
 		goto free_swap_address_space;
diff --git a/mm/zswap.c b/mm/zswap.c
index d88faea85978..2885f4fb6dcb 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -239,6 +239,7 @@ struct zswap_tree {
 };
 
 static struct zswap_tree *zswap_trees[MAX_SWAPFILES];
+static unsigned int nr_zswap_trees[MAX_SWAPFILES];
 
 /* RCU-protected iteration */
 static LIST_HEAD(zswap_pools);
@@ -265,6 +266,12 @@ static bool zswap_has_pool;
 * helpers and fwd declarations
 **********************************/
 
+static inline struct zswap_tree *swap_zswap_tree(swp_entry_t swp)
+{
+	return &zswap_trees[swp_type(swp)][swp_offset(swp)
+		>> SWAP_ADDRESS_SPACE_SHIFT];
+}
+
 #define zswap_pool_debug(msg, p)			\
 	pr_debug("%s pool %s/%s\n", msg, (p)->tfm_name,	\
 		 zpool_get_type((p)->zpools[0]))
@@ -865,7 +872,7 @@ static enum lru_status shrink_memcg_cb(struct list_head *item, struct list_lru_o
 	 * until the entry is verified to still be alive in the tree.
 	 */
 	swpoffset = swp_offset(entry->swpentry);
-	tree = zswap_trees[swp_type(entry->swpentry)];
+	tree = swap_zswap_tree(entry->swpentry);
 	list_lru_isolate(l, item);
 	/*
 	 * It's safe to drop the lock here because we return either
@@ -1494,10 +1501,9 @@ static void zswap_fill_page(void *ptr, unsigned long value)
 bool zswap_store(struct folio *folio)
 {
 	swp_entry_t swp = folio->swap;
-	int type = swp_type(swp);
 	pgoff_t offset = swp_offset(swp);
 	struct page *page = &folio->page;
-	struct zswap_tree *tree = zswap_trees[type];
+	struct zswap_tree *tree = swap_zswap_tree(swp);
 	struct zswap_entry *entry, *dupentry;
 	struct scatterlist input, output;
 	struct crypto_acomp_ctx *acomp_ctx;
@@ -1569,7 +1575,7 @@ bool zswap_store(struct folio *folio)
 	src = kmap_local_page(page);
 	if (zswap_is_page_same_filled(src, &value)) {
 		kunmap_local(src);
-		entry->swpentry = swp_entry(type, offset);
+		entry->swpentry = swp;
 		entry->length = 0;
 		entry->value = value;
 		atomic_inc(&zswap_same_filled_pages);
@@ -1651,7 +1657,7 @@ bool zswap_store(struct folio *folio)
 	mutex_unlock(&acomp_ctx->mutex);
 
 	/* populate entry */
-	entry->swpentry = swp_entry(type, offset);
+	entry->swpentry = swp;
 	entry->handle = handle;
 	entry->length = dlen;
 
@@ -1711,10 +1717,9 @@ bool zswap_store(struct folio *folio)
 bool zswap_load(struct folio *folio)
 {
 	swp_entry_t swp = folio->swap;
-	int type = swp_type(swp);
 	pgoff_t offset = swp_offset(swp);
 	struct page *page = &folio->page;
-	struct zswap_tree *tree = zswap_trees[type];
+	struct zswap_tree *tree = swap_zswap_tree(swp);
 	struct zswap_entry *entry;
 	u8 *dst;
 
@@ -1757,7 +1762,7 @@ bool zswap_load(struct folio *folio)
 
 void zswap_invalidate(int type, pgoff_t offset)
 {
-	struct zswap_tree *tree = zswap_trees[type];
+	struct zswap_tree *tree = swap_zswap_tree(swp_entry(type, offset));
 	struct zswap_entry *entry;
 
 	/* find */
@@ -1772,37 +1777,53 @@ void zswap_invalidate(int type, pgoff_t offset)
 	spin_unlock(&tree->lock);
 }
 
-int zswap_swapon(int type)
+int zswap_swapon(int type, unsigned long nr_pages)
 {
-	struct zswap_tree *tree;
+	struct zswap_tree *trees, *tree;
+	unsigned int nr, i;
 
-	tree = kzalloc(sizeof(*tree), GFP_KERNEL);
-	if (!tree) {
+	nr = DIV_ROUND_UP(nr_pages, SWAP_ADDRESS_SPACE_PAGES);
+	trees = kvcalloc(nr, sizeof(*tree), GFP_KERNEL);
+	if (!trees) {
 		pr_err("alloc failed, zswap disabled for swap type %d\n", type);
 		return -ENOMEM;
 	}
 
-	tree->rbroot = RB_ROOT;
-	spin_lock_init(&tree->lock);
-	zswap_trees[type] = tree;
+	for (i = 0; i < nr; i++) {
+		tree = trees + i;
+		tree->rbroot = RB_ROOT;
+		spin_lock_init(&tree->lock);
+	}
+
+	nr_zswap_trees[type] = nr;
+	zswap_trees[type] = trees;
 	return 0;
 }
 
 void zswap_swapoff(int type)
 {
-	struct zswap_tree *tree = zswap_trees[type];
-	struct zswap_entry *entry, *n;
+	struct zswap_tree *trees = zswap_trees[type];
+	unsigned int i;
 
-	if (!tree)
+	if (!trees)
 		return;
 
-	/* walk the tree and free everything */
-	spin_lock(&tree->lock);
-	rbtree_postorder_for_each_entry_safe(entry, n, &tree->rbroot, rbnode)
-		zswap_free_entry(entry);
-	tree->rbroot = RB_ROOT;
-	spin_unlock(&tree->lock);
-	kfree(tree);
+	for (i = 0; i < nr_zswap_trees[type]; i++) {
+		struct zswap_tree *tree = trees + i;
+		struct zswap_entry *entry, *n;
+
+		/* walk the tree and free everything */
+		spin_lock(&tree->lock);
+		rbtree_postorder_for_each_entry_safe(entry, n,
+						     &tree->rbroot,
+						     rbnode)
+			zswap_free_entry(entry);
+		tree->rbroot = RB_ROOT;
+		spin_unlock(&tree->lock);
+	}
+
+	kvfree(trees);
+	nr_zswap_trees[type] = 0;
 	zswap_trees[type] = NULL;
}