From patchwork Fri May 10 11:47:47 2024
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13661475
From: Kairui Song <ryncsn@gmail.com>
To: linux-mm@kvack.org
Cc: Andrew Morton, "Huang, Ying", Matthew Wilcox, Chris Li, Barry Song,
 Ryan Roberts, Neil Brown, Minchan Kim, David Hildenbrand, Hugh Dickins,
 Yosry Ahmed, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 Kairui Song
Subject: [PATCH v5 12/12] mm/swap: reduce swap cache search space
Date: Fri, 10 May 2024 19:47:47 +0800
Message-ID: <20240510114747.21548-13-ryncsn@gmail.com>
In-Reply-To: <20240510114747.21548-1-ryncsn@gmail.com>
References: <20240510114747.21548-1-ryncsn@gmail.com>
MIME-Version: 1.0
From: Kairui Song

Currently we use one swap_address_space for every 64M chunk to reduce lock
contention; this is like having a set of smaller swap files inside one swap
device. But when doing a swap cache lookup or insert, we are still using the
offset within the whole large swap device. This is OK for correctness, as the
offset (key) is unique.

But the XArray is specially optimized for small indexes: it grows the radix
tree lazily, only deep enough to fit the largest key stored in one XArray. So
we are wasting tree nodes unnecessarily. A 64M chunk needs at most 3 levels to
contain everything, but if we use the offset from the whole swap device, the
offset (key) value will be way beyond 64M, and so will the tree depth.

Optimize this by adding a new helper, swap_cache_index(), which returns a swap
entry's unique offset within its own 64M swap_address_space.

I see a ~1% performance gain in benchmarks and in an actual workload with high
memory pressure.

Tested with `time memhog 128G` inside an 8G memcg using 128G of swap (ramdisk
with SWP_SYNCHRONOUS_IO dropped; tested 3 times, results are stable. The
result is similar but the improvement is smaller if SWP_SYNCHRONOUS_IO is
enabled, as the swap-out path can never skip the swap cache):

Before:
6.07user 250.74system 4:17.26elapsed 99%CPU (0avgtext+0avgdata 8373376maxresident)k
0inputs+0outputs (55major+33555018minor)pagefaults 0swaps

After (1.8% faster):
6.08user 246.09system 4:12.58elapsed 99%CPU (0avgtext+0avgdata 8373248maxresident)k
0inputs+0outputs (54major+33555027minor)pagefaults 0swaps

Similar result with MySQL and sysbench using swap:
Before: 94055.61 qps
After (0.8% faster): 94834.91 qps

Radix tree slab usage is also very slightly lower.
Signed-off-by: Kairui Song
Reviewed-by: "Huang, Ying"
---
 mm/huge_memory.c |  2 +-
 mm/memcontrol.c  |  2 +-
 mm/mincore.c     |  2 +-
 mm/shmem.c       |  2 +-
 mm/swap.h        | 15 +++++++++++++++
 mm/swap_state.c  | 17 +++++++++--------
 mm/swapfile.c    |  6 +++---
 7 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 317de2afd371..fcc0e86a2589 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2838,7 +2838,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 	split_page_memcg(head, order, new_order);
 
 	if (folio_test_anon(folio) && folio_test_swapcache(folio)) {
-		offset = swp_offset(folio->swap);
+		offset = swap_cache_index(folio->swap);
 		swap_cache = swap_address_space(folio->swap);
 		xa_lock(&swap_cache->i_pages);
 	}
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d127c9c5fabf..024aeb64d0be 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6153,7 +6153,7 @@ static struct page *mc_handle_swap_pte(struct vm_area_struct *vma,
 	 * Because swap_cache_get_folio() updates some statistics counter,
 	 * we call find_get_page() with swapper_space directly.
 	 */
-	page = find_get_page(swap_address_space(ent), swp_offset(ent));
+	page = find_get_page(swap_address_space(ent), swap_cache_index(ent));
 	entry->val = ent.val;
 
 	return page;
diff --git a/mm/mincore.c b/mm/mincore.c
index dad3622cc963..e31cf1bde614 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -139,7 +139,7 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 		} else {
 #ifdef CONFIG_SWAP
 			*vec = mincore_page(swap_address_space(entry),
-					    swp_offset(entry));
+					    swap_cache_index(entry));
 #else
 			WARN_ON(1);
 			*vec = 1;
diff --git a/mm/shmem.c b/mm/shmem.c
index fa2a0ed97507..326315c12feb 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1756,7 +1756,7 @@ static int shmem_replace_folio(struct folio **foliop, gfp_t gfp,
 	old = *foliop;
 	entry = old->swap;
-	swap_index = swp_offset(entry);
+	swap_index = swap_cache_index(entry);
 	swap_mapping = swap_address_space(entry);
 
 	/*
diff --git a/mm/swap.h b/mm/swap.h
index 82023ab93205..2c0e96272d49 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -27,6 +27,7 @@ void __swap_writepage(struct folio *folio, struct writeback_control *wbc);
 /* One swap address space for each 64M swap space */
 #define SWAP_ADDRESS_SPACE_SHIFT	14
 #define SWAP_ADDRESS_SPACE_PAGES	(1 << SWAP_ADDRESS_SPACE_SHIFT)
+#define SWAP_ADDRESS_SPACE_MASK		(SWAP_ADDRESS_SPACE_PAGES - 1)
 extern struct address_space *swapper_spaces[];
 #define swap_address_space(entry)			    \
 	(&swapper_spaces[swp_type(entry)][swp_offset(entry) \
@@ -40,6 +41,15 @@ static inline loff_t swap_dev_pos(swp_entry_t entry)
 	return ((loff_t)swp_offset(entry)) << PAGE_SHIFT;
 }
 
+/*
+ * Return the swap cache index of the swap entry.
+ */
+static inline pgoff_t swap_cache_index(swp_entry_t entry)
+{
+	BUILD_BUG_ON((SWP_OFFSET_MASK | SWAP_ADDRESS_SPACE_MASK) != SWP_OFFSET_MASK);
+	return swp_offset(entry) & SWAP_ADDRESS_SPACE_MASK;
+}
+
 void show_swap_cache_info(void);
 bool add_to_swap(struct folio *folio);
 void *get_shadow_from_swap_cache(swp_entry_t entry);
@@ -86,6 +96,11 @@ static inline struct address_space *swap_address_space(swp_entry_t entry)
 	return NULL;
 }
 
+static inline pgoff_t swap_cache_index(swp_entry_t entry)
+{
+	return 0;
+}
+
 static inline void show_swap_cache_info(void)
 {
 }
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 642c30d8376c..6e86c759dc1d 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -72,7 +72,7 @@ void show_swap_cache_info(void)
 void *get_shadow_from_swap_cache(swp_entry_t entry)
 {
 	struct address_space *address_space = swap_address_space(entry);
-	pgoff_t idx = swp_offset(entry);
+	pgoff_t idx = swap_cache_index(entry);
 	void *shadow;
 
 	shadow = xa_load(&address_space->i_pages, idx);
@@ -89,7 +89,7 @@ int add_to_swap_cache(struct folio *folio, swp_entry_t entry,
 			gfp_t gfp, void **shadowp)
 {
 	struct address_space *address_space = swap_address_space(entry);
-	pgoff_t idx = swp_offset(entry);
+	pgoff_t idx = swap_cache_index(entry);
 	XA_STATE_ORDER(xas, &address_space->i_pages, idx, folio_order(folio));
 	unsigned long i, nr = folio_nr_pages(folio);
 	void *old;
@@ -144,7 +144,7 @@ void __delete_from_swap_cache(struct folio *folio,
 	struct address_space *address_space = swap_address_space(entry);
 	int i;
 	long nr = folio_nr_pages(folio);
-	pgoff_t idx = swp_offset(entry);
+	pgoff_t idx = swap_cache_index(entry);
 	XA_STATE(xas, &address_space->i_pages, idx);
 
 	xas_set_update(&xas, workingset_update_node);
@@ -253,13 +253,14 @@ void clear_shadow_from_swap_cache(int type, unsigned long begin,
 
 	for (;;) {
 		swp_entry_t entry = swp_entry(type, curr);
+		unsigned long index = curr & SWAP_ADDRESS_SPACE_MASK;
 		struct address_space *address_space = swap_address_space(entry);
-		XA_STATE(xas, &address_space->i_pages, curr);
+		XA_STATE(xas, &address_space->i_pages, index);
 
 		xas_set_update(&xas, workingset_update_node);
 
 		xa_lock_irq(&address_space->i_pages);
-		xas_for_each(&xas, old, end) {
+		xas_for_each(&xas, old, min(index + (end - curr), SWAP_ADDRESS_SPACE_PAGES)) {
 			if (!xa_is_value(old))
 				continue;
 			xas_store(&xas, NULL);
@@ -350,7 +351,7 @@ struct folio *swap_cache_get_folio(swp_entry_t entry,
 {
 	struct folio *folio;
 
-	folio = filemap_get_folio(swap_address_space(entry), swp_offset(entry));
+	folio = filemap_get_folio(swap_address_space(entry), swap_cache_index(entry));
 	if (!IS_ERR(folio)) {
 		bool vma_ra = swap_use_vma_readahead();
 		bool readahead;
@@ -420,7 +421,7 @@ struct folio *filemap_get_incore_folio(struct address_space *mapping,
 	si = get_swap_device(swp);
 	if (!si)
 		return ERR_PTR(-ENOENT);
-	index = swp_offset(swp);
+	index = swap_cache_index(swp);
 	folio = filemap_get_folio(swap_address_space(swp), index);
 	put_swap_device(si);
 	return folio;
@@ -447,7 +448,7 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 	 * that would confuse statistics.
 	 */
 	folio = filemap_get_folio(swap_address_space(entry),
-				  swp_offset(entry));
+				  swap_cache_index(entry));
 	if (!IS_ERR(folio))
 		goto got_folio;
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 0b0ae6e8c764..4f0e8b2ac8aa 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -142,7 +142,7 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
 	struct folio *folio;
 	int ret = 0;
 
-	folio = filemap_get_folio(swap_address_space(entry), offset);
+	folio = filemap_get_folio(swap_address_space(entry), swap_cache_index(entry));
 	if (IS_ERR(folio))
 		return 0;
 	/*
@@ -2158,7 +2158,7 @@ static int try_to_unuse(unsigned int type)
 	       (i = find_next_to_unuse(si, i)) != 0) {
 
 		entry = swp_entry(type, i);
-		folio = filemap_get_folio(swap_address_space(entry), i);
+		folio = filemap_get_folio(swap_address_space(entry), swap_cache_index(entry));
 		if (IS_ERR(folio))
 			continue;
@@ -3476,7 +3476,7 @@ EXPORT_SYMBOL_GPL(swapcache_mapping);
 
 pgoff_t __folio_swap_cache_index(struct folio *folio)
 {
-	return swp_offset(folio->swap);
+	return swap_cache_index(folio->swap);
 }
 EXPORT_SYMBOL_GPL(__folio_swap_cache_index);