From patchwork Wed Apr 17 16:08:42 2024
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13633572
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, "Huang, Ying", Matthew Wilcox, Chris Li, Barry Song,
 Ryan Roberts, Neil Brown, Minchan Kim, Hugh Dickins, David Hildenbrand,
 Yosry Ahmed, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 Kairui Song
Subject: [PATCH 8/8] mm/swap: reduce swap cache search space
Date: Thu, 18 Apr 2024 00:08:42 +0800
Message-ID: <20240417160842.76665-9-ryncsn@gmail.com>
In-Reply-To: <20240417160842.76665-1-ryncsn@gmail.com>
References: <20240417160842.76665-1-ryncsn@gmail.com>
Reply-To: Kairui Song
MIME-Version: 1.0

From: Kairui Song

Currently we use one swap_address_space for every 64M chunk to reduce lock
contention; this is like having a set of smaller swap files inside one big
swap file. But when doing a swap cache lookup or insertion, we are still
using the offset into the whole large swap file. This is OK for correctness,
as the offset (key) is unique.

But the XArray is specially optimized for small indexes: it creates the
radix tree levels lazily, just deep enough to fit the largest key stored in
one XArray. So we are wasting tree nodes unnecessarily. A 64M chunk should
take at most 3 levels to contain everything, but because we use the offset
from the whole swap file, the key value will be way beyond 64M, and so will
the tree depth.

Optimize this by using a new helper, swap_cache_index(), to get a swap
entry's unique offset within its own 64M swap_address_space.

I see a ~1% performance gain in benchmarks and in an actual workload under
high memory pressure.

Tested with `time memhog 128G` inside an 8G memcg using 128G of swap
(ramdisk with SWP_SYNCHRONOUS_IO dropped; tested 3 times, results are
stable. The result is similar but the improvement is smaller if
SWP_SYNCHRONOUS_IO is enabled, as the swap-out path can never skip the
swap cache):

Before:
6.07user 250.74system 4:17.26elapsed 99%CPU (0avgtext+0avgdata 8373376maxresident)k
0inputs+0outputs (55major+33555018minor)pagefaults 0swaps

After (1.8% faster):
6.08user 246.09system 4:12.58elapsed 99%CPU (0avgtext+0avgdata 8373248maxresident)k
0inputs+0outputs (54major+33555027minor)pagefaults 0swaps

Similar result with MySQL and sysbench using swap:

Before: 94055.61 qps
After (0.8% faster): 94834.91 qps

Radix tree slab usage is also very slightly lower.
Signed-off-by: Kairui Song
---
 mm/huge_memory.c |  2 +-
 mm/memcontrol.c  |  2 +-
 mm/mincore.c     |  2 +-
 mm/shmem.c       |  2 +-
 mm/swap.h        |  7 +++++++
 mm/swap_state.c  | 12 ++++++------
 mm/swapfile.c    |  6 +++---
 7 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9859aa4f7553..1208d60792f0 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2903,7 +2903,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 	split_page_memcg(head, order, new_order);
 
 	if (folio_test_anon(folio) && folio_test_swapcache(folio)) {
-		offset = swp_offset(folio->swap);
+		offset = swap_cache_index(folio->swap);
 		swap_cache = swap_address_space(folio->swap);
 		xa_lock(&swap_cache->i_pages);
 	}
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fabce2b50c69..04d7be7f30dc 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5934,7 +5934,7 @@ static struct page *mc_handle_swap_pte(struct vm_area_struct *vma,
 	 * Because swap_cache_get_folio() updates some statistics counter,
 	 * we call find_get_page() with swapper_space directly.
 	 */
-	page = find_get_page(swap_address_space(ent), swp_offset(ent));
+	page = find_get_page(swap_address_space(ent), swap_cache_index(ent));
 	entry->val = ent.val;
 
 	return page;
diff --git a/mm/mincore.c b/mm/mincore.c
index dad3622cc963..e31cf1bde614 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -139,7 +139,7 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 		} else {
 #ifdef CONFIG_SWAP
 			*vec = mincore_page(swap_address_space(entry),
-					    swp_offset(entry));
+					    swap_cache_index(entry));
 #else
 			WARN_ON(1);
 			*vec = 1;
diff --git a/mm/shmem.c b/mm/shmem.c
index 0aad0d9a621b..cbe33ab52a73 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1762,7 +1762,7 @@ static int shmem_replace_folio(struct folio **foliop, gfp_t gfp,
 
 	old = *foliop;
 	entry = old->swap;
-	swap_index = swp_offset(entry);
+	swap_index = swap_cache_index(entry);
 	swap_mapping = swap_address_space(entry);
 
 	/*
diff --git a/mm/swap.h b/mm/swap.h
index 2de83729aaa8..6ef237d2b029 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -31,11 +31,18 @@ void __swap_writepage(struct folio *folio, struct writeback_control *wbc);
 /* One swap address space for each 64M swap space */
 #define SWAP_ADDRESS_SPACE_SHIFT	14
 #define SWAP_ADDRESS_SPACE_PAGES	(1 << SWAP_ADDRESS_SPACE_SHIFT)
+#define SWAP_ADDRESS_SPACE_MASK		(BIT(SWAP_ADDRESS_SPACE_SHIFT) - 1)
 extern struct address_space *swapper_spaces[];
 #define swap_address_space(entry)			    \
 	(&swapper_spaces[swp_type(entry)][swp_offset(entry) \
 		>> SWAP_ADDRESS_SPACE_SHIFT])
 
+static inline pgoff_t swap_cache_index(swp_entry_t entry)
+{
+	BUILD_BUG_ON((SWP_OFFSET_MASK | SWAP_ADDRESS_SPACE_MASK) != SWP_OFFSET_MASK);
+	return swp_offset(entry) & SWAP_ADDRESS_SPACE_MASK;
+}
+
 void show_swap_cache_info(void);
 bool add_to_swap(struct folio *folio);
 void *get_shadow_from_swap_cache(swp_entry_t entry);
diff --git a/mm/swap_state.c b/mm/swap_state.c
index bfc7e8c58a6d..9dbb54c72770 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -72,7 +72,7 @@ void show_swap_cache_info(void)
 void *get_shadow_from_swap_cache(swp_entry_t entry)
 {
 	struct address_space *address_space = swap_address_space(entry);
-	pgoff_t idx = swp_offset(entry);
+	pgoff_t idx = swap_cache_index(entry);
 	struct page *page;
 
 	page = xa_load(&address_space->i_pages, idx);
@@ -89,7 +89,7 @@ int add_to_swap_cache(struct folio *folio, swp_entry_t entry,
 			gfp_t gfp, void **shadowp)
 {
 	struct address_space *address_space = swap_address_space(entry);
-	pgoff_t idx = swp_offset(entry);
+	pgoff_t idx = swap_cache_index(entry);
 	XA_STATE_ORDER(xas, &address_space->i_pages, idx, folio_order(folio));
 	unsigned long i, nr = folio_nr_pages(folio);
 	void *old;
@@ -144,7 +144,7 @@ void __delete_from_swap_cache(struct folio *folio,
 	struct address_space *address_space = swap_address_space(entry);
 	int i;
 	long nr = folio_nr_pages(folio);
-	pgoff_t idx = swp_offset(entry);
+	pgoff_t idx = swap_cache_index(entry);
 	XA_STATE(xas, &address_space->i_pages, idx);
 
 	xas_set_update(&xas, workingset_update_node);
@@ -350,7 +350,7 @@ struct folio *swap_cache_get_folio(swp_entry_t entry,
 {
 	struct folio *folio;
 
-	folio = filemap_get_folio(swap_address_space(entry), swp_offset(entry));
+	folio = filemap_get_folio(swap_address_space(entry), swap_cache_index(entry));
 	if (!IS_ERR(folio)) {
 		bool vma_ra = swap_use_vma_readahead();
 		bool readahead;
@@ -420,7 +420,7 @@ struct folio *filemap_get_incore_folio(struct address_space *mapping,
 	si = get_swap_device(swp);
 	if (!si)
 		return ERR_PTR(-ENOENT);
-	index = swp_offset(swp);
+	index = swap_cache_index(swp);
 	folio = filemap_get_folio(swap_address_space(swp), index);
 	put_swap_device(si);
 	return folio;
@@ -447,7 +447,7 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 		 * that would confuse statistics.
 		 */
 		folio = filemap_get_folio(swap_address_space(entry),
-					  swp_offset(entry));
+					  swap_cache_index(entry));
 		if (!IS_ERR(folio))
 			goto got_folio;
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 0c36a5c2400f..2e8df95977b7 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -138,7 +138,7 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
 	struct folio *folio;
 	int ret = 0;
 
-	folio = filemap_get_folio(swap_address_space(entry), offset);
+	folio = filemap_get_folio(swap_address_space(entry), swap_cache_index(entry));
 	if (IS_ERR(folio))
 		return 0;
 	/*
@@ -2110,7 +2110,7 @@ static int try_to_unuse(unsigned int type)
 	       (i = find_next_to_unuse(si, i)) != 0) {
 
 		entry = swp_entry(type, i);
-		folio = filemap_get_folio(swap_address_space(entry), i);
+		folio = filemap_get_folio(swap_address_space(entry), swap_cache_index(entry));
 		if (IS_ERR(folio))
 			continue;
 
@@ -3421,7 +3421,7 @@ EXPORT_SYMBOL_GPL(swapcache_mapping);
 
 pgoff_t __folio_swap_cache_index(struct folio *folio)
 {
-	return swp_offset(folio->swap);
+	return swap_cache_index(folio->swap);
 }
 EXPORT_SYMBOL_GPL(__folio_swap_cache_index);