From patchwork Mon Apr 29 19:04:48 2024
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13647540
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, "Huang, Ying", Matthew Wilcox, Chris Li, Barry Song,
 Ryan Roberts, Neil Brown, Minchan Kim, Hugh Dickins, David Hildenbrand,
 Yosry Ahmed, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 Kairui Song
Subject: [PATCH v3 00/12] mm/swap: clean up and optimize swap cache index
Date: Tue, 30 Apr 2024 03:04:48 +0800
Message-ID: <20240429190500.30979-1-ryncsn@gmail.com>
X-Mailer: git-send-email 2.44.0
Reply-To: Kairui Song

From: Kairui Song

This is
based on latest mm-unstable. Patch 1/12 might not be needed if f2fs
converts .readahead to use folio; I included it for easier testing and
review.

Currently we use one swap_address_space for every 64M chunk to reduce lock
contention. This is like having a set of smaller swap files inside one big
swap file. But when doing a swap cache lookup or insert, we are still using
the offset of the whole large swap file. This is OK for correctness, as the
offset (key) is unique.

But XArray is specially optimized for small indexes: it creates the radix
tree levels lazily, just enough to fit the largest key stored in one
XArray. So we are wasting tree nodes unnecessarily.

For a 64M chunk it should take at most 3 levels to contain everything. But
we are using the offset from the whole swap file, so the offset (key) value
will be way beyond 64M, and so will the tree level.

Optimize this by reducing the swap cache search space to the 64M scope.

Test with `time memhog 128G` inside an 8G memcg using 128G swap (ramdisk
with SWP_SYNCHRONOUS_IO dropped, tested 3 times, results are stable. The
test result is similar but the improvement is smaller if
SWP_SYNCHRONOUS_IO is enabled, as the swap out path can never skip the swap
cache):

Before:
6.07user 250.74system 4:17.26elapsed 99%CPU (0avgtext+0avgdata 8373376maxresident)k
0inputs+0outputs (55major+33555018minor)pagefaults 0swaps

After (+1.8% faster):
6.08user 246.09system 4:12.58elapsed 99%CPU (0avgtext+0avgdata 8373248maxresident)k
0inputs+0outputs (54major+33555027minor)pagefaults 0swaps

Similar result with MySQL and sysbench using swap:
Before:
94055.61 qps

After (+0.8% faster):
94834.91 qps

There is also a very slight drop in radix tree node slab usage:
Before: 303952K
After:  302224K

For this series:

There are multiple places that expect mixed types of pages (page cache or
swap cache), e.g.
migration, huge memory split. There are four helpers for that:

- page_index
- page_file_offset
- folio_index
- folio_file_pos

To keep the code clean and compatible, this series first cleans up the
usage of them.

First, page_file_offset and folio_file_pos are historical helpers that can
simply be dropped after the clean up. And all users of page_index can be
converted to folio_index or folio->index.

Then introduce two new helpers for swap: swap_cache_index and swap_dev_pos.
Replace swp_offset with swap_cache_index when it is used to retrieve a
folio from the swap cache, and use swap_dev_pos when the device position of
a swap entry is needed. This way, swap_cache_index can return the optimized
value with no compatibility issue.

Ideally, in the future, we may want to reduce SWAP_ADDRESS_SPACE_SHIFT from
14 to 12: the default XArray chunk shift is 6, so we have 3 level trees
instead of 2 level trees just for 2 extra bits. But the swap cache is based
on the address_space struct; with 4 times more metadata sparsely
distributed in memory it wastes more cachelines, and the performance gain
from this series is almost canceled according to my test. So first, just
have a cleaner separation of offsets and a smaller search space.

Patch 1/12 - 11/12: Clean up usage of above helpers.
Patch 12/12: Apply the optimization.

V2: https://lore.kernel.org/linux-mm/20240423170339.54131-1-ryncsn@gmail.com/
Update from V2:
- Clean up usage of page_file_offset and folio_file_pos [Matthew Wilcox]
  https://lore.kernel.org/linux-mm/ZiiFHTwgu8FGio1k@casper.infradead.org/
- Use folio in nilfs_bmap_data_get_key [Ryusuke Konishi]

V1: https://lore.kernel.org/all/20240417160842.76665-1-ryncsn@gmail.com/
Update from V1:
- Convert more users to use folio directly when possible [Matthew Wilcox]
- Rename swap_file_pos to swap_dev_pos [Huang, Ying]
- Update comments and commit message.
- Adjust headers and add dummy function to fix build error.
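
The level counts quoted in this cover letter (3 levels for a 64M chunk, 2
levels if SWAP_ADDRESS_SPACE_SHIFT were 12) follow from simple arithmetic.
Below is a minimal userspace sketch of that arithmetic, assuming 6 index
bits per tree level and 4K pages. model_tree_levels and MODEL_CHUNK_SHIFT
are hypothetical names for illustration, not kernel APIs, and the real
XArray may behave differently in corner cases such as a single entry at
index 0.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: each tree level resolves 6 bits of the index. */
#define MODEL_CHUNK_SHIFT 6

static_assert(MODEL_CHUNK_SHIFT > 0 && MODEL_CHUNK_SHIFT < 64,
	      "chunk shift must be a sane bit count");

/*
 * Hypothetical helper: number of tree levels needed so that max_index
 * fits, counting at least one level for any index.
 */
static unsigned int model_tree_levels(uint64_t max_index)
{
	unsigned int levels = 1;
	unsigned int shift = MODEL_CHUNK_SHIFT;

	/* Every extra level covers MODEL_CHUNK_SHIFT more index bits. */
	while (shift < 64 && (max_index >> shift)) {
		levels++;
		shift += MODEL_CHUNK_SHIFT;
	}
	return levels;
}
```

Under these assumptions, model_tree_levels((1ULL << 14) - 1) gives 3 and
model_tree_levels((1ULL << 12) - 1) gives 2, while the largest page offset
of a 128G swap file with 4K pages, (1ULL << 25) - 1, needs 5 levels.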
This series is part of an effort to reduce swap cache overhead, and
ultimately remove SWP_SYNCHRONOUS_IO and unify swap cache usage as proposed
before:
https://lore.kernel.org/lkml/20240326185032.72159-1-ryncsn@gmail.com/

Kairui Song (12):
  f2fs: drop usage of page_index
  nilfs2: drop usage of page_index
  ceph: drop usage of page_index
  NFS: remove nfs_page_lengthg and usage of page_index
  cifs: drop usage of page_file_offset
  afs: drop usage of folio_file_pos
  netfs: drop usage of folio_file_pos
  nfs: drop usage of folio_file_pos
  mm/swap: get the swap file offset directly
  mm: remove page_file_offset and folio_file_pos
  mm: drop page_index and convert folio_index to use folio
  mm/swap: reduce swap cache search space

 fs/afs/dir.c              |  6 +++---
 fs/afs/dir_edit.c         |  4 ++--
 fs/ceph/dir.c             |  2 +-
 fs/ceph/inode.c           |  2 +-
 fs/f2fs/data.c            |  2 +-
 fs/netfs/buffered_read.c  |  4 ++--
 fs/netfs/buffered_write.c |  2 +-
 fs/nfs/file.c             |  2 +-
 fs/nfs/internal.h         | 19 -------------------
 fs/nfs/nfstrace.h         |  4 ++--
 fs/nfs/write.c            |  6 +++---
 fs/nilfs2/bmap.c          |  3 +--
 fs/smb/client/file.c      |  2 +-
 include/linux/mm.h        | 13 -------------
 include/linux/pagemap.h   | 25 ++++---------------------
 mm/huge_memory.c          |  2 +-
 mm/memcontrol.c           |  2 +-
 mm/mincore.c              |  2 +-
 mm/page_io.c              |  6 +++---
 mm/shmem.c                |  2 +-
 mm/swap.h                 | 24 ++++++++++++++++++++++++
 mm/swap_state.c           | 12 ++++++------
 mm/swapfile.c             | 11 +++++------
 23 files changed, 65 insertions(+), 92 deletions(-)
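
As a rough illustration of the split between swap_cache_index and
swap_dev_pos described in this cover letter, here is a userspace model. It
is a sketch under stated assumptions, not the code from the patches: it
assumes 4K pages, a SWAP_ADDRESS_SPACE_SHIFT of 14, and that the device
position is a byte offset. All model_* and MODEL_* names are hypothetical,
and a plain integer stands in for the offset of a swap entry.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions of this model: 4K pages, 2^14 pages per address space. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_SWAP_ADDRESS_SPACE_SHIFT 14
#define MODEL_SWAP_ADDRESS_SPACE_PAGES (1ULL << MODEL_SWAP_ADDRESS_SPACE_SHIFT)
#define MODEL_SWAP_ADDRESS_SPACE_MASK (MODEL_SWAP_ADDRESS_SPACE_PAGES - 1)

/* With these values one address space covers exactly 64M of swap. */
static_assert((MODEL_SWAP_ADDRESS_SPACE_PAGES << MODEL_PAGE_SHIFT) ==
	      (64ULL << 20), "one address space should cover 64M");

/* Which 64M address space a whole-file swap offset falls into. */
static uint64_t model_address_space_nr(uint64_t offset)
{
	return offset >> MODEL_SWAP_ADDRESS_SPACE_SHIFT;
}

/* Lookup key inside that address space: only the low 14 bits remain. */
static uint64_t model_swap_cache_index(uint64_t offset)
{
	return offset & MODEL_SWAP_ADDRESS_SPACE_MASK;
}

/* Position on the swap device, assumed here to be in bytes. */
static uint64_t model_swap_dev_pos(uint64_t offset)
{
	return offset << MODEL_PAGE_SHIFT;
}
```

The key stays unique because the pair (address space number, cache index)
still identifies the original offset, while the key itself never exceeds
2^14 - 1 no matter how large the swap file is.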