From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: baolin.wang@linux.alibaba.com, chrisl@kernel.org, david@redhat.com, hanchuanhua@oppo.com, hannes@cmpxchg.org, hughd@google.com, kasong@tencent.com, ryan.roberts@arm.com, surenb@google.com, v-songbaohua@oppo.com, willy@infradead.org, xiang@kernel.org, ying.huang@intel.com, yosryahmed@google.com, yuzhao@google.com, ziy@nvidia.com, linux-kernel@vger.kernel.org
Subject: [PATCH v2 0/5] large folios swap-in: handle refault cases first
Date: Tue, 9 Apr 2024 20:26:26 +1200
Message-Id: <20240409082631.187483-1-21cnbao@gmail.com>

This patchset is extracted from the large folio swap-in series [1] and primarily handles large folios found in the swap cache. For now, it focuses on refaults of mTHP that is still undergoing reclamation. Splitting it out aims to streamline code review and speed up integration of this part into the MM tree. It relies on Ryan's swap-out series v7 [2], using the swap_pte_batch() helper introduced by that series.

Presently, do_swap_page() only encounters a large folio in the swap cache when the folio is refaulted before vmscan has released it.
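To make the idea of batching swap PTEs concrete, here is a minimal userspace sketch. It assumes a hypothetical `struct fake_swap_pte` holding a swap type, an offset and an "exclusive" flag, and a hypothetical `batch_swap_ptes()` that counts how many consecutive entries continue the run started by the first one (same type, offsets increasing by one), reporting through an optional output pointer whether all of them are exclusive. This is not the kernel's swap_pte_batch(): its real signature, PTE locking and handling of other PTE kinds are not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a non-present PTE that holds a swap entry. */
struct fake_swap_pte {
	unsigned int type;    /* swap device index */
	unsigned long offset; /* slot offset within that device */
	bool exclusive;       /* page was exclusively owned when swapped out */
};

/*
 * Return how many entries, starting at ptes[0], belong to one batch:
 * same swap type and contiguous offsets. At most max_nr (>= 1) entries
 * are examined. If all_exclusive is non-NULL, it is set to true only
 * when every entry in the returned batch is exclusive.
 */
static size_t batch_swap_ptes(const struct fake_swap_pte *ptes, size_t max_nr,
			      bool *all_exclusive)
{
	size_t nr = 1;
	bool excl = ptes[0].exclusive;

	while (nr < max_nr &&
	       ptes[nr].type == ptes[0].type &&
	       ptes[nr].offset == ptes[0].offset + nr) {
		excl = excl && ptes[nr].exclusive;
		nr++;
	}

	if (all_exclusive)
		*all_exclusive = excl;
	return nr;
}
```

Returning the exclusivity as a side result of the same scan avoids walking the entries a second time when the caller needs to know whether the whole batch can be mapped writable.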
The handling added here should remain equally useful once large folio swap-in via swapin_readahead() is supported. Mapping a large folio in one go can effectively reduce page faults and eliminate most of the redundant checks and early exits for MTE restoration in the recent MTE patchset [3].

Large folio swap-in for SWP_SYNCHRONOUS_IO and swapin_readahead() will be split into separate patch sets and sent later.

-v2:
 - rebase on top of mm-unstable, in which Ryan's swap_pte_batch() has changed a lot;
 - remove folio_add_new_anon_rmap() for !folio_test_anon(), as large folios found here are currently always anon (refault);
 - add mTHP swpin refault counters.

-v1:
 Link: https://lore.kernel.org/linux-mm/20240402073237.240995-1-21cnbao@gmail.com/

Differences from the original large folios swap-in series:
 - collect Reviewed-by and Acked-by tags;
 - rename swap_nr_free to swap_free_nr, per Ryan;
 - limit the maximum kernel stack usage of swap_free_nr, per Ryan;
 - add an output argument to swap_pte_batch to report whether all entries are exclusive;
 - many cleanups and refinements; handle the corner case where a folio's virtual address might not be naturally aligned.

[1] https://lore.kernel.org/linux-mm/20240304081348.197341-1-21cnbao@gmail.com/
[2] https://lore.kernel.org/linux-mm/20240408183946.2991168-1-ryan.roberts@arm.com/
[3] https://lore.kernel.org/linux-mm/20240322114136.61386-1-21cnbao@gmail.com/

Barry Song (2):
  mm: swap_pte_batch: add an output argument to return if all swap entries are exclusive
  mm: add per-order mTHP swpin_refault counter

Chuanhua Han (3):
  mm: swap: introduce swap_free_nr() for batched swap_free()
  mm: swap: make should_try_to_free_swap() support large-folio
  mm: swap: entirely map large folios found in swapcache

 include/linux/huge_mm.h |  1 +
 include/linux/swap.h    |  5 +++
 mm/huge_memory.c        |  2 ++
 mm/internal.h           |  9 +++++-
 mm/madvise.c            |  2 +-
 mm/memory.c             | 69 ++++++++++++++++++++++++++++++++---------
 mm/swapfile.c           | 51 ++++++++++++++++++++++++++++++
 7 files changed, 123 insertions(+), 16 deletions(-)

Appendix:

The following program
can generate numerous instances where large folios are hit in the swap cache if 64KiB mTHP is enabled:

# echo always > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21 /* Linux >= 5.4; missing from older libc headers */
#endif

#define DATA_SIZE (128UL * 1024)
#define PAGE_SIZE (4UL * 1024)
#define LARGE_FOLIO_SIZE (64UL * 1024)

/* Fill each 4KiB page with a byte derived from its offset. */
static void *write_data(void *addr)
{
	unsigned long i;

	for (i = 0; i < DATA_SIZE; i += PAGE_SIZE)
		memset((char *)addr + i, (char)i, PAGE_SIZE);
	return NULL;
}

/* Touch each page (refaulting it if it was reclaimed) and verify it. */
static void *read_data(void *addr)
{
	unsigned long i;

	for (i = 0; i < DATA_SIZE; i += PAGE_SIZE) {
		if (*((char *)addr + i) != (char)i) {
			fprintf(stderr, "mismatched data\n");
			_exit(-1);
		}
	}
	return NULL;
}

/* Ask the kernel to reclaim (swap out) the range. */
static void *pgout_data(void *addr)
{
	madvise(addr, DATA_SIZE, MADV_PAGEOUT);
	return NULL;
}

int main(void)
{
	for (int i = 0; i < 10000; i++) {
		pthread_t tid1, tid2;
		void *aligned_addr;
		void *addr = mmap(NULL, DATA_SIZE * 2, PROT_READ | PROT_WRITE,
				  MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);

		if (addr == MAP_FAILED) {
			perror("fail to mmap");
			return -1;
		}
		/* Round up to a 64KiB boundary so 64KiB mTHP can back the range. */
		aligned_addr = (void *)(((unsigned long)addr + LARGE_FOLIO_SIZE) &
					~(LARGE_FOLIO_SIZE - 1));

		write_data(aligned_addr);

		/* Race reclaim against refaults of the same range. */
		if (pthread_create(&tid1, NULL, pgout_data, aligned_addr)) {
			fprintf(stderr, "fail to pthread_create\n");
			return -1;
		}
		if (pthread_create(&tid2, NULL, read_data, aligned_addr)) {
			fprintf(stderr, "fail to pthread_create\n");
			return -1;
		}

		pthread_join(tid1, NULL);
		pthread_join(tid2, NULL);
		munmap(addr, DATA_SIZE * 2);
	}
	return 0;
}

# cat /sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats/anon_swpout
932
# cat /sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats/anon_swpin_refault
1488