From patchwork Fri Oct 18 06:47:59 2024
X-Patchwork-Submitter: "Sridhar, Kanchana P"
X-Patchwork-Id: 13841259
From: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org,
    yosryahmed@google.com, nphamcs@gmail.com, chengming.zhou@linux.dev,
    usamaarif642@gmail.com, ryan.roberts@arm.com, ying.huang@intel.com,
    21cnbao@gmail.com, akpm@linux-foundation.org, hughd@google.com,
    willy@infradead.org, bfoster@redhat.com, dchinner@redhat.com,
    chrisl@kernel.org, david@redhat.com
Cc: wajdi.k.feghali@intel.com, vinodh.gopal@intel.com,
    kanchana.p.sridhar@intel.com
Subject: [RFC PATCH v1 1/7] mm: zswap: Config variable to enable zswap loads
 with decompress batching.
Date: Thu, 17 Oct 2024 23:47:59 -0700
Message-Id: <20241018064805.336490-2-kanchana.p.sridhar@intel.com>
In-Reply-To: <20241018064805.336490-1-kanchana.p.sridhar@intel.com>
References: <20241018064805.336490-1-kanchana.p.sridhar@intel.com>

Add a new zswap config variable that controls whether zswap load will
decompress a batch of 4K folios, for instance the folios prefetched
during swapin_readahead():

  CONFIG_ZSWAP_LOAD_BATCHING_ENABLED

The existing CONFIG_CRYPTO_DEV_IAA_CRYPTO variable added in commit
ea7a5cbb4369 ("crypto: iaa - Add Intel IAA Compression Accelerator
crypto driver core") is used to detect whether the system has the Intel
Analytics Accelerator (IAA) and the iaa_crypto module available. If so,
the kernel build will prompt for CONFIG_ZSWAP_LOAD_BATCHING_ENABLED.
Hence, users can set CONFIG_ZSWAP_LOAD_BATCHING_ENABLED="y" only on
systems that have Intel IAA.

If CONFIG_ZSWAP_LOAD_BATCHING_ENABLED is enabled, and IAA is configured
as the zswap compressor, the vm.page-cluster setting is used to prefetch
up to 32 4K folios using swapin_readahead(). The readahead folios present
in zswap are then loaded as a batch using IAA decompression batching.

The patch also implements a zswap API that returns the status of this
config variable.

Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
---
 include/linux/zswap.h |  8 ++++++++
 mm/Kconfig            | 13 +++++++++++++
 mm/zswap.c            | 12 ++++++++++++
 3 files changed, 33 insertions(+)

diff --git a/include/linux/zswap.h b/include/linux/zswap.h
index 328a1e09d502..294d13efbfb1 100644
--- a/include/linux/zswap.h
+++ b/include/linux/zswap.h
@@ -118,6 +118,9 @@ static inline void zswap_store_batch(struct swap_in_memory_cache_cb *simc)
 	else
 		__zswap_store_batch_single(simc);
 }
+
+bool zswap_load_batching_enabled(void);
+
 unsigned long zswap_total_pages(void);
 bool zswap_store(struct folio *folio);
 bool zswap_load(struct folio *folio);
@@ -145,6 +148,11 @@ static inline void zswap_store_batch(struct swap_in_memory_cache_cb *simc)
 {
 }
 
+static inline bool zswap_load_batching_enabled(void)
+{
+	return false;
+}
+
 static inline bool zswap_store(struct folio *folio)
 {
 	return false;
diff --git a/mm/Kconfig b/mm/Kconfig
index 26d1a5cee471..98e46a3cf0e3 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -137,6 +137,19 @@ config ZSWAP_STORE_BATCHING_ENABLED
 	  in the folio in hardware, thereby improving large folio compression
 	  throughput and reducing swapout latency.
 
+config ZSWAP_LOAD_BATCHING_ENABLED
+	bool "Batching of zswap loads of 4K folios with Intel IAA"
+	depends on ZSWAP && CRYPTO_DEV_IAA_CRYPTO
+	default n
+	help
+	  Enables zswap_load to swap in multiple 4K folios in batches of 8,
+	  rather than one folio at a time, if the system has Intel IAA for
+	  hardware acceleration of decompressions. swapin_readahead will be
+	  used to prefetch a batch of folios to be swapped in along with the
+	  faulting folio. If IAA is the zswap compressor, this will
+	  parallelize batch decompression of up to 8 folios in hardware,
+	  thereby reducing swapin and do_swap_page latency.
+
 choice
 	prompt "Default allocator"
 	depends on ZSWAP
diff --git a/mm/zswap.c b/mm/zswap.c
index 68ce498ad000..fe7bc2a6672e 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -136,6 +136,13 @@ module_param_named(shrinker_enabled, zswap_shrinker_enabled, bool, 0644);
 static bool __zswap_store_batching_enabled = IS_ENABLED(
 	CONFIG_ZSWAP_STORE_BATCHING_ENABLED);
 
+/*
+ * Enable/disable batching of decompressions of multiple 4K folios, if
+ * the system has Intel IAA.
+ */
+static bool __zswap_load_batching_enabled = IS_ENABLED(
+	CONFIG_ZSWAP_LOAD_BATCHING_ENABLED);
+
 bool zswap_is_enabled(void)
 {
 	return zswap_enabled;
@@ -246,6 +253,11 @@ __always_inline bool zswap_store_batching_enabled(void)
 	return __zswap_store_batching_enabled;
 }
 
+__always_inline bool zswap_load_batching_enabled(void)
+{
+	return __zswap_load_batching_enabled;
+}
+
 static void __zswap_store_batch_core(
 	int node_id,
 	struct folio **folios,
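For reference, a minimal caller-side sketch (not part of this patch; the
function below is hypothetical) of how a swap-in path can branch on the
new API. The batch plumbing that actually consumes this check is added in
later patches of this series:

/*
 * Hypothetical caller sketch: branch on the Kconfig-backed runtime
 * check. zswap_load() is the existing single-folio load path.
 */
static bool example_swapin_one(struct folio *folio)
{
	if (zswap_load_batching_enabled()) {
		/*
		 * Batching enabled: a caller would queue the folio for
		 * batched IAA decompression instead of loading it here
		 * (see patches 4/7 and 5/7 of this series).
		 */
		return false;
	}

	/* Batching disabled or not configured: load one folio at a time. */
	return zswap_load(folio);
}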
From patchwork Fri Oct 18 06:48:00 2024
X-Patchwork-Submitter: "Sridhar, Kanchana P"
X-Patchwork-Id: 13841261
From: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
Subject: [RFC PATCH v1 2/7] mm: swap: Add IAA batch decompression API
 swap_crypto_acomp_decompress_batch().
Date: Thu, 17 Oct 2024 23:48:00 -0700
Message-Id: <20241018064805.336490-3-kanchana.p.sridhar@intel.com>
In-Reply-To: <20241018064805.336490-1-kanchana.p.sridhar@intel.com>
References: <20241018064805.336490-1-kanchana.p.sridhar@intel.com>
Add a new API, swap_crypto_acomp_decompress_batch(), that does batch
decompression. A system that has Intel IAA can use this API to submit a
batch of decompress jobs for parallel decompression in hardware, to
improve performance. On a system without IAA, this API will process each
decompress job sequentially.

The purpose of this API is to be invocable from any swap module that
needs to decompress multiple 4K folios, or a batch of pages in the
general case. For instance, zswap would decompress up to
(1UL << SWAP_RA_ORDER_CEILING) folios in batches of
SWAP_CRYPTO_SUB_BATCH_SIZE (i.e. 8 if the system has IAA) pages
prefetched by swapin_readahead(), which would improve readahead
performance.

Towards this eventual goal, the swap_crypto_acomp_decompress_batch()
interface is implemented in swap_state.c and exported via mm/swap.h.
It would be preferable for swap_crypto_acomp_decompress_batch() to be
exported via include/linux/swap.h so that modules outside mm (e.g. zram)
can potentially use the API for batch decompressions with IAA, since the
swapin_readahead() batching interface is common to all swap modules.
I would appreciate RFC comments on this.

Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
---
 mm/swap.h       |  42 +++++++++++++++++--
 mm/swap_state.c | 109 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 147 insertions(+), 4 deletions(-)

diff --git a/mm/swap.h b/mm/swap.h
index 08c04954304f..0bb386b5fdee 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -10,11 +10,12 @@ struct mempolicy;
 #include
 
 /*
- * For IAA compression batching:
- * Maximum number of IAA acomp compress requests that will be processed
- * in a sub-batch.
+ * For IAA compression/decompression batching:
+ * Maximum number of IAA acomp compress/decompress requests that will be
+ * processed in a sub-batch.
  */
-#if defined(CONFIG_ZSWAP_STORE_BATCHING_ENABLED)
+#if defined(CONFIG_ZSWAP_STORE_BATCHING_ENABLED) || \
+	defined(CONFIG_ZSWAP_LOAD_BATCHING_ENABLED)
 #define SWAP_CRYPTO_SUB_BATCH_SIZE 8UL
 #else
 #define SWAP_CRYPTO_SUB_BATCH_SIZE 1UL
@@ -60,6 +61,29 @@ void swap_crypto_acomp_compress_batch(
 	int nr_pages,
 	struct crypto_acomp_ctx *acomp_ctx);
 
+/**
+ * This API provides IAA decompress batching functionality for use by swap
+ * modules.
+ * The acomp_ctx mutex should be locked/unlocked before/after calling this
+ * procedure.
+ *
+ * @srcs: The src buffers to be decompressed.
+ * @pages: The pages to store the buffers decompressed by IAA.
+ * @slens: src buffers' compressed lengths.
+ * @errors: Will contain a 0 if the page was successfully decompressed, or a
+ *          non-0 error value to be processed by the calling function.
+ * @nr_pages: The number of pages, up to SWAP_CRYPTO_SUB_BATCH_SIZE,
+ *            to be decompressed.
+ * @acomp_ctx: The acomp context for iaa_crypto/other compressor.
+ */
+void swap_crypto_acomp_decompress_batch(
+	u8 *srcs[],
+	struct page *pages[],
+	unsigned int slens[],
+	int errors[],
+	int nr_pages,
+	struct crypto_acomp_ctx *acomp_ctx);
+
 /* linux/mm/vmscan.c, linux/mm/page_io.c, linux/mm/zswap.c */
 /* For batching of compressions in reclaim path. */
 struct swap_in_memory_cache_cb {
@@ -204,6 +228,16 @@ static inline void swap_write_in_memory_cache_unplug(
 {
 }
 
+static inline void swap_crypto_acomp_decompress_batch(
+	u8 *srcs[],
+	struct page *pages[],
+	unsigned int slens[],
+	int errors[],
+	int nr_pages,
+	struct crypto_acomp_ctx *acomp_ctx)
+{
+}
+
 static inline void swap_read_folio(struct folio *folio, struct swap_iocb **plug)
 {
 }
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 117c3caa5679..3cebbff40804 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -855,6 +855,115 @@ void swap_crypto_acomp_compress_batch(
 }
 EXPORT_SYMBOL_GPL(swap_crypto_acomp_compress_batch);
 
+/**
+ * This API provides IAA decompress batching functionality for use by swap
+ * modules.
+ * The acomp_ctx mutex should be locked/unlocked before/after calling this
+ * procedure.
+ *
+ * @srcs: The src buffers to be decompressed.
+ * @pages: The pages to store the buffers decompressed by IAA.
+ * @slens: src buffers' compressed lengths.
+ * @errors: Will contain a 0 if the page was successfully decompressed, or a
+ *          non-0 error value to be processed by the calling function.
+ * @nr_pages: The number of pages, up to SWAP_CRYPTO_SUB_BATCH_SIZE,
+ *            to be decompressed.
+ * @acomp_ctx: The acomp context for iaa_crypto/other compressor.
+ */
+void swap_crypto_acomp_decompress_batch(
+	u8 *srcs[],
+	struct page *pages[],
+	unsigned int slens[],
+	int errors[],
+	int nr_pages,
+	struct crypto_acomp_ctx *acomp_ctx)
+{
+	struct scatterlist inputs[SWAP_CRYPTO_SUB_BATCH_SIZE];
+	struct scatterlist outputs[SWAP_CRYPTO_SUB_BATCH_SIZE];
+	unsigned int dlens[SWAP_CRYPTO_SUB_BATCH_SIZE];
+	bool decompressions_done = false;
+	int i, j;
+
+	BUG_ON(nr_pages > SWAP_CRYPTO_SUB_BATCH_SIZE);
+
+	/*
+	 * Prepare and submit acomp_reqs to IAA.
+	 * IAA will process these decompress jobs in parallel in async mode.
+	 * If the compressor does not support a poll() method, or if IAA is
+	 * used in sync mode, the jobs will be processed sequentially using
+	 * acomp_ctx->req[0] and acomp_ctx->wait.
+	 */
+	for (i = 0; i < nr_pages; ++i) {
+		j = acomp_ctx->acomp->poll ? i : 0;
+
+		dlens[i] = PAGE_SIZE;
+		sg_init_one(&inputs[i], srcs[i], slens[i]);
+		sg_init_table(&outputs[i], 1);
+		sg_set_page(&outputs[i], pages[i], PAGE_SIZE, 0);
+		acomp_request_set_params(acomp_ctx->req[j], &inputs[i],
+					 &outputs[i], slens[i], dlens[i]);
+		/*
+		 * If the crypto_acomp provides an asynchronous poll()
+		 * interface, submit the request to the driver now, and poll for
+		 * a completion status later, after all descriptors have been
+		 * submitted. If the crypto_acomp does not provide a poll()
+		 * interface, submit the request and wait for it to complete,
+		 * i.e., synchronously, before moving on to the next request.
+		 */
+		if (acomp_ctx->acomp->poll) {
+			errors[i] = crypto_acomp_decompress(acomp_ctx->req[j]);
+
+			if (errors[i] != -EINPROGRESS)
+				errors[i] = -EINVAL;
+			else
+				errors[i] = -EAGAIN;
+		} else {
+			errors[i] = crypto_wait_req(
+				crypto_acomp_decompress(acomp_ctx->req[j]),
+				&acomp_ctx->wait);
+			if (!errors[i]) {
+				dlens[i] = acomp_ctx->req[j]->dlen;
+				BUG_ON(dlens[i] != PAGE_SIZE);
+			}
+		}
+	}
+
+	/*
+	 * If not doing async decompressions, the batch has been processed at
+	 * this point and we can return.
+	 */
+	if (!acomp_ctx->acomp->poll)
+		return;
+
+	/*
+	 * Poll for and process IAA decompress job completions
+	 * in out-of-order manner.
+	 */
+	while (!decompressions_done) {
+		decompressions_done = true;
+
+		for (i = 0; i < nr_pages; ++i) {
+			/*
+			 * Skip, if the decompression has already completed
+			 * successfully or with an error.
+			 */
+			if (errors[i] != -EAGAIN)
+				continue;
+
+			errors[i] = crypto_acomp_poll(acomp_ctx->req[i]);
+
+			if (errors[i]) {
+				if (errors[i] == -EAGAIN)
+					decompressions_done = false;
+			} else {
+				dlens[i] = acomp_ctx->req[i]->dlen;
+				BUG_ON(dlens[i] != PAGE_SIZE);
+			}
+		}
+	}
+}
+EXPORT_SYMBOL_GPL(swap_crypto_acomp_decompress_batch);
+
 #endif /* CONFIG_SWAP */
 
 static int swap_vma_ra_win(struct vm_fault *vmf, unsigned long *start,
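To make the calling convention concrete, here is a minimal caller sketch
(not from the patch; the function name is hypothetical, the srcs buffers
are assumed pre-allocated, and the acomp_ctx is assumed to embed the
mutex referenced in the API comment). Per-page status comes back through
errors[]:

static void example_decompress_sub_batch(u8 *srcs[], struct page *pages[],
					 unsigned int slens[], int nr_pages,
					 struct crypto_acomp_ctx *acomp_ctx)
{
	int errors[SWAP_CRYPTO_SUB_BATCH_SIZE];
	int i;

	mutex_lock(&acomp_ctx->mutex);
	/* On IAA, up to nr_pages jobs decompress in parallel in hardware. */
	swap_crypto_acomp_decompress_batch(srcs, pages, slens, errors,
					   nr_pages, acomp_ctx);
	mutex_unlock(&acomp_ctx->mutex);

	for (i = 0; i < nr_pages; i++)
		if (errors[i])
			pr_err("decompress of page %d failed: %d\n",
			       i, errors[i]);
}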
From patchwork Fri Oct 18 06:48:01 2024
X-Patchwork-Submitter: "Sridhar, Kanchana P"
X-Patchwork-Id: 13841262
From: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
Subject: [RFC PATCH v1 3/7] pagevec: struct folio_batch changes for
 decompress batching interface.
Date: Thu, 17 Oct 2024 23:48:01 -0700
Message-Id: <20241018064805.336490-4-kanchana.p.sridhar@intel.com>
In-Reply-To: <20241018064805.336490-1-kanchana.p.sridhar@intel.com>
References: <20241018064805.336490-1-kanchana.p.sridhar@intel.com>

Made these changes to "struct folio_batch" for use in the
swapin_readahead() based zswap load batching interface for parallel
decompressions with IAA:

1) Moved the SWAP_RA_ORDER_CEILING definition to pagevec.h.
2) Increased PAGEVEC_SIZE to (1UL << SWAP_RA_ORDER_CEILING), because
   vm.page-cluster=5 requires capacity for 32 folios.
3) Made folio_batch_add() more fail-safe.

Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
---
 include/linux/pagevec.h | 13 ++++++++++---
 mm/swap_state.c         |  2 --
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/include/linux/pagevec.h b/include/linux/pagevec.h
index 5d3a0cccc6bf..c9bab240fb6e 100644
--- a/include/linux/pagevec.h
+++ b/include/linux/pagevec.h
@@ -11,8 +11,14 @@
 #include
 
-/* 31 pointers + header align the folio_batch structure to a power of two */
-#define PAGEVEC_SIZE 31
+/*
+ * For page-cluster of 5, I noticed that space for 31 pointers was
+ * insufficient. Increasing this to meet the requirements for folio_batch
+ * usage in the swap read decompress batching interface that is based on
+ * swapin_readahead().
+ */
+#define SWAP_RA_ORDER_CEILING 5
+#define PAGEVEC_SIZE (1UL << SWAP_RA_ORDER_CEILING)
 
 struct folio;
 
@@ -74,7 +80,8 @@ static inline unsigned int folio_batch_space(struct folio_batch *fbatch)
 static inline unsigned folio_batch_add(struct folio_batch *fbatch,
 		struct folio *folio)
 {
-	fbatch->folios[fbatch->nr++] = folio;
+	if (folio_batch_space(fbatch) > 0)
+		fbatch->folios[fbatch->nr++] = folio;
 	return folio_batch_space(fbatch);
 }
 
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 3cebbff40804..0673593d363c 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -44,8 +44,6 @@ struct address_space *swapper_spaces[MAX_SWAPFILES] __read_mostly;
 static unsigned int nr_swapper_spaces[MAX_SWAPFILES] __read_mostly;
 static bool enable_vma_readahead __read_mostly = true;
 
-#define SWAP_RA_ORDER_CEILING 5
-
 #define SWAP_RA_WIN_SHIFT (PAGE_SHIFT / 2)
 #define SWAP_RA_HITS_MASK ((1UL << SWAP_RA_WIN_SHIFT) - 1)
 #define SWAP_RA_HITS_MAX SWAP_RA_HITS_MASK
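As an illustration of change 3), a sketch (with a hypothetical drain
helper) of the caller pattern the bounds check enables: adding to a full
batch is now a no-op rather than an out-of-bounds write, so callers can
check the remaining space and drain first:

void example_drain(struct folio_batch *fbatch);	/* hypothetical */

static void example_add_or_drain(struct folio_batch *fbatch,
				 struct folio *folio)
{
	if (!folio_batch_space(fbatch)) {
		example_drain(fbatch);		/* hypothetical drain step */
		folio_batch_reinit(fbatch);
	}

	/* Guaranteed to have space now; returns the space remaining. */
	folio_batch_add(fbatch, folio);
}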
From patchwork Fri Oct 18 06:48:02 2024
X-Patchwork-Submitter: "Sridhar, Kanchana P"
X-Patchwork-Id: 13841263
From: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
Subject: [RFC PATCH v1 4/7] mm: swap: swap_read_folio() can add a folio to a
 folio_batch if it is in zswap.
Date: Thu, 17 Oct 2024 23:48:02 -0700
Message-Id: <20241018064805.336490-5-kanchana.p.sridhar@intel.com>
In-Reply-To: <20241018064805.336490-1-kanchana.p.sridhar@intel.com>
References: <20241018064805.336490-1-kanchana.p.sridhar@intel.com>

This patch modifies swap_read_folio() to check if the swap entry is
present in zswap; if so, and if the caller (e.g. swapin_readahead()) has
passed in a valid "zswap_batch" folio_batch, the folio is added to that
batch. A folio found in zswap is added at the next available index in a
sub-batch. This sub-batch is part of "struct zswap_decomp_batch", which
progressively constructs SWAP_CRYPTO_SUB_BATCH_SIZE-sized arrays of
zswap entries/xarrays/pages/source-lengths ready for batch decompression
in IAA. The function that does this, zswap_add_load_batch(), returns
true to swap_read_folio(); if the entry is not found in zswap, it
returns false.

If the swap entry was not found in zswap, and if
zswap_load_batching_enabled() and a valid "non_zswap_batch" folio_batch
are passed to swap_read_folio(), the folio is added to the
"non_zswap_batch" batch.

Finally, the code falls through to the usual/existing swap_read_folio()
flow.
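Condensed from the diff below, the resulting dispatch order in
swap_read_folio() is:

	/* 1) Entry found in zswap: queue for IAA batch decompression. */
	if (zswap_batch && zswap_add_load_batch(zswap_batch, folio))
		return false;

	/* 2) Not in zswap, but batching active: defer to the caller. */
	if (zswap_load_batching_enabled() && non_zswap_batch) {
		folio_batch_add(non_zswap_batch, folio);
		return false;
	}

	/* 3) Otherwise, fall through to the existing read path. */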
Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
---
 include/linux/zswap.h | 35 +++++++++++++++++
 mm/memory.c           |  2 +-
 mm/page_io.c          | 26 ++++++++++++-
 mm/swap.h             | 31 ++++++++++++++-
 mm/swap_state.c       | 10 ++---
 mm/zswap.c            | 89 +++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 183 insertions(+), 10 deletions(-)

diff --git a/include/linux/zswap.h b/include/linux/zswap.h
index 294d13efbfb1..1d6de281f243 100644
--- a/include/linux/zswap.h
+++ b/include/linux/zswap.h
@@ -12,6 +12,8 @@ extern atomic_long_t zswap_stored_pages;
 #ifdef CONFIG_ZSWAP
 
 struct swap_in_memory_cache_cb;
+struct zswap_decomp_batch;
+struct zswap_entry;
 
 struct zswap_lruvec_state {
 	/*
@@ -120,6 +122,19 @@ static inline void zswap_store_batch(struct swap_in_memory_cache_cb *simc)
 }
 
 bool zswap_load_batching_enabled(void);
+void zswap_load_batch_init(struct zswap_decomp_batch *zd_batch);
+void zswap_load_batch_reinit(struct zswap_decomp_batch *zd_batch);
+bool __zswap_add_load_batch(struct zswap_decomp_batch *zd_batch,
+			    struct folio *folio);
+static inline bool zswap_add_load_batch(
+	struct zswap_decomp_batch *zd_batch,
+	struct folio *folio)
+{
+	if (zswap_load_batching_enabled())
+		return __zswap_add_load_batch(zd_batch, folio);
+
+	return false;
+}
 
 unsigned long zswap_total_pages(void);
 bool zswap_store(struct folio *folio);
@@ -138,6 +153,8 @@ struct zswap_lruvec_state {};
 struct zswap_store_sub_batch_page {};
 struct zswap_store_pipeline_state {};
 struct swap_in_memory_cache_cb;
+struct zswap_decomp_batch;
+struct zswap_entry;
 
 static inline bool zswap_store_batching_enabled(void)
 {
@@ -153,6 +170,24 @@ static inline bool zswap_load_batching_enabled(void)
 	return false;
 }
 
+static inline void zswap_load_batch_init(
+	struct zswap_decomp_batch *zd_batch)
+{
+}
+
+static inline void zswap_load_batch_reinit(
+	struct zswap_decomp_batch *zd_batch)
+{
+}
+
+static inline bool zswap_add_load_batch(
+	struct zswap_decomp_batch *zd_batch,
+	struct folio *folio)
+{
+	return false;
+}
+
 static inline bool zswap_store(struct folio *folio)
 {
 	return false;
diff --git a/mm/memory.c b/mm/memory.c
index 0f614523b9f4..b5745b9ffdf7 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4322,7 +4322,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 
 			/* To provide entry to swap_read_folio() */
 			folio->swap = entry;
-			swap_read_folio(folio, NULL);
+			swap_read_folio(folio, NULL, NULL, NULL);
 			folio->private = NULL;
 		}
 	} else {
diff --git a/mm/page_io.c b/mm/page_io.c
index 065db25309b8..9750302d193b 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -744,11 +744,17 @@ static void swap_read_folio_bdev_async(struct folio *folio,
 	submit_bio(bio);
 }
 
-void swap_read_folio(struct folio *folio, struct swap_iocb **plug)
+/*
+ * Returns true if the folio was read, and false if the folio was added to
+ * the zswap_decomp_batch for batched decompression.
+ */
+bool swap_read_folio(struct folio *folio, struct swap_iocb **plug,
+		     struct zswap_decomp_batch *zswap_batch,
+		     struct folio_batch *non_zswap_batch)
 {
 	struct swap_info_struct *sis = swp_swap_info(folio->swap);
 	bool synchronous = sis->flags & SWP_SYNCHRONOUS_IO;
-	bool workingset = folio_test_workingset(folio);
+	bool workingset;
 	unsigned long pflags;
 	bool in_thrashing;
 
@@ -756,11 +762,26 @@ void swap_read_folio(struct folio *folio, struct swap_iocb **plug)
 	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
 	VM_BUG_ON_FOLIO(folio_test_uptodate(folio), folio);
 
+	/*
+	 * If the entry is found in the zswap xarray, and zswap load batching
+	 * is enabled, this is a candidate for zswap batch decompression.
+	 */
+	if (zswap_batch && zswap_add_load_batch(zswap_batch, folio))
+		return false;
+
+	if (zswap_load_batching_enabled() && non_zswap_batch) {
+		BUG_ON(!folio_batch_space(non_zswap_batch));
+		folio_batch_add(non_zswap_batch, folio);
+		return false;
+	}
+
 	/*
 	 * Count submission time as memory stall and delay. When the device
 	 * is congested, or the submitting cgroup IO-throttled, submission
 	 * can be a significant part of overall IO time.
 	 */
+	workingset = folio_test_workingset(folio);
+
 	if (workingset) {
 		delayacct_thrashing_start(&in_thrashing);
 		psi_memstall_enter(&pflags);
@@ -792,6 +813,7 @@ void swap_read_folio(struct folio *folio, struct swap_iocb **plug)
 		psi_memstall_leave(&pflags);
 	}
 	delayacct_swapin_end();
+	return true;
 }
 
 void __swap_read_unplug(struct swap_iocb *sio)
diff --git a/mm/swap.h b/mm/swap.h
index 0bb386b5fdee..310f99007fe6 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -84,6 +84,27 @@ void swap_crypto_acomp_decompress_batch(
 	int nr_pages,
 	struct crypto_acomp_ctx *acomp_ctx);
 
+#if defined(CONFIG_ZSWAP_LOAD_BATCHING_ENABLED)
+#define MAX_NR_ZSWAP_LOAD_SUB_BATCHES	DIV_ROUND_UP(PAGEVEC_SIZE, \
+						     SWAP_CRYPTO_SUB_BATCH_SIZE)
+#else
+#define MAX_NR_ZSWAP_LOAD_SUB_BATCHES	1UL
+#endif /* CONFIG_ZSWAP_LOAD_BATCHING_ENABLED */
+
+/*
+ * Note: If PAGEVEC_SIZE or SWAP_CRYPTO_SUB_BATCH_SIZE
+ * exceeds 256, change the u8 to u16.
+ */
+struct zswap_decomp_batch {
+	struct folio_batch fbatch;
+	bool swapcache[PAGEVEC_SIZE];
+	struct xarray *trees[MAX_NR_ZSWAP_LOAD_SUB_BATCHES][SWAP_CRYPTO_SUB_BATCH_SIZE];
+	struct zswap_entry *entries[MAX_NR_ZSWAP_LOAD_SUB_BATCHES][SWAP_CRYPTO_SUB_BATCH_SIZE];
+	struct page *pages[MAX_NR_ZSWAP_LOAD_SUB_BATCHES][SWAP_CRYPTO_SUB_BATCH_SIZE];
+	unsigned int slens[MAX_NR_ZSWAP_LOAD_SUB_BATCHES][SWAP_CRYPTO_SUB_BATCH_SIZE];
+	u8 nr_decomp[MAX_NR_ZSWAP_LOAD_SUB_BATCHES];
+};
+
 /* linux/mm/vmscan.c, linux/mm/page_io.c, linux/mm/zswap.c */
 /* For batching of compressions in reclaim path. */
 struct swap_in_memory_cache_cb {
@@ -101,7 +122,9 @@ struct swap_in_memory_cache_cb {
 /* linux/mm/page_io.c */
 int sio_pool_init(void);
 struct swap_iocb;
-void swap_read_folio(struct folio *folio, struct swap_iocb **plug);
+bool swap_read_folio(struct folio *folio, struct swap_iocb **plug,
+		     struct zswap_decomp_batch *zswap_batch,
+		     struct folio_batch *non_zswap_batch);
 void __swap_read_unplug(struct swap_iocb *plug);
 static inline void swap_read_unplug(struct swap_iocb *plug)
 {
@@ -238,8 +261,12 @@ static inline void swap_crypto_acomp_decompress_batch(
 {
 }
 
-static inline void swap_read_folio(struct folio *folio, struct swap_iocb **plug)
+struct zswap_decomp_batch {};
+static inline bool swap_read_folio(struct folio *folio, struct swap_iocb **plug,
+		struct zswap_decomp_batch *zswap_batch,
+		struct folio_batch *non_zswap_batch)
 {
+	return false;
 }
 static inline void swap_write_unplug(struct swap_iocb *sio)
 {
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 0673593d363c..0aa938e4c34d 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -570,7 +570,7 @@ struct folio *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 	mpol_cond_put(mpol);
 
 	if (page_allocated)
-		swap_read_folio(folio, plug);
+		swap_read_folio(folio, plug, NULL, NULL);
 	return folio;
 }
 
@@ -687,7 +687,7 @@ struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 		if (!folio)
 			continue;
 		if (page_allocated) {
-			swap_read_folio(folio, &splug);
+			swap_read_folio(folio, &splug, NULL, NULL);
 			if (offset != entry_offset) {
 				folio_set_readahead(folio);
 				count_vm_event(SWAP_RA);
@@ -703,7 +703,7 @@ struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 	folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
 					&page_allocated, false);
 	if (unlikely(page_allocated))
-		swap_read_folio(folio, NULL);
+		swap_read_folio(folio, NULL, NULL, NULL);
 	return folio;
 }
 
@@ -1057,7 +1057,7 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
 		if (!folio)
 			continue;
 		if (page_allocated) {
-			swap_read_folio(folio, &splug);
+			swap_read_folio(folio, &splug, NULL, NULL);
 			if (addr != vmf->address) {
 				folio_set_readahead(folio);
 				count_vm_event(SWAP_RA);
@@ -1075,7 +1075,7 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
 	folio = __read_swap_cache_async(targ_entry, gfp_mask, mpol, targ_ilx,
 					&page_allocated, false);
 	if (unlikely(page_allocated))
-		swap_read_folio(folio, NULL);
+		swap_read_folio(folio, NULL, NULL, NULL);
 	return folio;
 }
 
diff --git a/mm/zswap.c b/mm/zswap.c
index fe7bc2a6672e..1d293f95d525 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -2312,6 +2312,95 @@ bool zswap_load(struct folio *folio)
 	return true;
 }
 
+/* Code for zswap load batch with batch decompress. */
+
+__always_inline void zswap_load_batch_init(struct zswap_decomp_batch *zd_batch)
+{
+	unsigned int sb;
+
+	folio_batch_init(&zd_batch->fbatch);
+
+	for (sb = 0; sb < MAX_NR_ZSWAP_LOAD_SUB_BATCHES; ++sb)
+		zd_batch->nr_decomp[sb] = 0;
+}
+
+__always_inline void zswap_load_batch_reinit(struct zswap_decomp_batch *zd_batch)
+{
+	unsigned int sb;
+
+	folio_batch_reinit(&zd_batch->fbatch);
+
+	for (sb = 0; sb < MAX_NR_ZSWAP_LOAD_SUB_BATCHES; ++sb)
+		zd_batch->nr_decomp[sb] = 0;
+}
+
+/*
+ * All folios in zd_batch are allocated into the swapcache
+ * in swapin_readahead(), before being added to the zd_batch
+ * for batch decompression.
+ */
+bool __zswap_add_load_batch(struct zswap_decomp_batch *zd_batch,
+			    struct folio *folio)
+{
+	swp_entry_t swp = folio->swap;
+	pgoff_t offset = swp_offset(swp);
+	bool swapcache = folio_test_swapcache(folio);
+	struct xarray *tree = swap_zswap_tree(swp);
+	struct zswap_entry *entry;
+	unsigned int batch_idx, sb;
+
+	VM_WARN_ON_ONCE(!folio_test_locked(folio));
+
+	if (zswap_never_enabled())
+		return false;
+
+	/*
+	 * Large folios should not be swapped in while zswap is being used, as
+	 * they are not properly handled. Zswap does not properly load large
+	 * folios, and a large folio may only be partially in zswap.
+	 *
+	 * Returning false here will cause the large folio to be added to
+	 * the "non_zswap_batch" in swap_read_folio(), which will eventually
+	 * call zswap_load() if the folio is not in the zeromap. Finally,
+	 * zswap_load() will return true without marking the folio uptodate
+	 * so that an IO error is emitted (e.g. do_swap_page() will sigbus).
+	 */
+	if (WARN_ON_ONCE(folio_test_large(folio)))
+		return false;
+
+	/*
+	 * When reading into the swapcache, invalidate our entry. The
+	 * swapcache can be the authoritative owner of the page and
+	 * its mappings, and the pressure that results from having two
+	 * in-memory copies outweighs any benefits of caching the
+	 * compression work.
+	 */
+	if (swapcache)
+		entry = xa_erase(tree, offset);
+	else
+		entry = xa_load(tree, offset);
+
+	if (!entry)
+		return false;
+
+	BUG_ON(!folio_batch_space(&zd_batch->fbatch));
+	folio_batch_add(&zd_batch->fbatch, folio);
+
+	batch_idx = folio_batch_count(&zd_batch->fbatch) - 1;
+	zd_batch->swapcache[batch_idx] = swapcache;
+	sb = batch_idx / SWAP_CRYPTO_SUB_BATCH_SIZE;
+
+	if (entry->length) {
+		zd_batch->trees[sb][zd_batch->nr_decomp[sb]] = tree;
+		zd_batch->entries[sb][zd_batch->nr_decomp[sb]] = entry;
+		zd_batch->pages[sb][zd_batch->nr_decomp[sb]] = &folio->page;
+		zd_batch->slens[sb][zd_batch->nr_decomp[sb]] = entry->length;
+		zd_batch->nr_decomp[sb]++;
+	}
+
+	return true;
+}
+
 void zswap_invalidate(swp_entry_t swp)
 {
 	pgoff_t offset = swp_offset(swp);
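Putting patches 1-4 together, a readahead-side caller could look roughly
like the following sketch (a hypothetical function, not the actual
swapin_readahead() integration, which comes in a later patch of this
series; swap_read_zswap_batch_unplug() is added in patch 5/7):

static void example_batched_swapin(struct folio **folios, int nr)
{
	struct zswap_decomp_batch zswap_batch;
	struct folio_batch non_zswap_batch;
	struct swap_iocb *splug = NULL;
	int i;

	zswap_load_batch_init(&zswap_batch);
	folio_batch_init(&non_zswap_batch);

	/* Folios found in zswap are queued rather than read synchronously. */
	for (i = 0; i < nr; i++)
		swap_read_folio(folios[i], &splug, &zswap_batch,
				&non_zswap_batch);

	/* Decompress all queued zswap folios as IAA batches (patch 5/7). */
	swap_read_zswap_batch_unplug(&zswap_batch, &splug);

	/* non_zswap_batch folios would be read via the usual I/O path. */
}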
From patchwork Fri Oct 18 06:48:03 2024
X-Patchwork-Submitter: "Sridhar, Kanchana P"
X-Patchwork-Id: 13841264
From: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
Subject: [RFC PATCH v1 5/7] mm: swap, zswap: zswap folio_batch processing
 with IAA decompression batching.
Date: Thu, 17 Oct 2024 23:48:03 -0700
Message-Id: <20241018064805.336490-6-kanchana.p.sridhar@intel.com>
In-Reply-To: <20241018064805.336490-1-kanchana.p.sridhar@intel.com>
References: <20241018064805.336490-1-kanchana.p.sridhar@intel.com>

This patch provides the functionality that processes a "zswap_batch" in
which swap_read_folio() had previously stored swap entries found in
zswap, for batched loading.

The newly added zswap_finish_load_batch() API implements the main zswap
load batching functionality. It makes use of the sub-batches of
zswap_entry/xarray/page/source-length readily available from
zswap_add_load_batch(). These sub-batch arrays are processed one at a
time, until the entire zswap folio_batch has been loaded.

The existing zswap_load() functionality of deleting zswap_entries for
folios found in the swapcache is preserved.
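In outline, the per-sub-batch flow implemented below is (a condensed
sketch, not the actual code; the srcs buffers for the copied-out
compressed data are assumed to be pre-allocated):

static void example_finish_load(struct zswap_decomp_batch *zd_batch)
{
	struct zswap_load_sub_batch_state zls;
	int errors[SWAP_CRYPTO_SUB_BATCH_SIZE];
	u8 *srcs[SWAP_CRYPTO_SUB_BATCH_SIZE];	/* assumed pre-allocated */
	unsigned int sb, nr_sb;

	nr_sb = DIV_ROUND_UP(folio_batch_count(&zd_batch->fbatch),
			     SWAP_CRYPTO_SUB_BATCH_SIZE);

	for (sb = 0; sb < nr_sb; sb++) {
		zswap_load_sub_batch_init(zd_batch, sb, &zls);
		/* Copy compressed sources out of the zpool. */
		zswap_load_map_sources(&zls, srcs);
		/* Decompress the sub-batch in parallel on IAA. */
		zswap_decompress_batch(&zls, srcs, errors);
		/* Stats, swapcache invalidation, mark folios uptodate. */
		zswap_load_batch_updates(zd_batch, sb, &zls, errors);
	}
}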
diff --git a/include/linux/zswap.h b/include/linux/zswap.h
index 1d6de281f243..a0792c2b300a 100644
--- a/include/linux/zswap.h
+++ b/include/linux/zswap.h
@@ -110,6 +110,15 @@ struct zswap_store_pipeline_state {
 	u8 nr_comp_pages;
 };
 
+/* Note: If SWAP_CRYPTO_SUB_BATCH_SIZE exceeds 256, change the u8 to u16. */
+struct zswap_load_sub_batch_state {
+	struct xarray **trees;
+	struct zswap_entry **entries;
+	struct page **pages;
+	unsigned int *slens;
+	u8 nr_decomp;
+};
+
 bool zswap_store_batching_enabled(void);
 void __zswap_store_batch(struct swap_in_memory_cache_cb *simc);
 void __zswap_store_batch_single(struct swap_in_memory_cache_cb *simc);
@@ -136,6 +145,14 @@ static inline bool zswap_add_load_batch(
 	return false;
 }
 
+void __zswap_finish_load_batch(struct zswap_decomp_batch *zd_batch);
+static inline void zswap_finish_load_batch(
+	struct zswap_decomp_batch *zd_batch)
+{
+	if (zswap_load_batching_enabled())
+		__zswap_finish_load_batch(zd_batch);
+}
+
 unsigned long zswap_total_pages(void);
 bool zswap_store(struct folio *folio);
 bool zswap_load(struct folio *folio);
@@ -188,6 +205,11 @@ static inline bool zswap_add_load_batch(
 	return false;
 }
 
+static inline void zswap_finish_load_batch(
+	struct zswap_decomp_batch *zd_batch)
+{
+}
+
 static inline bool zswap_store(struct folio *folio)
 {
 	return false;
diff --git a/mm/page_io.c b/mm/page_io.c
index 9750302d193b..aa83221318ef 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -816,6 +816,41 @@ bool swap_read_folio(struct folio *folio, struct swap_iocb **plug,
 	return true;
 }
 
+static void __swap_post_process_zswap_load_batch(
+	struct zswap_decomp_batch *zswap_batch)
+{
+	u8 i;
+
+	for (i = 0; i < folio_batch_count(&zswap_batch->fbatch); ++i) {
+		struct folio *folio = zswap_batch->fbatch.folios[i];
+		folio_unlock(folio);
+	}
+}
+
+/*
+ * The swapin_readahead batching interface makes sure that the
+ * input zswap_batch consists of folios belonging to the same swap
+ * device type.
+ */
+void __swap_read_zswap_batch_unplug(struct zswap_decomp_batch *zswap_batch,
+				    struct swap_iocb **splug)
+{
+	unsigned long pflags;
+
+	if (!folio_batch_count(&zswap_batch->fbatch))
+		return;
+
+	psi_memstall_enter(&pflags);
+	delayacct_swapin_start();
+
+	/* Load the zswap batch. */
+	zswap_finish_load_batch(zswap_batch);
+	__swap_post_process_zswap_load_batch(zswap_batch);
+
+	psi_memstall_leave(&pflags);
+	delayacct_swapin_end();
+}
+
 void __swap_read_unplug(struct swap_iocb *sio)
 {
 	struct iov_iter from;
diff --git a/mm/swap.h b/mm/swap.h
index 310f99007fe6..2b82c8ed765c 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -125,6 +125,16 @@ struct swap_iocb;
 bool swap_read_folio(struct folio *folio, struct swap_iocb **plug,
 		     struct zswap_decomp_batch *zswap_batch,
 		     struct folio_batch *non_zswap_batch);
+void __swap_read_zswap_batch_unplug(
+	struct zswap_decomp_batch *zswap_batch,
+	struct swap_iocb **splug);
+static inline void swap_read_zswap_batch_unplug(
+	struct zswap_decomp_batch *zswap_batch,
+	struct swap_iocb **splug)
+{
+	if (likely(zswap_batch))
+		__swap_read_zswap_batch_unplug(zswap_batch, splug);
+}
 void __swap_read_unplug(struct swap_iocb *plug);
 static inline void swap_read_unplug(struct swap_iocb *plug)
 {
@@ -268,6 +278,13 @@ static inline bool swap_read_folio(struct folio *folio,
 				   struct swap_iocb **plug,
 {
 	return false;
 }
+
+static inline void swap_read_zswap_batch_unplug(
+	struct zswap_decomp_batch *zswap_batch,
+	struct swap_iocb **splug)
+{
+}
+
 static inline void swap_write_unplug(struct swap_iocb *sio)
 {
 }
diff --git a/mm/zswap.c b/mm/zswap.c
index 1d293f95d525..39bf7d8810e9 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -35,6 +35,7 @@
 #include
 #include
 #include
+#include
 #include "swap.h"
 #include "internal.h"
@@ -2401,6 +2402,176 @@ bool __zswap_add_load_batch(struct zswap_decomp_batch *zd_batch,
 	return true;
 }
 
+static __always_inline void zswap_load_sub_batch_init(
+	struct zswap_decomp_batch *zd_batch,
+	unsigned int sb,
+	struct zswap_load_sub_batch_state *zls)
+{
+	zls->trees = zd_batch->trees[sb];
+	zls->entries = zd_batch->entries[sb];
+	zls->pages = zd_batch->pages[sb];
+	zls->slens = zd_batch->slens[sb];
+	zls->nr_decomp = zd_batch->nr_decomp[sb];
+}
+
+static void zswap_load_map_sources(
+	struct zswap_load_sub_batch_state *zls,
+	u8 *srcs[])
+{
+	u8 i;
+
+	for (i = 0; i < zls->nr_decomp; ++i) {
+		struct zswap_entry *entry = zls->entries[i];
+		struct zpool *zpool = entry->pool->zpool;
+		u8 *buf = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
+		memcpy(srcs[i], buf, entry->length);
+		zpool_unmap_handle(zpool, entry->handle);
+	}
+}
+
+static void zswap_decompress_batch(
+	struct zswap_load_sub_batch_state *zls,
+	u8 *srcs[],
+	int decomp_errors[])
+{
+	struct crypto_acomp_ctx *acomp_ctx;
+
+	acomp_ctx = raw_cpu_ptr(zls->entries[0]->pool->acomp_ctx);
+
+	swap_crypto_acomp_decompress_batch(
+		srcs,
+		zls->pages,
+		zls->slens,
+		decomp_errors,
+		zls->nr_decomp,
+		acomp_ctx);
+}
+
+static void zswap_load_batch_updates(
+	struct zswap_decomp_batch *zd_batch,
+	unsigned int sb,
+	struct zswap_load_sub_batch_state *zls,
+	int decomp_errors[])
+{
+	unsigned int j;
+	u8 i;
+
+	for (i = 0; i < zls->nr_decomp; ++i) {
+		j = (sb * SWAP_CRYPTO_SUB_BATCH_SIZE) + i;
+		struct folio *folio = zd_batch->fbatch.folios[j];
+		struct zswap_entry *entry = zls->entries[i];
+
+		BUG_ON(decomp_errors[i]);
+		count_vm_event(ZSWPIN);
+		if (entry->objcg)
+			count_objcg_events(entry->objcg, ZSWPIN, 1);
+
+		if (zd_batch->swapcache[j]) {
+			zswap_entry_free(entry);
+			folio_mark_dirty(folio);
+		}
+
+		folio_mark_uptodate(folio);
+	}
+}
+
+static void zswap_load_decomp_batch(
+	struct zswap_decomp_batch *zd_batch,
+	unsigned int sb,
+	struct zswap_load_sub_batch_state *zls)
+{
+	int decomp_errors[SWAP_CRYPTO_SUB_BATCH_SIZE];
+	struct crypto_acomp_ctx *acomp_ctx;
+
+	acomp_ctx = raw_cpu_ptr(zls->entries[0]->pool->acomp_ctx);
+	mutex_lock(&acomp_ctx->mutex);
+
+	zswap_load_map_sources(zls, acomp_ctx->buffer);
+
+	zswap_decompress_batch(zls, acomp_ctx->buffer, decomp_errors);
+
+	mutex_unlock(&acomp_ctx->mutex);
+
+	zswap_load_batch_updates(zd_batch, sb, zls, decomp_errors);
+}
+
+static void zswap_load_start_accounting(
+	struct zswap_decomp_batch *zd_batch,
+	unsigned int sb,
+	struct zswap_load_sub_batch_state *zls,
+	bool workingset[],
+	bool in_thrashing[])
+{
+	unsigned int j;
+	u8 i;
+
+	for (i = 0; i < zls->nr_decomp; ++i) {
+		j = (sb * SWAP_CRYPTO_SUB_BATCH_SIZE) + i;
+		struct folio *folio = zd_batch->fbatch.folios[j];
+		workingset[i] = folio_test_workingset(folio);
+		if (workingset[i])
+			delayacct_thrashing_start(&in_thrashing[i]);
+	}
+}
+
+static void zswap_load_end_accounting(
+	struct zswap_decomp_batch *zd_batch,
+	struct zswap_load_sub_batch_state *zls,
+	bool workingset[],
+	bool in_thrashing[])
+{
+	u8 i;
+
+	for (i = 0; i < zls->nr_decomp; ++i)
+		if (workingset[i])
+			delayacct_thrashing_end(&in_thrashing[i]);
+}
+
+/*
+ * All entries in a zd_batch belong to the same swap device.
+ */
+void __zswap_finish_load_batch(struct zswap_decomp_batch *zd_batch)
+{
+	struct zswap_load_sub_batch_state zls;
+	unsigned int nr_folios = folio_batch_count(&zd_batch->fbatch);
+	unsigned int nr_sb = DIV_ROUND_UP(nr_folios, SWAP_CRYPTO_SUB_BATCH_SIZE);
+	unsigned int sb;
+
+	/*
+	 * Process the zd_batch in sub-batches of
+	 * SWAP_CRYPTO_SUB_BATCH_SIZE.
+	 */
+	for (sb = 0; sb < nr_sb; ++sb) {
+		bool workingset[SWAP_CRYPTO_SUB_BATCH_SIZE];
+		bool in_thrashing[SWAP_CRYPTO_SUB_BATCH_SIZE];
+
+		zswap_load_sub_batch_init(zd_batch, sb, &zls);
+
+		zswap_load_start_accounting(zd_batch, sb, &zls,
+					    workingset, in_thrashing);
+
+		/* Decompress the batch. */
+		if (zls.nr_decomp)
+			zswap_load_decomp_batch(zd_batch, sb, &zls);
+
+		/*
+		 * Should we free zswap_entries, as in zswap_load():
+		 * With the new swapin_readahead batching interface,
+		 * all prefetch entries are read into the swapcache.
+		 * Freeing the zswap entries here causes segfaults,
+		 * most probably because a page-fault occurred while
+		 * the buffer was being decompressed.
+		 * Allowing the regular folio_free_swap() sequence
+		 * in do_swap_page() appears to keep things stable
+		 * without duplicated zswap-swapcache memory, as far
+		 * as I can tell from my testing.
+		 */
+
+		zswap_load_end_accounting(zd_batch, &zls,
+					  workingset, in_thrashing);
+	}
+}
+
 void zswap_invalidate(swp_entry_t swp)
 {
 	pgoff_t offset = swp_offset(swp);
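Putting the pieces of this patch together, the batched load proceeds in
fixed-size slices. A worked example follows, assuming
SWAP_CRYPTO_SUB_BATCH_SIZE is 8 (the value suggested by the description
of patch 6 in this series; the constant itself is defined in an earlier
patch and is not shown here):

/*
 * Worked example (assumption: SWAP_CRYPTO_SUB_BATCH_SIZE == 8): a
 * 32-folio zd_batch yields nr_sb = DIV_ROUND_UP(32, 8) = 4 sub-batches.
 * For each sub-batch, __zswap_finish_load_batch(), under one
 * acomp_ctx->mutex hold:
 *   1. copies up to 8 compressed buffers out of the zpool
 *      (zswap_load_map_sources()),
 *   2. issues one swap_crypto_acomp_decompress_batch() call, so the 8
 *      decompressions can run in parallel on IAA, and
 *   3. marks the folios uptodate, freeing swapcache-resident
 *      zswap_entries (zswap_load_batch_updates()).
 */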
From patchwork Fri Oct 18 06:48:04 2024
From: Kanchana P Sridhar
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org,
    yosryahmed@google.com, nphamcs@gmail.com, chengming.zhou@linux.dev,
    usamaarif642@gmail.com, ryan.roberts@arm.com, ying.huang@intel.com,
    21cnbao@gmail.com, akpm@linux-foundation.org, hughd@google.com,
    willy@infradead.org, bfoster@redhat.com, dchinner@redhat.com,
    chrisl@kernel.org, david@redhat.com
Cc: wajdi.k.feghali@intel.com, vinodh.gopal@intel.com, kanchana.p.sridhar@intel.com
Subject: [RFC PATCH v1 6/7] mm: do_swap_page() calls swapin_readahead() zswap load batching interface.
Date: Thu, 17 Oct 2024 23:48:04 -0700
Message-Id: <20241018064805.336490-7-kanchana.p.sridhar@intel.com>
In-Reply-To: <20241018064805.336490-1-kanchana.p.sridhar@intel.com>
References: <20241018064805.336490-1-kanchana.p.sridhar@intel.com>

This patch invokes the swapin_readahead()-based batching interface to
prefetch a batch of 4K folios for zswap load, with batch decompressions
done in parallel using IAA hardware.
swapin_readahead() prefetches folios based on vm.page-cluster and the
usefulness of prior prefetches to the workload. As folios are created
in the swapcache and the readahead code calls swap_read_folio() with a
"zswap_batch" and a "non_zswap_batch", the respective folio_batches get
populated with the folios to be read.

Finally, the swapin_readahead() procedures call the newly added
process_ra_batch_of_same_type(), which:

1) Reads all the non_zswap_batch folios sequentially by calling
   swap_read_folio().
2) Calls swap_read_zswap_batch_unplug() with the zswap_batch, which
   calls zswap_finish_load_batch() to decompress each
   SWAP_CRYPTO_SUB_BATCH_SIZE sub-batch (i.e., up to 8 pages in a
   prefetch batch of, say, 32 folios) in parallel with IAA.

Within do_swap_page(), we try to benefit from batch decompressions in
both these scenarios:

1) Single-mapped, SWP_SYNCHRONOUS_IO: We call swapin_readahead() with
   "single_mapped_path = true". This is done only in the
   !zswap_never_enabled() case.
2) Shared and/or non-SWP_SYNCHRONOUS_IO folios: We call
   swapin_readahead() with "single_mapped_path = false". This places
   folios in the swapcache: a design choice that handles cases where a
   folio that is "single-mapped" in process 1 could be prefetched in
   process 2, and that handles highly contended server scenarios with
   stability.

Checks are added at the end of do_swap_page(), after the folio has been
successfully loaded, to detect whether the single-mapped swapcache
folio is still single-mapped; if so, folio_free_swap() is called on the
folio.

Within the swapin_readahead() functions, if single_mapped_path is true
and either the platform does not have IAA, or the platform has IAA but
the user has selected a software compressor for zswap (details of the
sysfs knob follow), readahead/batching are skipped and the folio is
loaded using zswap_load().

A new swap parameter, "singlemapped_ra_enabled" (false by default), is
added for platforms that have IAA and on which
zswap_load_batching_enabled() is true, to give the user the option to
run experiments with IAA and with software compressors for zswap (with
an SWP_SYNCHRONOUS_IO swap device):

  For IAA:
    echo true > /sys/kernel/mm/swap/singlemapped_ra_enabled

  For software compressors:
    echo false > /sys/kernel/mm/swap/singlemapped_ra_enabled

If "singlemapped_ra_enabled" is set to false, swapin_readahead() will
skip prefetching folios in the "single-mapped SWP_SYNCHRONOUS_IO"
do_swap_page() path.

Thanks to Ying Huang for the really helpful brainstorming discussions
on the swap_read_folio() plug design.

Suggested-by: Ying Huang
Signed-off-by: Kanchana P Sridhar
---
 mm/memory.c     | 187 +++++++++++++++++++++++++++++++++++++-----------
 mm/shmem.c      |   2 +-
 mm/swap.h       |  12 ++--
 mm/swap_state.c | 157 ++++++++++++++++++++++++++++++++++++----
 mm/swapfile.c   |   2 +-
 5 files changed, 299 insertions(+), 61 deletions(-)
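Before reading the mm/memory.c hunks, it may help to see the resulting
swapin policy in condensed form. The following is an illustrative
sketch only (locking, large-folio handling, and error paths omitted),
not literal code from the patch:

	/* Sketch of the do_swap_page() swapin choice after this patch. */
	if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
	    __swap_count(entry) == 1) {
		if (zswap_never_enabled()) {
			/* Old behavior: bypass the swapcache entirely. */
			folio = alloc_swap_folio(vmf); /* + swap_read_folio() */
		} else {
			/*
			 * zswap is (or was) enabled: go through the swapcache
			 * so the fault can join a batched zswap load.
			 */
			folio = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
						 vmf, true);
		}
	} else {
		/* Shared and/or non-synchronous swap: regular readahead. */
		folio = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
					 vmf, false);
	}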
diff --git a/mm/memory.c b/mm/memory.c
index b5745b9ffdf7..9655b85fc243 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3924,6 +3924,42 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
 	return 0;
 }
 
+/*
+ * swapin readahead based batching interface for zswap batched loads using IAA:
+ *
+ * Should only be called for and if the faulting swap entry in do_swap_page
+ * is single-mapped and SWP_SYNCHRONOUS_IO.
+ *
+ * Detect if the folio is in the swapcache, is still mapped to only this
+ * process, and further, there are no additional references to this folio
+ * (e.g., if another process simultaneously did readahead of this swap entry
+ * while this process was handling the page-fault, and got a pointer to the
+ * folio allocated by this process in the swapcache), besides the references
+ * that were obtained within __read_swap_cache_async() by this process that is
+ * faulting in this single-mapped swap entry.
+ */
+static inline bool should_free_singlemap_swapcache(swp_entry_t entry,
+						   struct folio *folio)
+{
+	if (!folio_test_swapcache(folio))
+		return false;
+
+	if (__swap_count(entry) != 0)
+		return false;
+
+	/*
+	 * The folio ref count for a single-mapped folio that was allocated
+	 * in __read_swap_cache_async(), can be a maximum of 3. These are the
+	 * incrementors of the folio ref count in __read_swap_cache_async():
+	 * folio_alloc_mpol(), add_to_swap_cache(), folio_add_lru().
+	 */
+	if (folio_ref_count(folio) <= 3)
+		return true;
+
+	return false;
+}
+
 static inline bool should_try_to_free_swap(struct folio *folio,
 					   struct vm_area_struct *vma,
 					   unsigned int fault_flags)
@@ -4215,6 +4251,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	swp_entry_t entry;
 	pte_t pte;
 	vm_fault_t ret = 0;
+	bool single_mapped_swapcache = false;
 	void *shadow = NULL;
 	int nr_pages;
 	unsigned long page_idx;
@@ -4283,51 +4320,90 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	if (!folio) {
 		if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
 		    __swap_count(entry) == 1) {
-			/* skip swapcache */
-			folio = alloc_swap_folio(vmf);
-			if (folio) {
-				__folio_set_locked(folio);
-				__folio_set_swapbacked(folio);
-
-				nr_pages = folio_nr_pages(folio);
-				if (folio_test_large(folio))
-					entry.val = ALIGN_DOWN(entry.val, nr_pages);
-				/*
-				 * Prevent parallel swapin from proceeding with
-				 * the cache flag. Otherwise, another thread
-				 * may finish swapin first, free the entry, and
-				 * swapout reusing the same entry. It's
-				 * undetectable as pte_same() returns true due
-				 * to entry reuse.
-				 */
-				if (swapcache_prepare(entry, nr_pages)) {
+			if (zswap_never_enabled()) {
+				/* skip swapcache */
+				folio = alloc_swap_folio(vmf);
+				if (folio) {
+					__folio_set_locked(folio);
+					__folio_set_swapbacked(folio);
+
+					nr_pages = folio_nr_pages(folio);
+					if (folio_test_large(folio))
+						entry.val = ALIGN_DOWN(entry.val, nr_pages);
 					/*
-					 * Relax a bit to prevent rapid
-					 * repeated page faults.
+					 * Prevent parallel swapin from proceeding with
+					 * the cache flag. Otherwise, another thread
+					 * may finish swapin first, free the entry, and
+					 * swapout reusing the same entry. It's
+					 * undetectable as pte_same() returns true due
+					 * to entry reuse.
 					 */
-					add_wait_queue(&swapcache_wq, &wait);
-					schedule_timeout_uninterruptible(1);
-					remove_wait_queue(&swapcache_wq, &wait);
-					goto out_page;
+					if (swapcache_prepare(entry, nr_pages)) {
+						/*
+						 * Relax a bit to prevent rapid
+						 * repeated page faults.
+						 */
+						add_wait_queue(&swapcache_wq, &wait);
+						schedule_timeout_uninterruptible(1);
+						remove_wait_queue(&swapcache_wq, &wait);
+						goto out_page;
+					}
+					need_clear_cache = true;
+
+					mem_cgroup_swapin_uncharge_swap(entry, nr_pages);
+
+					shadow = get_shadow_from_swap_cache(entry);
+					if (shadow)
+						workingset_refault(folio, shadow);
+
+					folio_add_lru(folio);
+
+					/* To provide entry to swap_read_folio() */
+					folio->swap = entry;
+					swap_read_folio(folio, NULL, NULL, NULL);
+					folio->private = NULL;
+				}
+			} else {
+				/*
+				 * zswap is enabled or was enabled at some point.
+				 * Don't skip swapcache.
+				 *
+				 * swapin readahead based batching interface
+				 * for zswap batched loads using IAA:
+				 *
+				 * Readahead is invoked in this path only if
+				 * the sys swap "singlemapped_ra_enabled" swap
+				 * parameter is set to true. By default,
+				 * "singlemapped_ra_enabled" is set to false,
+				 * the recommended setting for software compressors.
+				 * For IAA, if "singlemapped_ra_enabled" is set
+				 * to true, readahead will be deployed in this path
+				 * as well.
+				 *
+				 * For single-mapped pages, the batching interface
+				 * calls __read_swap_cache_async() to allocate and
+				 * place the faulting page in the swapcache. This is
+				 * to handle a scenario where the faulting page in
+				 * this process happens to simultaneously be a
+				 * readahead page in another process. By placing the
+				 * single-mapped faulting page in the swapcache,
+				 * we avoid race conditions and duplicate page
+				 * allocations under these scenarios.
+				 */
+				folio = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
+							 vmf, true);
+				if (!folio) {
+					ret = VM_FAULT_OOM;
+					goto out;
 				}
-				need_clear_cache = true;
-
-				mem_cgroup_swapin_uncharge_swap(entry, nr_pages);
-
-				shadow = get_shadow_from_swap_cache(entry);
-				if (shadow)
-					workingset_refault(folio, shadow);
-
-				folio_add_lru(folio);
-				/* To provide entry to swap_read_folio() */
-				folio->swap = entry;
-				swap_read_folio(folio, NULL, NULL, NULL);
-				folio->private = NULL;
-			}
+				single_mapped_swapcache = true;
+				nr_pages = folio_nr_pages(folio);
+				swapcache = folio;
+			} /* swapin with zswap support. */
 		} else {
 			folio = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
-						 vmf);
+						 vmf, false);
 			swapcache = folio;
 		}
@@ -4528,8 +4604,10 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	 * yet.
 	 */
 	swap_free_nr(entry, nr_pages);
-	if (should_try_to_free_swap(folio, vma, vmf->flags))
+	if (should_try_to_free_swap(folio, vma, vmf->flags)) {
 		folio_free_swap(folio);
+		single_mapped_swapcache = false;
+	}
 
 	add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
 	add_mm_counter(vma->vm_mm, MM_SWAPENTS, -nr_pages);
@@ -4619,6 +4697,30 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		if (waitqueue_active(&swapcache_wq))
 			wake_up(&swapcache_wq);
 	}
+
+	/*
+	 * swapin readahead based batching interface
+	 * for zswap batched loads using IAA:
+	 *
+	 * Don't skip swapcache strategy for single-mapped
+	 * pages: As described above, we place the
+	 * single-mapped faulting page in the swapcache,
+	 * to avoid race conditions and duplicate page
+	 * allocations between process 1 handling a
+	 * page-fault for a single-mapped page, while
+	 * simultaneously, the same swap entry is a
+	 * readahead prefetch page in another process 2.
+	 *
+	 * One side-effect of this, is that if the race did
+	 * not occur, we need to clean up the swapcache
+	 * entry and free the zswap entry for the faulting
+	 * page, iff it is still single-mapped and is
+	 * exclusive to this process.
+	 */
+	if (single_mapped_swapcache &&
+	    data_race(should_free_singlemap_swapcache(entry, folio)))
+		folio_free_swap(folio);
+
 	if (si)
 		put_swap_device(si);
 	return ret;
@@ -4638,6 +4740,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		if (waitqueue_active(&swapcache_wq))
 			wake_up(&swapcache_wq);
 	}
+
+	if (single_mapped_swapcache &&
+	    data_race(should_free_singlemap_swapcache(entry, folio)))
+		folio_free_swap(folio);
+
 	if (si)
 		put_swap_device(si);
 	return ret;
diff --git a/mm/shmem.c b/mm/shmem.c
index 66eae800ffab..e4549c04f316 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1624,7 +1624,7 @@ static struct folio *shmem_swapin_cluster(swp_entry_t swap, gfp_t gfp,
 	struct folio *folio;
 
 	mpol = shmem_get_pgoff_policy(info, index, 0, &ilx);
-	folio = swap_cluster_readahead(swap, gfp, mpol, ilx);
+	folio = swap_cluster_readahead(swap, gfp, mpol, ilx, false);
 	mpol_cond_put(mpol);
 
 	return folio;
diff --git a/mm/swap.h b/mm/swap.h
index 2b82c8ed765c..2861bd8f5a96 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -199,9 +199,11 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_flags,
 		struct mempolicy *mpol, pgoff_t ilx, bool *new_page_allocated,
 		bool skip_if_exists);
 struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t flag,
-		struct mempolicy *mpol, pgoff_t ilx);
+		struct mempolicy *mpol, pgoff_t ilx,
+		bool single_mapped_path);
 struct folio *swapin_readahead(swp_entry_t entry, gfp_t flag,
-		struct vm_fault *vmf);
+		struct vm_fault *vmf,
+		bool single_mapped_path);
 
 static inline unsigned int folio_swap_flags(struct folio *folio)
 {
@@ -304,13 +306,15 @@ static inline void show_swap_cache_info(void)
 }
 
 static inline struct folio *swap_cluster_readahead(swp_entry_t entry,
-		gfp_t gfp_mask, struct mempolicy *mpol, pgoff_t ilx)
+		gfp_t gfp_mask, struct mempolicy *mpol, pgoff_t ilx,
+		bool single_mapped_path)
 {
 	return NULL;
 }
 
 static inline struct folio *swapin_readahead(swp_entry_t swp, gfp_t gfp_mask,
-		struct vm_fault *vmf)
+		struct vm_fault *vmf,
+		bool single_mapped_path)
 {
 	return NULL;
 }
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 0aa938e4c34d..66ea8f7f724c 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -44,6 +44,12 @@ struct address_space *swapper_spaces[MAX_SWAPFILES] __read_mostly;
 static unsigned int nr_swapper_spaces[MAX_SWAPFILES] __read_mostly;
 static bool enable_vma_readahead __read_mostly = true;
 
+/*
+ * Enable readahead in single-mapped do_swap_page() path.
+ * Set to "true" for IAA.
+ */
+static bool enable_singlemapped_readahead __read_mostly = false;
+
 #define SWAP_RA_WIN_SHIFT	(PAGE_SHIFT / 2)
 #define SWAP_RA_HITS_MASK	((1UL << SWAP_RA_WIN_SHIFT) - 1)
 #define SWAP_RA_HITS_MAX	SWAP_RA_HITS_MASK
@@ -340,6 +346,11 @@ static inline bool swap_use_vma_readahead(void)
 	return READ_ONCE(enable_vma_readahead) && !atomic_read(&nr_rotate_swap);
 }
 
+static inline bool swap_use_singlemapped_readahead(void)
+{
+	return READ_ONCE(enable_singlemapped_readahead);
+}
+
 /*
  * Lookup a swap entry in the swap cache.
  * A found folio will be returned
 * unlocked and with its refcount incremented - we rely on the kernel
@@ -635,12 +646,49 @@ static unsigned long swapin_nr_pages(unsigned long offset)
 	return pages;
 }
 
+static void process_ra_batch_of_same_type(
+	struct zswap_decomp_batch *zswap_batch,
+	struct folio_batch *non_zswap_batch,
+	swp_entry_t targ_entry,
+	struct swap_iocb **splug)
+{
+	unsigned int i;
+
+	for (i = 0; i < folio_batch_count(non_zswap_batch); ++i) {
+		struct folio *folio = non_zswap_batch->folios[i];
+		swap_read_folio(folio, splug, NULL, NULL);
+		if (folio->swap.val != targ_entry.val) {
+			folio_set_readahead(folio);
+			count_vm_event(SWAP_RA);
+		}
+		folio_put(folio);
+	}
+
+	swap_read_zswap_batch_unplug(zswap_batch, splug);
+
+	for (i = 0; i < folio_batch_count(&zswap_batch->fbatch); ++i) {
+		struct folio *folio = zswap_batch->fbatch.folios[i];
+		if (folio->swap.val != targ_entry.val) {
+			folio_set_readahead(folio);
+			count_vm_event(SWAP_RA);
+		}
+		folio_put(folio);
+	}
+
+	folio_batch_reinit(non_zswap_batch);
+
+	zswap_load_batch_reinit(zswap_batch);
+}
+
 /**
  * swap_cluster_readahead - swap in pages in hope we need them soon
  * @entry: swap entry of this memory
  * @gfp_mask: memory allocation flags
  * @mpol: NUMA memory allocation policy to be applied
  * @ilx: NUMA interleave index, for use only when MPOL_INTERLEAVE
+ * @single_mapped_path: Called from do_swap_page() single-mapped path.
+ *	Only readahead if the sys "singlemapped_ra_enabled" swap parameter
+ *	is set to true.
  *
  * Returns the struct folio for entry and addr, after queueing swapin.
  *
@@ -654,7 +702,8 @@ static unsigned long swapin_nr_pages(unsigned long offset)
 * are fairly likely to have been swapped out from the same node.
 */
 struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
-				struct mempolicy *mpol, pgoff_t ilx)
+				struct mempolicy *mpol, pgoff_t ilx,
+				bool single_mapped_path)
 {
 	struct folio *folio;
 	unsigned long entry_offset = swp_offset(entry);
@@ -664,12 +713,22 @@ struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 	struct swap_info_struct *si = swp_swap_info(entry);
 	struct blk_plug plug;
 	struct swap_iocb *splug = NULL;
+	struct zswap_decomp_batch zswap_batch;
+	struct folio_batch non_zswap_batch;
 	bool page_allocated;
 
+	if (single_mapped_path &&
+	    (!swap_use_singlemapped_readahead() ||
+	     !zswap_load_batching_enabled()))
+		goto skip;
+
 	mask = swapin_nr_pages(offset) - 1;
 	if (!mask)
 		goto skip;
 
+	zswap_load_batch_init(&zswap_batch);
+	folio_batch_init(&non_zswap_batch);
+
 	/* Read a page_cluster sized and aligned cluster around offset. */
 	start_offset = offset & ~mask;
 	end_offset = offset | mask;
@@ -678,6 +737,7 @@ struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 	if (end_offset >= si->max)
 		end_offset = si->max - 1;
 
+	/* Note that all swap entries readahead are of the same swap type. */
 	blk_start_plug(&plug);
 	for (offset = start_offset; offset <= end_offset ; offset++) {
 		/* Ok, do the async read-ahead now */
@@ -687,14 +747,22 @@ struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 		if (!folio)
 			continue;
 		if (page_allocated) {
-			swap_read_folio(folio, &splug, NULL, NULL);
-			if (offset != entry_offset) {
-				folio_set_readahead(folio);
-				count_vm_event(SWAP_RA);
+			if (swap_read_folio(folio, &splug,
+					    &zswap_batch, &non_zswap_batch)) {
+				if (offset != entry_offset) {
+					folio_set_readahead(folio);
+					count_vm_event(SWAP_RA);
+				}
+				folio_put(folio);
 			}
+		} else {
+			folio_put(folio);
 		}
-		folio_put(folio);
 	}
+
+	process_ra_batch_of_same_type(&zswap_batch, &non_zswap_batch,
+				      entry, &splug);
+
 	blk_finish_plug(&plug);
 	swap_read_unplug(splug);
 	lru_add_drain();	/* Push any new pages onto the LRU now */
@@ -1009,6 +1077,9 @@ static int swap_vma_ra_win(struct vm_fault *vmf, unsigned long *start,
 * @mpol: NUMA memory allocation policy to be applied
 * @targ_ilx: NUMA interleave index, for use only when MPOL_INTERLEAVE
 * @vmf: fault information
+ * @single_mapped_path: Called from do_swap_page() single-mapped path.
+ *	Only readahead if the sys "singlemapped_ra_enabled" swap parameter
+ *	is set to true.
 *
 * Returns the struct folio for entry and addr, after queueing swapin.
 *
@@ -1019,10 +1090,14 @@ static int swap_vma_ra_win(struct vm_fault *vmf, unsigned long *start,
 *
 */
 static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
-	struct mempolicy *mpol, pgoff_t targ_ilx, struct vm_fault *vmf)
+	struct mempolicy *mpol, pgoff_t targ_ilx, struct vm_fault *vmf,
+	bool single_mapped_path)
 {
 	struct blk_plug plug;
 	struct swap_iocb *splug = NULL;
+	struct zswap_decomp_batch zswap_batch;
+	struct folio_batch non_zswap_batch;
+	int type = -1, prev_type = -1;
 	struct folio *folio;
 	pte_t *pte = NULL, pentry;
 	int win;
@@ -1031,10 +1106,18 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
 	pgoff_t ilx;
 	bool page_allocated;
 
+	if (single_mapped_path &&
+	    (!swap_use_singlemapped_readahead() ||
+	     !zswap_load_batching_enabled()))
+		goto skip;
+
 	win = swap_vma_ra_win(vmf, &start, &end);
 	if (win == 1)
 		goto skip;
 
+	zswap_load_batch_init(&zswap_batch);
+	folio_batch_init(&non_zswap_batch);
+
 	ilx = targ_ilx - PFN_DOWN(vmf->address - start);
 
 	blk_start_plug(&plug);
@@ -1057,16 +1140,38 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
 		if (!folio)
 			continue;
 		if (page_allocated) {
-			swap_read_folio(folio, &splug, NULL, NULL);
-			if (addr != vmf->address) {
-				folio_set_readahead(folio);
-				count_vm_event(SWAP_RA);
+			type = swp_type(entry);
+
+			/*
+			 * Process this sub-batch before switching to
+			 * another swap device type.
+			 */
+			if ((prev_type >= 0) && (type != prev_type))
+				process_ra_batch_of_same_type(&zswap_batch,
+							      &non_zswap_batch,
+							      targ_entry,
+							      &splug);
+
+			if (swap_read_folio(folio, &splug,
+					    &zswap_batch, &non_zswap_batch)) {
+				if (addr != vmf->address) {
+					folio_set_readahead(folio);
+					count_vm_event(SWAP_RA);
+				}
+				folio_put(folio);
 			}
+
+			prev_type = type;
+		} else {
+			folio_put(folio);
 		}
-		folio_put(folio);
 	}
 	if (pte)
 		pte_unmap(pte);
+
+	process_ra_batch_of_same_type(&zswap_batch, &non_zswap_batch,
+				      targ_entry, &splug);
+
 	blk_finish_plug(&plug);
 	swap_read_unplug(splug);
 	lru_add_drain();
@@ -1092,7 +1197,7 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
 * or vma-based(ie, virtual address based on faulty address) readahead.
 */
 struct folio *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
-			       struct vm_fault *vmf)
+			       struct vm_fault *vmf, bool single_mapped_path)
 {
 	struct mempolicy *mpol;
 	pgoff_t ilx;
@@ -1100,8 +1205,10 @@ struct folio *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
 
 	mpol = get_vma_policy(vmf->vma, vmf->address, 0, &ilx);
 	folio = swap_use_vma_readahead() ?
-		swap_vma_readahead(entry, gfp_mask, mpol, ilx, vmf) :
-		swap_cluster_readahead(entry, gfp_mask, mpol, ilx);
+		swap_vma_readahead(entry, gfp_mask, mpol, ilx, vmf,
+				   single_mapped_path) :
+		swap_cluster_readahead(entry, gfp_mask, mpol, ilx,
+				       single_mapped_path);
 	mpol_cond_put(mpol);
 
 	return folio;
@@ -1126,10 +1233,30 @@ static ssize_t vma_ra_enabled_store(struct kobject *kobj,
 	return count;
 }
 
+static ssize_t singlemapped_ra_enabled_show(struct kobject *kobj,
+					    struct kobj_attribute *attr,
+					    char *buf)
+{
+	return sysfs_emit(buf, "%s\n",
+			  enable_singlemapped_readahead ? "true" : "false");
+}
+static ssize_t singlemapped_ra_enabled_store(struct kobject *kobj,
+					     struct kobj_attribute *attr,
+					     const char *buf, size_t count)
+{
+	ssize_t ret;
+
+	ret = kstrtobool(buf, &enable_singlemapped_readahead);
+	if (ret)
+		return ret;
+
+	return count;
+}
 static struct kobj_attribute vma_ra_enabled_attr = __ATTR_RW(vma_ra_enabled);
+static struct kobj_attribute singlemapped_ra_enabled_attr = __ATTR_RW(singlemapped_ra_enabled);
 
 static struct attribute *swap_attrs[] = {
 	&vma_ra_enabled_attr.attr,
+	&singlemapped_ra_enabled_attr.attr,
 	NULL,
 };
diff --git a/mm/swapfile.c b/mm/swapfile.c
index b0915f3fab31..10367eaee1ff 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2197,7 +2197,7 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 		};
 
 		folio = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
-					 &vmf);
+					 &vmf, false);
 	}
 	if (!folio) {
 		swp_count = READ_ONCE(si->swap_map[offset]);
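One subtlety in the mm/memory.c changes above deserves spelling out:
the folio_ref_count(folio) <= 3 test in
should_free_singlemap_swapcache(). The accounting, as the patch's own
comment describes it, works out roughly as follows:

/*
 * Reference-count sketch for should_free_singlemap_swapcache(): for a
 * folio this fault allocated via __read_swap_cache_async(), at most
 * three references exist at the point of the check:
 *
 *   1. folio_alloc_mpol()  - the allocation reference
 *   2. add_to_swap_cache() - the swapcache reference
 *   3. folio_add_lru()     - the LRU reference
 *
 * A concurrent readahead hitting the same entry in another process
 * would hold an additional reference, so folio_ref_count(folio) > 3
 * means the folio may be shared and the swapcache entry is left alone.
 */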
From patchwork Fri Oct 18 06:48:05 2024
From: Kanchana P Sridhar
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org,
    yosryahmed@google.com, nphamcs@gmail.com, chengming.zhou@linux.dev,
    usamaarif642@gmail.com, ryan.roberts@arm.com, ying.huang@intel.com,
    21cnbao@gmail.com, akpm@linux-foundation.org, hughd@google.com,
    willy@infradead.org, bfoster@redhat.com, dchinner@redhat.com,
    chrisl@kernel.org, david@redhat.com
Cc: wajdi.k.feghali@intel.com, vinodh.gopal@intel.com, kanchana.p.sridhar@intel.com
Subject: [RFC PATCH v1 7/7] mm: For IAA batching, reduce SWAP_BATCH and modify swap slot cache thresholds.
Date: Thu, 17 Oct 2024 23:48:05 -0700
Message-Id: <20241018064805.336490-8-kanchana.p.sridhar@intel.com>
In-Reply-To: <20241018064805.336490-1-kanchana.p.sridhar@intel.com>
References: <20241018064805.336490-1-kanchana.p.sridhar@intel.com>

When IAA is used for compress batching and decompress batching of
folios, the swapout-swapin path latencies are reduced significantly,
and with them the latencies of swap page-faults. This means swap
entries will need to be freed more often, and swap slots will have to
be released more often.

The existing SWAP_BATCH and SWAP_SLOTS_CACHE_SIZE value of 64 can cause
contention on the swap_info_struct lock in swapcache_free_entries(),
and CPU hard lockups can result in highly contended server scenarios.
To prevent this, SWAP_BATCH and SWAP_SLOTS_CACHE_SIZE have been reduced
to 16 if IAA is used for compress/decompress batching. The
swap_slots_cache activate/deactivate thresholds have been modified
accordingly, so that stability is not gained at the cost of
performance.
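For concreteness, the thresholds in the definitions below work out as
follows (simple arithmetic on the values in this patch):

/*
 * Worked numbers. With IAA batching configured:
 *   SWAP_BATCH = SWAP_SLOTS_CACHE_SIZE = 16
 *   activate threshold   = 40 * 16 = 640 free slots
 *   deactivate threshold = 16 * 16 = 256 free slots
 * Without batching (defaults):
 *   SWAP_BATCH = SWAP_SLOTS_CACHE_SIZE = 64
 *   activate threshold   =  5 * 64 = 320 free slots
 *   deactivate threshold =  2 * 64 = 128 free slots
 * The smaller cache shortens each swap_info_struct lock hold when
 * slots are freed, while the raised watermarks keep the cache in use
 * across a wider range of free-swap levels.
 */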
Signed-off-by: Kanchana P Sridhar
---
 include/linux/swap.h       | 7 +++++++
 include/linux/swap_slots.h | 7 +++++++
 2 files changed, 14 insertions(+)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index ca533b478c21..3987faa0a2ff 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -32,7 +33,13 @@ struct pagevec;
 #define SWAP_FLAGS_VALID	(SWAP_FLAG_PRIO_MASK | SWAP_FLAG_PREFER | \
 				 SWAP_FLAG_DISCARD | SWAP_FLAG_DISCARD_ONCE | \
 				 SWAP_FLAG_DISCARD_PAGES)
+
+#if defined(CONFIG_ZSWAP_STORE_BATCHING_ENABLED) || \
+    defined(CONFIG_ZSWAP_LOAD_BATCHING_ENABLED)
+#define SWAP_BATCH 16
+#else
 #define SWAP_BATCH 64
+#endif
 
 static inline int current_is_kswapd(void)
 {
diff --git a/include/linux/swap_slots.h b/include/linux/swap_slots.h
index 15adfb8c813a..1b6e4e2798bd 100644
--- a/include/linux/swap_slots.h
+++ b/include/linux/swap_slots.h
@@ -7,8 +7,15 @@
 #include
 
 #define SWAP_SLOTS_CACHE_SIZE			SWAP_BATCH
+
+#if defined(CONFIG_ZSWAP_STORE_BATCHING_ENABLED) || \
+    defined(CONFIG_ZSWAP_LOAD_BATCHING_ENABLED)
+#define THRESHOLD_ACTIVATE_SWAP_SLOTS_CACHE	(40*SWAP_SLOTS_CACHE_SIZE)
+#define THRESHOLD_DEACTIVATE_SWAP_SLOTS_CACHE	(16*SWAP_SLOTS_CACHE_SIZE)
+#else
 #define THRESHOLD_ACTIVATE_SWAP_SLOTS_CACHE	(5*SWAP_SLOTS_CACHE_SIZE)
 #define THRESHOLD_DEACTIVATE_SWAP_SLOTS_CACHE	(2*SWAP_SLOTS_CACHE_SIZE)
+#endif
 
 struct swap_slots_cache {
 	bool	lock_initialized;