From patchwork Wed Aug 14 06:28:26 2024
X-Patchwork-Submitter: Kanchana P Sridhar
X-Patchwork-Id: 13762875
From: Kanchana P Sridhar
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org,
    yosryahmed@google.com, nphamcs@gmail.com, ryan.roberts@arm.com,
    ying.huang@intel.com, 21cnbao@gmail.com, akpm@linux-foundation.org
Cc: nanhai.zou@intel.com, wajdi.k.feghali@intel.com, vinodh.gopal@intel.com,
    kanchana.p.sridhar@intel.com
Subject: [RFC PATCH v1 0/4] mm: ZSWAP swap-out of mTHP folios
Date: Tue, 13 Aug 2024 23:28:26 -0700
Message-Id: <20240814062830.26833-1-kanchana.p.sridhar@intel.com>
This RFC patch-series enables zswap_store() to accept and store mTHP
folios. The most significant contribution in this series is from the
earlier RFC submitted by Ryan Roberts [1]. Ryan's original RFC has been
migrated to v6.10 in patch [3] of this series.

[1]: [RFC PATCH v1] mm: zswap: Store large folios without splitting
     https://lore.kernel.org/linux-mm/20231019110543.3284654-1-ryan.roberts@arm.com/T/#u

Additionally, there is an attempt to modularize some of the
functionality in zswap_store(), to make it more amenable to supporting
any-order mTHPs. For instance, the determination of whether a folio is
same-filled is based on mapping an index into the folio to derive the
page. Likewise, a function "zswap_store_entry" is added to store a
zswap_entry in the xarray.

For testing purposes, per-mTHP-size vmstat zswap_store event counters
are added, and incremented upon each successful zswap_store of an mTHP.

This patch-series is a precursor to ZSWAP compress batching of mTHP
swap-out and decompress batching of swap-ins based on
swapin_readahead(), using Intel IAA hardware acceleration, which we
would like to submit in subsequent RFC patch-series, with performance
improvement data.

Performance Testing:
====================
Testing of this patch-series was done with the v6.10 mainline, without
and with this RFC, on an Intel Sapphire Rapids server, dual-socket, 56
cores per socket, 4 IAA devices per socket. The system has 503 GiB RAM,
176 GiB swap/ZSWAP with ZRAM as the backing swap device.
Core frequency was fixed at 2500MHz.

The vm-scalability "usemem" test was run in a cgroup whose memory.high
was fixed at 40G. Following a similar methodology as in Ryan Roberts'
"Swap-out mTHP without splitting" series [2], 70 usemem processes were
run, each allocating and writing 1G of memory:

    usemem --init-time -w -O -n 70 1g

Other kernel configuration parameters:

    ZSWAP Compressor  : LZ4, DEFLATE-IAA
    ZSWAP Allocator   : ZSMALLOC
    ZRAM Compressor   : LZO-RLE
    SWAP page-cluster : 2

In the experiments where "deflate-iaa" is used as the ZSWAP compressor,
IAA "compression verification" is enabled. Hence each IAA compression
will be decompressed internally by the "iaa_crypto" driver, the CRCs
returned by the hardware will be compared, and errors reported in case
of mismatches. Thus "deflate-iaa" helps ensure better data integrity as
compared to the software compressors.

Throughput reported by usemem and perf sys time for running the test
were measured and averaged across 3 runs:

64KB mTHP:
==========

 ----------------------------------------------------------
|                |               |            |            |
|Kernel          | mTHP SWAP-OUT | Throughput | Improvement|
|                |               |    KB/s    |            |
|----------------|---------------|------------|------------|
|v6.10 mainline  | ZRAM lzo-rle  |  111,180   |  Baseline  |
|zswap-mTHP-RFC  | ZSWAP lz4     |  115,996   |     4%     |
|zswap-mTHP-RFC  | ZSWAP         |            |            |
|                | deflate-iaa   |  166,048   |    49%     |
|----------------------------------------------------------|
|                |               |            |            |
|Kernel          | mTHP SWAP-OUT |  Sys time  | Improvement|
|                |               |    sec     |            |
|----------------|---------------|------------|------------|
|v6.10 mainline  | ZRAM lzo-rle  |  1,049.69  |  Baseline  |
|zswap-mTHP RFC  | ZSWAP lz4     |  1,178.20  |   -12%     |
|zswap-mTHP-RFC  | ZSWAP         |            |            |
|                | deflate-iaa   |   626.12   |    40%     |
 ----------------------------------------------------------

 -------------------------------------------------------
| VMSTATS, mTHP ZSWAP stats, | v6.10     | zswap-mTHP  |
| mTHP ZRAM stats:           | mainline  | RFC         |
|-------------------------------------------------------|
| pswpin                     | 16        | 0           |
| pswpout                    | 7,823,984 | 0           |
| zswpin                     | 551       | 647         |
| zswpout                    | 1,410     | 15,175,113  |
|-------------------------------------------------------|
| thp_swpout                 | 0         | 0           |
| thp_swpout_fallback        | 0         | 0           |
| pgmajfault                 | 2,189     | 2,241       |
|-------------------------------------------------------|
| zswpout_4kb_folio          |           | 1,497       |
| mthp_zswpout_64kb          |           | 948,351     |
|-------------------------------------------------------|
| hugepages-64kB/stats/swpout| 488,999   | 0           |
 -------------------------------------------------------

2MB PMD-THP/2048K mTHP:
=======================

 ----------------------------------------------------------
|                |               |            |            |
|Kernel          | mTHP SWAP-OUT | Throughput | Improvement|
|                |               |    KB/s    |            |
|----------------|---------------|------------|------------|
|v6.10 mainline  | ZRAM lzo-rle  |  136,617   |  Baseline  |
|zswap-mTHP-RFC  | ZSWAP lz4     |  137,360   |     1%     |
|zswap-mTHP-RFC  | ZSWAP         |            |            |
|                | deflate-iaa   |  179,097   |    31%     |
|----------------------------------------------------------|
|                |               |            |            |
|Kernel          | mTHP SWAP-OUT |  Sys time  | Improvement|
|                |               |    sec     |            |
|----------------|---------------|------------|------------|
|v6.10 mainline  | ZRAM lzo-rle  |  1,044.40  |  Baseline  |
|zswap-mTHP RFC  | ZSWAP lz4     |  1,035.79  |     1%     |
|zswap-mTHP-RFC  | ZSWAP         |            |            |
|                | deflate-iaa   |   571.31   |    45%     |
 ----------------------------------------------------------

 ---------------------------------------------------------
| VMSTATS, mTHP ZSWAP stats,   | v6.10     | zswap-mTHP  |
| mTHP ZRAM stats:             | mainline  | RFC         |
|---------------------------------------------------------|
| pswpin                       | 0         | 0           |
| pswpout                      | 8,630,272 | 0           |
| zswpin                       | 565       | 6,901       |
| zswpout                      | 1,388     | 15,379,163  |
|---------------------------------------------------------|
| thp_swpout                   | 16,856    | 0           |
| thp_swpout_fallback          | 0         | 0           |
| pgmajfault                   | 2,184     | 8,532       |
|---------------------------------------------------------|
| zswpout_4kb_folio            |           | 5,851       |
| mthp_zswpout_2048kb          |           | 30,026      |
| zswpout_pmd_thp_folio        |           | 30,026      |
|---------------------------------------------------------|
| hugepages-2048kB/stats/swpout| 16,856    | 0           |
 ---------------------------------------------------------

As expected in the "Before" experiment, there are relatively fewer
swapouts, because ZRAM utilization is not accounted in the cgroup. With
the introduction of zswap_store of mTHP, the "After" data reflects the
higher swapout activity, and the consequent sys time degradation.

Our goal is to improve ZSWAP mTHP store performance using batching. With
Intel IAA compress/decompress batching used in ZSWAP (to be submitted as
an additional RFC series), we are able to demonstrate significant
performance improvements with IAA as compared to software compressors.
For instance, with IAA-Canned compression [3] used with batching of
zswap_stores and of zswap_loads, the usemem experiment's average
throughput across 3 runs improves to 170,461 KB/s (64KB mTHP) and
188,325 KB/s (2MB THP).

[2] https://lore.kernel.org/linux-mm/20240408183946.2991168-1-ryan.roberts@arm.com/
[3] https://patchwork.kernel.org/project/linux-crypto/cover/cover.1710969449.git.andre.glover@linux.intel.com/

Kanchana P Sridhar (4):
  mm: zswap: zswap_is_folio_same_filled() takes an index in the folio.
  mm: vmstat: Per mTHP-size zswap_store vmstat event counters.
  mm: zswap: zswap_store() extended to handle mTHP folios.
  mm: page_io: Count successful mTHP zswap stores in vmstat.

 include/linux/vm_event_item.h |  15 +++
 mm/page_io.c                  |  44 +++++++
 mm/vmstat.c                   |  15 +++
 mm/zswap.c                    | 223 ++++++++++++++++++++++++----------
 4 files changed, 233 insertions(+), 64 deletions(-)
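For reviewers unfamiliar with the same-filled optimization that patch 1
reworks to take an index into the folio: the idea is that a page whose
contents are one repeated word need not be compressed at all. The
following is a hypothetical, userspace-only sketch of that check (the
names page_same_filled and the 2-page "folio" array are illustrative,
not the kernel's actual code):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 4096

/* A page is "same-filled" when every word in it equals its first word.
 * On success the fill pattern is returned through *value. */
static bool page_same_filled(const void *page, unsigned long *value)
{
	const unsigned long *data = page;
	size_t n = PAGE_SIZE / sizeof(unsigned long);

	for (size_t i = 1; i < n; i++) {
		if (data[i] != data[0])
			return false;
	}
	*value = data[0];
	return true;
}

int main(void)
{
	/* A "folio" of 2 pages, word-aligned. */
	static unsigned long folio_words[2 * PAGE_SIZE / sizeof(unsigned long)];
	unsigned char *folio = (unsigned char *)folio_words;
	unsigned long val;

	/* Page 0 stays zero-filled; page 1 is made non-uniform. */
	folio[PAGE_SIZE + 100] = 0xab;

	/* Map an index into the folio to derive the page, as the cover
	 * letter describes for zswap_is_folio_same_filled(). */
	printf("%d\n", page_same_filled(folio + 0 * PAGE_SIZE, &val)); /* 1 */
	printf("%d\n", page_same_filled(folio + 1 * PAGE_SIZE, &val)); /* 0 */
	return 0;
}
```

In the mTHP case this check runs per subpage, which is why the helper
takes a page derived from an index rather than assuming a single-page
folio.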
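For anyone trying to reproduce the runs above, the configuration
described in "Performance Testing" could be set up roughly as follows.
This is a hedged sketch, not the exact scripts used: it assumes cgroup
v2 and the standard zswap/THP sysfs knobs, and the cgroup name
"usemem-test" is made up for illustration.

```shell
# ZSWAP compressor and allocator (lz4 or deflate-iaa per experiment):
echo deflate-iaa > /sys/module/zswap/parameters/compressor
echo zsmalloc    > /sys/module/zswap/parameters/zpool

# SWAP page-cluster:
echo 2 > /proc/sys/vm/page-cluster

# Enable the mTHP size under test, e.g. 64KB:
echo always > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled

# Run usemem in a cgroup whose memory.high is fixed at 40G:
mkdir -p /sys/fs/cgroup/usemem-test
echo 40G > /sys/fs/cgroup/usemem-test/memory.high
echo $$  > /sys/fs/cgroup/usemem-test/cgroup.procs
usemem --init-time -w -O -n 70 1g
```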