X-Patchwork-Submitter: Barry Song <21cnbao@gmail.com>
X-Patchwork-Id: 13882400
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: axboe@kernel.dk, bala.seshasayee@linux.intel.com, chrisl@kernel.org,
 david@redhat.com, hannes@cmpxchg.org, kanchana.p.sridhar@intel.com,
 kasong@tencent.com, linux-block@vger.kernel.org, minchan@kernel.org,
 nphamcs@gmail.com, ryan.roberts@arm.com, senozhatsky@chromium.org,
 surenb@google.com, terrelln@fb.com, usamaarif642@gmail.com,
 v-songbaohua@oppo.com, wajdi.k.feghali@intel.com, willy@infradead.org,
 ying.huang@intel.com, yosryahmed@google.com, yuzhao@google.com,
 zhengtangquan@oppo.com, zhouchengming@bytedance.com
Subject: [PATCH RFC v3 0/4] mTHP-friendly compression in zsmalloc and zram based on multi-pages
Date: Fri, 22 Nov 2024 11:25:17 +1300
Message-Id: <20241121222521.83458-1-21cnbao@gmail.com>

From: Barry Song

When large folios are compressed at a larger granularity, we observe a
notable reduction in CPU usage and a significant improvement in
compression ratios.

mTHP's ability to be swapped out without splitting and swapped back in
as a whole allows compression and decompression at larger granularities.

This patchset enhances zsmalloc and zram by adding support for dividing
large folios into multi-page blocks, typically configured with an
order-2 granularity (four pages, i.e. 16KiB with 4KiB base pages).
Without this patchset, a large folio is always divided into `nr_pages`
4KiB blocks.

The granularity can be set using the `ZSMALLOC_MULTI_PAGES_ORDER`
setting, where the default of 2 allows all anonymous THP to benefit.

Examples include:
* A 16KiB large folio will be compressed and stored as a single 16KiB
  block.
* A 64KiB large folio will be compressed and stored as four 16KiB
  blocks.

For example, swapping out and swapping in 100MiB of typical anonymous
data 100 times (with 16KiB mTHP enabled) using zstd yields the following
results:

                        w/o patches        w/ patches
swap-out time(ms)       68711              49908
swap-in time(ms)        30687              20685
compression ratio       20.49%             16.9%

I deliberately created a test case with intense swap thrashing. On my
Intel i9 10-core, 20-thread PC, I imposed a 1GB memory limit on a memcg
to compile the Linux kernel, intending to amplify swap activity and
analyze its impact on system time.
Using the ZSTD algorithm, my test script, which builds the kernel for
five rounds, is as follows:

#!/bin/bash

echo never > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled
echo never > /sys/kernel/mm/transparent_hugepage/hugepages-32kB/enabled
echo always > /sys/kernel/mm/transparent_hugepage/hugepages-16kB/enabled
echo never > /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled

vmstat_path="/proc/vmstat"
thp_base_path="/sys/kernel/mm/transparent_hugepage"

read_values() {
    pswpin=$(grep "pswpin" $vmstat_path | awk '{print $2}')
    pswpout=$(grep "pswpout" $vmstat_path | awk '{print $2}')
    pgpgin=$(grep "pgpgin" $vmstat_path | awk '{print $2}')
    pgpgout=$(grep "pgpgout" $vmstat_path | awk '{print $2}')
    swpout_64k=$(cat $thp_base_path/hugepages-64kB/stats/swpout 2>/dev/null || echo 0)
    swpout_32k=$(cat $thp_base_path/hugepages-32kB/stats/swpout 2>/dev/null || echo 0)
    swpout_16k=$(cat $thp_base_path/hugepages-16kB/stats/swpout 2>/dev/null || echo 0)
    swpin_64k=$(cat $thp_base_path/hugepages-64kB/stats/swpin 2>/dev/null || echo 0)
    swpin_32k=$(cat $thp_base_path/hugepages-32kB/stats/swpin 2>/dev/null || echo 0)
    swpin_16k=$(cat $thp_base_path/hugepages-16kB/stats/swpin 2>/dev/null || echo 0)
    echo "$pswpin $pswpout $swpout_64k $swpout_32k $swpout_16k $swpin_64k $swpin_32k $swpin_16k $pgpgin $pgpgout"
}

for ((i=1; i<=5; i++))
do
    echo
    echo "*** Executing round $i ***"
    make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- clean 1>/dev/null 2>/dev/null
    echo 3 > /proc/sys/vm/drop_caches

    #kernel build
    initial_values=($(read_values))
    time systemd-run --scope -p MemoryMax=1G make ARCH=arm64 \
        CROSS_COMPILE=aarch64-linux-gnu- vmlinux -j20 1>/dev/null 2>/dev/null
    final_values=($(read_values))

    echo "pswpin: $((final_values[0] - initial_values[0]))"
    echo "pswpout: $((final_values[1] - initial_values[1]))"
    echo "64kB-swpout: $((final_values[2] - initial_values[2]))"
    echo "32kB-swpout: $((final_values[3] - initial_values[3]))"
    echo "16kB-swpout: $((final_values[4] - initial_values[4]))"
    echo "64kB-swpin: $((final_values[5] - initial_values[5]))"
    echo "32kB-swpin: $((final_values[6] - initial_values[6]))"
    echo "16kB-swpin: $((final_values[7] - initial_values[7]))"
    echo "pgpgin: $((final_values[8] - initial_values[8]))"
    echo "pgpgout: $((final_values[9] - initial_values[9]))"
done

****************** Test results

******* Without the patchset:

*** Executing round 1 ***

real	7m56.173s
user	81m29.401s
sys	42m57.470s
pswpin: 29815871
pswpout: 50548760
64kB-swpout: 0
32kB-swpout: 0
16kB-swpout: 11206086
64kB-swpin: 0
32kB-swpin: 0
16kB-swpin: 6596517
pgpgin: 146093656
pgpgout: 211024708

*** Executing round 2 ***

real	7m48.227s
user	81m20.558s
sys	43m0.940s
pswpin: 29798189
pswpout: 50882005
64kB-swpout: 0
32kB-swpout: 0
16kB-swpout: 11286587
64kB-swpin: 0
32kB-swpin: 0
16kB-swpin: 6596103
pgpgin: 146841468
pgpgout: 212374760

*** Executing round 3 ***

real	7m56.664s
user	81m10.936s
sys	43m5.991s
pswpin: 29760702
pswpout: 51230330
64kB-swpout: 0
32kB-swpout: 0
16kB-swpout: 11363346
64kB-swpin: 0
32kB-swpin: 0
16kB-swpin: 6586263
pgpgin: 145374744
pgpgout: 213355600

*** Executing round 4 ***

real	8m29.115s
user	81m18.955s
sys	42m49.050s
pswpin: 29651724
pswpout: 50631678
64kB-swpout: 0
32kB-swpout: 0
16kB-swpout: 11249036
64kB-swpin: 0
32kB-swpin: 0
16kB-swpin: 6583515
pgpgin: 145819060
pgpgout: 211373768

*** Executing round 5 ***

real	7m46.124s
user	80m29.780s
sys	41m37.005s
pswpin: 28805646
pswpout: 49570858
64kB-swpout: 0
32kB-swpout: 0
16kB-swpout: 11010873
64kB-swpin: 0
32kB-swpin: 0
16kB-swpin: 6391598
pgpgin: 142354376
pgpgout: 20713566

******* With the patchset:

*** Executing round 1 ***

real	7m43.760s
user	80m35.185s
sys	35m50.685s
pswpin: 29870407
pswpout: 50101263
64kB-swpout: 0
32kB-swpout: 0
16kB-swpout: 11140509
64kB-swpin: 0
32kB-swpin: 0
16kB-swpin: 6838090
pgpgin: 146500224
pgpgout: 209218896

*** Executing round 2 ***

real	7m31.820s
user	81m39.787s
sys	37m24.341s
pswpin: 31100304
pswpout: 51666202
64kB-swpout: 0
32kB-swpout: 0
16kB-swpout: 11471841
64kB-swpin: 0
32kB-swpin: 0
16kB-swpin: 7106112
pgpgin: 151763112
pgpgout: 215526464

*** Executing round 3 ***

real	7m35.732s
user	79m36.028s
sys	34m4.190s
pswpin: 28357528
pswpout: 47716236
64kB-swpout: 0
32kB-swpout: 0
16kB-swpout: 10619547
64kB-swpin: 0
32kB-swpin: 0
16kB-swpin: 6500899
pgpgin: 139903688
pgpgout: 199715908

*** Executing round 4 ***

real	7m38.242s
user	80m50.768s
sys	35m54.201s
pswpin: 29752937
pswpout: 49977585
64kB-swpout: 0
32kB-swpout: 0
16kB-swpout: 11117552
64kB-swpin: 0
32kB-swpin: 0
16kB-swpin: 6815571
pgpgin: 146293900
pgpgout: 208755500

*** Executing round 5 ***

real	8m2.692s
user	81m40.159s
sys	37m11.361s
pswpin: 30813683
pswpout: 51687672
64kB-swpout: 0
32kB-swpout: 0
16kB-swpout: 11481684
64kB-swpin: 0
32kB-swpin: 0
16kB-swpin: 7044988
pgpgin: 150231840
pgpgout: 215616760

Although the real time fluctuated significantly on my PC, the sys time
clearly decreased in all five rounds, from roughly 42-43 minutes without
the patchset to roughly 34-37 minutes with it.

-v3:
 * Added a patch to fall back to four smaller folios to avoid partial
   reads. Discussed this option with Usama, Ying, and Nhat in v2. Not
   entirely sure it will be well-received, but I've done my best to
   minimize the complexity added to do_swap_page().
 * Added a patch to adjust the zstd backend's estimated_src_size.
 * Addressed one VM_WARN_ON in patch 1 for PageMovable().

-v2:
 https://lore.kernel.org/linux-mm/20241107101005.69121-1-21cnbao@gmail.com/

 While it is not mature yet, I know some people are waiting for
 an update :-)
 * Fixed some stability issues.
 * Rebased against the latest mm-unstable.
 * Set the default order to 2, which benefits all anon mTHP.
 * Multi-page ZsPageMovable is not supported yet.

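
To put a rough number on the sys-time reduction, the per-round sys values from the test results can be averaged. The following is only a sketch: the helper names `to_seconds` and `average_seconds` are made up for illustration, fractional seconds are simply dropped, and the input values are copied from the ten rounds reported in this letter.

```shell
#!/bin/sh
# Convert a `time`-style value such as "42m57.470s" to whole seconds.
# Fractional seconds are dropped; values are assumed to have no leading zeros.
to_seconds() {
    t=${1%s}          # strip the trailing "s"
    min=${t%%m*}      # part before "m"
    sec=${t#*m}       # part after "m"
    sec=${sec%%.*}    # drop the fractional part
    echo $(( min * 60 + sec ))
}

# Integer average (in seconds) of a list of `time`-style values.
average_seconds() {
    total=0
    n=0
    for v in "$@"; do
        total=$(( total + $(to_seconds "$v") ))
        n=$(( n + 1 ))
    done
    echo $(( total / n ))
}

# sys times copied from the five rounds without and with the patchset.
avg_without=$(average_seconds 42m57.470s 43m0.940s 43m5.991s 42m49.050s 41m37.005s)
avg_with=$(average_seconds 35m50.685s 37m24.341s 34m4.190s 35m54.201s 37m11.361s)

echo "avg sys w/o patches: ${avg_without}s"
echo "avg sys w/ patches:  ${avg_with}s"
echo "reduction: $(( (avg_without - avg_with) * 100 / avg_without ))%"
```

With integer arithmetic this gives averages of about 2561s and 2164s, i.e. a sys-time reduction of roughly 15% in this particular workload.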
Barry Song (2):
  zram: backend_zstd: Adjust estimated_src_size to accommodate
    multi-page compression
  mm: fall back to four small folios if mTHP allocation fails

Tangquan Zheng (2):
  mm: zsmalloc: support objects compressed based on multiple pages
  zram: support compression at the granularity of multi-pages

 drivers/block/zram/Kconfig        |   9 +
 drivers/block/zram/backend_zstd.c |   6 +-
 drivers/block/zram/zcomp.c        |  17 +-
 drivers/block/zram/zcomp.h        |  12 +-
 drivers/block/zram/zram_drv.c     | 450 ++++++++++++++++++++++++++++--
 drivers/block/zram/zram_drv.h     |  45 +++
 include/linux/zsmalloc.h          |  10 +-
 mm/Kconfig                        |  18 ++
 mm/memory.c                       | 203 +++++++++++++-
 mm/zsmalloc.c                     | 235 ++++++++++++----
 10 files changed, 896 insertions(+), 109 deletions(-)
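
As a footnote to the granularity description at the top of this letter, the block-count arithmetic can be checked with a few lines of shell. This is a sketch only, not code from the series: `blocks_for_folio` is a hypothetical helper, its `order` argument merely mirrors the `ZSMALLOC_MULTI_PAGES_ORDER` setting, 4KiB base pages are assumed, and the handling of folios smaller than one multi-page block is an assumption rather than something this letter specifies.

```shell
#!/bin/sh
# Number of compression blocks a folio is divided into.
#   $1 = folio size in KiB
#   $2 = multi-page order (hypothetical parameter, default 2)
#   $3 = base page size in KiB (assumed 4)
blocks_for_folio() {
    folio_kib=$1
    order=${2:-2}
    page_kib=${3:-4}
    block_kib=$(( page_kib << order ))   # order 2 with 4KiB pages -> 16KiB
    if [ "$folio_kib" -ge "$block_kib" ]; then
        echo $(( folio_kib / block_kib ))
    else
        # Assumption: anything smaller than one multi-page block is
        # handled page by page, as it would be without the patchset.
        echo $(( folio_kib / page_kib ))
    fi
}

echo "16KiB folio, order 2: $(blocks_for_folio 16) block(s)"
echo "64KiB folio, order 2: $(blocks_for_folio 64) block(s)"
echo "64KiB folio, order 0: $(blocks_for_folio 64 0) block(s)"
```

The first two calls reproduce the examples given earlier (one 16KiB block and four 16KiB blocks); the order-0 call corresponds to the pre-patchset behaviour of `nr_pages` 4KiB blocks.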