diff mbox series

[RFC,2/2] zswap: add sysfs knob for same node mode

Message ID 20250329110230.2459730-3-nphamcs@gmail.com (mailing list archive)
State New
Series zswap: fix placement inversion in memory tiering systems

Commit Message

Nhat Pham March 29, 2025, 11:02 a.m. UTC
Taking advantage of the new node-selection capability of zsmalloc, allow
zswap to keep the compressed copy in the same node as the original page.

The main use case is for CXL systems, where pages in the CXL tier should
stay in CXL when they are zswapped, so as not to create memory pressure
in the higher tier.

This new behavior is opt-in only, and can be enabled as follows:

echo Y > /sys/module/zswap/parameters/same_node_mode

Suggested-by: Gregory Price <gourry@gourry.net>
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
---
 Documentation/admin-guide/mm/zswap.rst |  9 +++++++++
 mm/zswap.c                             | 10 ++++++++--
 2 files changed, 17 insertions(+), 2 deletions(-)

Patch

diff --git a/Documentation/admin-guide/mm/zswap.rst b/Documentation/admin-guide/mm/zswap.rst
index fd3370aa43fe..be8953acc15e 100644
--- a/Documentation/admin-guide/mm/zswap.rst
+++ b/Documentation/admin-guide/mm/zswap.rst
@@ -142,6 +142,15 @@  User can enable it as follows::
 This can be enabled at the boot time if ``CONFIG_ZSWAP_SHRINKER_DEFAULT_ON`` is
 selected.
 
+In a NUMA system, sometimes we want the compressed copy to reside in the same
+node as the original page. For instance, if we use the NUMA nodes to represent
+a CXL-based memory tiering system, we do not want the pages demoted to the
+lower tier to accidentally return to the higher tier via zswap, creating
+memory pressure in the higher tier. The same-node behavior can be enabled
+as follows::
+
+	echo Y > /sys/module/zswap/parameters/same_node_mode
+
 A debugfs interface is provided for various statistic about pool size, number
 of pages stored, same-value filled pages and various counters for the reasons
 pages are rejected.
diff --git a/mm/zswap.c b/mm/zswap.c
index 89b6d4ade4cd..2eee57648750 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -129,6 +129,9 @@  static bool zswap_shrinker_enabled = IS_ENABLED(
 		CONFIG_ZSWAP_SHRINKER_DEFAULT_ON);
 module_param_named(shrinker_enabled, zswap_shrinker_enabled, bool, 0644);
 
+static bool zswap_same_node_mode;
+module_param_named(same_node_mode, zswap_same_node_mode, bool, 0644);
+
 bool zswap_is_enabled(void)
 {
 	return zswap_enabled;
@@ -942,7 +945,7 @@  static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 {
 	struct crypto_acomp_ctx *acomp_ctx;
 	struct scatterlist input, output;
-	int comp_ret = 0, alloc_ret = 0;
+	int comp_ret = 0, alloc_ret = 0, nid = page_to_nid(page);
 	unsigned int dlen = PAGE_SIZE;
 	unsigned long handle;
 	struct zpool *zpool;
@@ -981,7 +984,10 @@  static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 
 	zpool = pool->zpool;
 	gfp = GFP_NOWAIT | __GFP_NORETRY | __GFP_HIGHMEM | __GFP_MOVABLE;
-	alloc_ret = zpool_malloc(zpool, dlen, gfp, &handle, NULL);
+	if (zswap_same_node_mode)
+		alloc_ret = zpool_malloc(zpool, dlen, gfp, &handle, &nid);
+	else
+		alloc_ret = zpool_malloc(zpool, dlen, gfp, &handle, NULL);
 	if (alloc_ret)
 		goto unlock;