From patchwork Wed Aug 14 06:28:29 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
X-Patchwork-Id: 13762878
Return-Path: <owner-linux-mm@kvack.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by smtp.lore.kernel.org (Postfix) with ESMTP id DB4B8C52D7F
	for <linux-mm@archiver.kernel.org>; Wed, 14 Aug 2024 06:28:45 +0000 (UTC)
Received: by kanga.kvack.org (Postfix)
	id A24CC6B0089; Wed, 14 Aug 2024 02:28:42 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id 9D0AB6B008A; Wed, 14 Aug 2024 02:28:42 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id 7127B6B0092; Wed, 14 Aug 2024 02:28:42 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org
Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com
 [216.40.44.14])
	by kanga.kvack.org (Postfix) with ESMTP id 4D8326B0089
	for <linux-mm@kvack.org>; Wed, 14 Aug 2024 02:28:42 -0400 (EDT)
Received: from smtpin07.hostedemail.com (a10.router.float.18 [10.200.18.1])
	by unirelay04.hostedemail.com (Postfix) with ESMTP id F359A1A0C5F
	for <linux-mm@kvack.org>; Wed, 14 Aug 2024 06:28:41 +0000 (UTC)
X-FDA: 82449872442.07.689F542
Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.9])
	by imf23.hostedemail.com (Postfix) with ESMTP id B27B1140007
	for <linux-mm@kvack.org>; Wed, 14 Aug 2024 06:28:39 +0000 (UTC)
Authentication-Results: imf23.hostedemail.com;
	dkim=pass header.d=intel.com header.s=Intel header.b=MXLxDP9s;
	spf=pass (imf23.hostedemail.com: domain of kanchana.p.sridhar@intel.com
 designates 198.175.65.9 as permitted sender)
 smtp.mailfrom=kanchana.p.sridhar@intel.com;
	dmarc=pass (policy=none) header.from=intel.com
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed;
 d=hostedemail.com;
	s=arc-20220608; t=1723616907;
	h=from:from:sender:reply-to:subject:subject:date:date:
	 message-id:message-id:to:to:cc:cc:mime-version:mime-version:
	 content-type:content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references:dkim-signature;
	bh=Ebj5GFDs/74o0qRFKlK9Fp6ADE9sHKKOQfDE5ev/VP8=;
	b=t8+7KZxarqRv15ilIYNmIAgxQLFp01AYZgTWVqZT0ATGP2GIcOg+TqOyO2xfRUeCDKKb4E
	dJGKz8zF28dSHVyBwSU/xRBgWQkGoP/rOZhqOfiL5uJLaaIKnOCY6/W4JZluqlYWxn2vji
	GfedYG+mlsdUNcAOONntipO4KGxeqi8=
ARC-Authentication-Results: i=1;
	imf23.hostedemail.com;
	dkim=pass header.d=intel.com header.s=Intel header.b=MXLxDP9s;
	spf=pass (imf23.hostedemail.com: domain of kanchana.p.sridhar@intel.com
 designates 198.175.65.9 as permitted sender)
 smtp.mailfrom=kanchana.p.sridhar@intel.com;
	dmarc=pass (policy=none) header.from=intel.com
ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1723616907; a=rsa-sha256;
	cv=none;
	b=8m3tTNb5TTqmB7hLsAKQjiGWVJl3oLdaMdULlRPelils3pb91Nc9jzTHVsKAZ13F4ROCmq
	qKtpKSlyuZr56687qcakAGs0a7YN518phzxCoQSftI1keVc4V49BDudgcy/Dv5wQiB6aWQ
	pvzt365veKO4BBLCOaMFhg4Icom5iQ0=
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
  d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
  t=1723616920; x=1755152920;
  h=from:to:cc:subject:date:message-id:in-reply-to:
   references:mime-version:content-transfer-encoding;
  bh=b0exLTLKXempeXy+88HdDHRjQzo5T2XSUscqTXLuQAo=;
  b=MXLxDP9sXMu0B7G99rsBLFW7xinv8X4uuF//3ToX+SjXAjgCtBRnpLxd
   5VSSjTtwxB8AtzAcHCNeDRAT0rHUirVcTwMfN4PVdJiRerczRBDzbe1pP
   T+O+oaQOasCP1tsqx14x9/mz8UA07aaXRhmbV/PuyP7/2eme39Nx0HPAJ
   1XUzFMi5oNcrs2Pv+C31doRUtTrWVPwm44RaIjUGrE/acXYuynZVDAFr5
   cc1YH8aX8OTuajOYY4Obllmaw0u78LwhNmxORnhCq0hAN/R1MTxEBwvRA
   Ajk2Eo1pFN3rsP8KK6uRz+0qDglYcQGeXAuXYEzG1HxV1WL1+sMudFwor
   g==;
X-CSE-ConnectionGUID: dqPmvxYxSRmHBlnMg+Yq2A==
X-CSE-MsgGUID: 3d13CpWDSzyfZpaZ88cVLQ==
X-IronPort-AV: E=McAfee;i="6700,10204,11163"; a="44333019"
X-IronPort-AV: E=Sophos;i="6.09,288,1716274800";
   d="scan'208";a="44333019"
Received: from fmviesa004.fm.intel.com ([10.60.135.144])
  by orvoesa101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 13 Aug 2024 23:28:33 -0700
X-CSE-ConnectionGUID: v572F++PQd+9e4jrfr8uVg==
X-CSE-MsgGUID: nrg/9K0lRlityO7atqhYpQ==
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="6.09,288,1716274800";
   d="scan'208";a="63568759"
Received: from jf5300-b11a338t.jf.intel.com ([10.242.51.6])
  by fmviesa004.fm.intel.com with ESMTP; 13 Aug 2024 23:28:33 -0700
From: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
To: linux-kernel@vger.kernel.org,
	linux-mm@kvack.org,
	hannes@cmpxchg.org,
	yosryahmed@google.com,
	nphamcs@gmail.com,
	ryan.roberts@arm.com,
	ying.huang@intel.com,
	21cnbao@gmail.com,
	akpm@linux-foundation.org
Cc: nanhai.zou@intel.com,
	wajdi.k.feghali@intel.com,
	vinodh.gopal@intel.com,
	kanchana.p.sridhar@intel.com
Subject: [RFC PATCH v1 3/4] mm: zswap: zswap_store() extended to handle mTHP
 folios.
Date: Tue, 13 Aug 2024 23:28:29 -0700
Message-Id: <20240814062830.26833-4-kanchana.p.sridhar@intel.com>
X-Mailer: git-send-email 2.27.0
In-Reply-To: <20240814062830.26833-1-kanchana.p.sridhar@intel.com>
References: <20240814062830.26833-1-kanchana.p.sridhar@intel.com>
MIME-Version: 1.0
X-Rspam-User: 
X-Stat-Signature: r3kyz5ohucouz56zcykwqaeigsqckao1
X-Rspamd-Queue-Id: B27B1140007
X-Rspamd-Server: rspam11
X-HE-Tag: 1723616919-351119
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>
List-Subscribe: <mailto:majordomo@kvack.org>
List-Unsubscribe: <mailto:majordomo@kvack.org>

zswap_store() now stores mTHP and PMD-size THP folios instead of
rejecting all large folios.

This change reuses and adapts the functionality in Ryan Roberts' RFC
patch [1]:

  "[RFC,v1] mm: zswap: Store large folios without splitting"

  [1] https://lore.kernel.org/linux-mm/20231019110543.3284654-1-ryan.roberts@arm.com/T/#u

This patch provides a sequential implementation of storing an mTHP in
zswap_store() by iterating through each page in the folio to compress
and store it in the zswap zpool.

Towards this goal, zswap_compress() is modified to take a page instead
of a folio as input.

Each page's swap offset is stored as a separate zswap entry.
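
As a rough userspace illustration of the per-page entries (not kernel
code): the page at index i of a folio keeps the folio's swap type and
takes the folio's base offset plus i, mirroring the
swp_entry(type, offset) call in zswap_store_page(). The model_* names
and the 58-bit offset layout are assumptions made for this sketch; the
real encoding is defined by the kernel's swap headers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: low 58 bits hold the offset, the rest the type. */
#define MODEL_TYPE_SHIFT  58
#define MODEL_OFFSET_MASK ((UINT64_C(1) << MODEL_TYPE_SHIFT) - 1)

/* Hypothetical stand-in for the kernel's swp_entry_t. */
typedef struct {
	uint64_t val;
} model_swp_entry_t;

/* Pack a swap type and offset into one value. */
static model_swp_entry_t model_swp_entry(unsigned int type, uint64_t offset)
{
	model_swp_entry_t e;

	assert(offset <= MODEL_OFFSET_MASK);
	e.val = ((uint64_t)type << MODEL_TYPE_SHIFT) | offset;
	return e;
}

static unsigned int model_swp_type(model_swp_entry_t e)
{
	return (unsigned int)(e.val >> MODEL_TYPE_SHIFT);
}

static uint64_t model_swp_offset(model_swp_entry_t e)
{
	return e.val & MODEL_OFFSET_MASK;
}

/* Entry for page 'index' of a folio whose first page has entry 'base'. */
static model_swp_entry_t model_page_entry(model_swp_entry_t base,
					  uint64_t index)
{
	return model_swp_entry(model_swp_type(base),
			       model_swp_offset(base) + index);
}
```

In this model, a 16-page folio whose first page sits at offset 4096
yields 16 entries at offsets 4096 through 4111, all with the same type.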

If an error is encountered while storing any page in the mTHP, all
entries already stored for that folio are invalidated. Thus, an mTHP
is either stored in zswap in its entirety or not stored at all.
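
The all-or-nothing behaviour can be sketched in userspace as follows.
This is a simplified model, not the kernel implementation: struct
model_tree is a hypothetical stand-in for the zswap xarray, and the
fail_at parameter is a test hook invented here to simulate a failed
page store.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_SLOTS 64

/* Hypothetical stand-in for the zswap xarray: one flag per swap offset. */
struct model_tree {
	bool present[MODEL_SLOTS];
};

/* Erase every offset in [offset, offset + nr_pages), stored or not. */
static void model_delete_offsets(struct model_tree *tree, size_t offset,
				 size_t nr_pages)
{
	for (size_t i = 0; i < nr_pages; i++)
		tree->present[offset + i] = false;
}

/*
 * Store nr_pages consecutive offsets, one entry per page. fail_at is a
 * test hook: the store of that page index fails (pass a value >= nr_pages
 * for no failure). On failure, all offsets of the folio are erased.
 */
static bool model_store_folio(struct model_tree *tree, size_t offset,
			      size_t nr_pages, size_t fail_at)
{
	assert(offset + nr_pages <= MODEL_SLOTS);

	for (size_t i = 0; i < nr_pages; i++) {
		if (i == fail_at) {
			model_delete_offsets(tree, offset, nr_pages);
			return false;
		}
		tree->present[offset + i] = true;
	}
	return true;
}

/* Count stored offsets in a range; used to check the invariant. */
static size_t model_count(const struct model_tree *tree, size_t offset,
			  size_t nr_pages)
{
	size_t n = 0;

	for (size_t i = 0; i < nr_pages; i++)
		n += tree->present[offset + i] ? 1 : 0;
	return n;
}
```

After a successful call every offset of the folio is present; after a
failed call none is, including entries left over from an earlier store
of the same offsets.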

This forms the basis for batching pages during the zswap store of
large folios: batches of up to, say, 8 pages of an mTHP could be
compressed in parallel in hardware with the Intel In-Memory Analytics
Accelerator (Intel IAA).
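
To illustrate how a folio might be carved into such batches, here is a
minimal userspace sketch. It is not part of this patch: the batch size
of 8 is only the example figure from the text above, and the model_*
helpers are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: example batch size taken from "up to, say, 8 pages". */
#define MODEL_BATCH_SIZE 8

/* Number of pages in the batch that begins at page index 'start'. */
static size_t model_next_batch(size_t nr_pages, size_t start)
{
	size_t left;

	assert(start <= nr_pages);
	left = nr_pages - start;
	return left < MODEL_BATCH_SIZE ? left : MODEL_BATCH_SIZE;
}

/* Number of batches needed to cover all pages of a folio. */
static size_t model_count_batches(size_t nr_pages)
{
	size_t batches = 0;
	size_t start = 0;

	while (start < nr_pages) {
		/* start < nr_pages here, so the batch holds at least 1 page. */
		start += model_next_batch(nr_pages, start);
		batches++;
	}
	return batches;
}
```

Under these assumptions, a 16-page mTHP maps to two full batches, while
a 20-page folio would need two full batches and a final batch of 4.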

Co-developed-by: Ryan Roberts
Signed-off-by:
Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
---
 mm/zswap.c | 219 ++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 157 insertions(+), 62 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index a6b0a7c636db..98ff98b485f5 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -899,7 +899,7 @@ static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node)
 	return 0;
 }
 
-static bool zswap_compress(struct folio *folio, struct zswap_entry *entry)
+static bool zswap_compress(struct page *page, struct zswap_entry *entry)
 {
 	struct crypto_acomp_ctx *acomp_ctx;
 	struct scatterlist input, output;
@@ -917,7 +917,7 @@ static bool zswap_compress(struct folio *folio, struct zswap_entry *entry)
 
 	dst = acomp_ctx->buffer;
 	sg_init_table(&input, 1);
-	sg_set_page(&input, &folio->page, PAGE_SIZE, 0);
+	sg_set_page(&input, page, PAGE_SIZE, 0);
 
 	/*
 	 * We need PAGE_SIZE * 2 here since there maybe over-compression case,
@@ -1409,36 +1409,82 @@ static void zswap_fill_page(void *ptr, unsigned long value)
 /*********************************
 * main API
 **********************************/
-bool zswap_store(struct folio *folio)
+
+/*
+ * Returns true if the entry was successfully
+ * stored in the xarray, and false otherwise.
+ */
+static bool zswap_store_entry(struct xarray *tree,
+			      struct zswap_entry *entry)
 {
-	swp_entry_t swp = folio->swap;
-	pgoff_t offset = swp_offset(swp);
-	struct xarray *tree = swap_zswap_tree(swp);
-	struct zswap_entry *entry, *old;
-	struct obj_cgroup *objcg = NULL;
-	struct mem_cgroup *memcg = NULL;
-	unsigned long value;
+	struct zswap_entry *old;
+	pgoff_t offset = swp_offset(entry->swpentry);
 
-	VM_WARN_ON_ONCE(!folio_test_locked(folio));
-	VM_WARN_ON_ONCE(!folio_test_swapcache(folio));
+	old = xa_store(tree, offset, entry, GFP_KERNEL);
 
-	/* Large folios aren't supported */
-	if (folio_test_large(folio))
+	if (xa_is_err(old)) {
+		int err = xa_err(old);
+
+		WARN_ONCE(err != -ENOMEM, "unexpected xarray error: %d\n", err);
+		zswap_reject_alloc_fail++;
 		return false;
+	}
 
-	if (!zswap_enabled)
-		goto check_old;
+	/*
+	 * We may have had an existing entry that became stale when
+	 * the folio was redirtied and now the new version is being
+	 * swapped out. Get rid of the old.
+	 */
+	if (old)
+		zswap_entry_free(old);
 
-	/* Check cgroup limits */
-	objcg = get_obj_cgroup_from_folio(folio);
-	if (objcg && !obj_cgroup_may_zswap(objcg)) {
-		memcg = get_mem_cgroup_from_objcg(objcg);
-		if (shrink_memcg(memcg)) {
-			mem_cgroup_put(memcg);
-			goto reject;
-		}
-		mem_cgroup_put(memcg);
+	return true;
+}
+
+/*
+ * If the zswap store fails or zswap is disabled, we must invalidate the
+ * possibly stale entry which was previously stored at this offset.
+ * Otherwise, writeback could overwrite the new data in the swapfile.
+ *
+ * This is called after the store of the i-th page in a
+ * large folio has failed. Entries [0 .. i-1] must be
+ * deleted; this erases all nr_pages offsets of the folio.
+ *
+ * This is also called if zswap_store() is called,
+ * but zswap is not enabled. All offsets for the folio
+ * are deleted from zswap in this case.
+ */
+static void zswap_delete_stored_offsets(struct xarray *tree,
+					pgoff_t offset,
+					long nr_pages)
+{
+	struct zswap_entry *entry;
+	long i;
+
+	for (i = 0; i < nr_pages; ++i) {
+		entry = xa_erase(tree, offset + i);
+		if (entry)
+			zswap_entry_free(entry);
 	}
+}
+
+/*
+ * Stores the page at specified "index" in a folio.
+ */
+static bool zswap_store_page(struct folio *folio, long index,
+			     struct obj_cgroup *objcg,
+			     struct zswap_pool *pool)
+{
+	swp_entry_t swp = folio->swap;
+	int type = swp_type(swp);
+	pgoff_t offset = swp_offset(swp) + index;
+	struct page *page = folio_page(folio, index);
+	struct xarray *tree = swap_zswap_tree(swp);
+	struct zswap_entry *entry;
+	unsigned long value;
+
+	if (objcg)
+		obj_cgroup_get(objcg);
 
 	if (zswap_check_limits())
 		goto reject;
@@ -1450,7 +1496,7 @@ bool zswap_store(struct folio *folio)
 		goto reject;
 	}
 
-	if (zswap_is_folio_same_filled(folio, 0, &value)) {
+	if (zswap_is_folio_same_filled(folio, index, &value)) {
 		entry->length = 0;
 		entry->value = value;
 		atomic_inc(&zswap_same_filled_pages);
@@ -1458,42 +1504,20 @@ bool zswap_store(struct folio *folio)
 	}
 
 	/* if entry is successfully added, it keeps the reference */
-	entry->pool = zswap_pool_current_get();
-	if (!entry->pool)
+	if (!zswap_pool_get(pool))
 		goto freepage;
 
-	if (objcg) {
-		memcg = get_mem_cgroup_from_objcg(objcg);
-		if (memcg_list_lru_alloc(memcg, &zswap_list_lru, GFP_KERNEL)) {
-			mem_cgroup_put(memcg);
-			goto put_pool;
-		}
-		mem_cgroup_put(memcg);
-	}
+	entry->pool = pool;
 
-	if (!zswap_compress(folio, entry))
+	if (!zswap_compress(page, entry))
 		goto put_pool;
 
 store_entry:
-	entry->swpentry = swp;
+	entry->swpentry = swp_entry(type, offset);
 	entry->objcg = objcg;
 
-	old = xa_store(tree, offset, entry, GFP_KERNEL);
-	if (xa_is_err(old)) {
-		int err = xa_err(old);
-
-		WARN_ONCE(err != -ENOMEM, "unexpected xarray error: %d\n", err);
-		zswap_reject_alloc_fail++;
+	if (!zswap_store_entry(tree, entry))
 		goto store_failed;
-	}
-
-	/*
-	 * We may have had an existing entry that became stale when
-	 * the folio was redirtied and now the new version is being
-	 * swapped out. Get rid of the old.
-	 */
-	if (old)
-		zswap_entry_free(old);
 
 	if (objcg) {
 		obj_cgroup_charge_zswap(objcg, entry->length);
@@ -1527,7 +1551,7 @@ bool zswap_store(struct folio *folio)
 	else {
 		zpool_free(zswap_find_zpool(entry), entry->handle);
 put_pool:
-		zswap_pool_put(entry->pool);
+		zswap_pool_put(pool);
 	}
 freepage:
 	zswap_entry_cache_free(entry);
@@ -1535,16 +1559,87 @@ bool zswap_store(struct folio *folio)
 	obj_cgroup_put(objcg);
 	if (zswap_pool_reached_full)
 		queue_work(shrink_wq, &zswap_shrink_work);
-check_old:
+
+	return false;
+}
+
+/*
+ * Stores a folio in zswap, including mTHP and PMD-size THP
+ * folios. Each page in the folio is compressed and stored
+ * sequentially, as a separate zswap entry.
+ */
+bool zswap_store(struct folio *folio)
+{
+	long nr_pages = folio_nr_pages(folio);
+	swp_entry_t swp = folio->swap;
+	pgoff_t offset = swp_offset(swp);
+	struct xarray *tree = swap_zswap_tree(swp);
+	struct obj_cgroup *objcg = NULL;
+	struct mem_cgroup *memcg = NULL;
+	struct zswap_pool *pool;
+	bool ret = false;
+	long index;
+
+	VM_WARN_ON_ONCE(!folio_test_locked(folio));
+	VM_WARN_ON_ONCE(!folio_test_swapcache(folio));
+
 	/*
-	 * If the zswap store fails or zswap is disabled, we must invalidate the
-	 * possibly stale entry which was previously stored at this offset.
-	 * Otherwise, writeback could overwrite the new data in the swapfile.
+	 * If zswap is disabled, we must invalidate the possibly stale entries
+	 * which were previously stored at the folio's offsets. Otherwise,
+	 * writeback could overwrite the new data in the swapfile.
 	 */
-	entry = xa_erase(tree, offset);
-	if (entry)
-		zswap_entry_free(entry);
-	return false;
+	if (!zswap_enabled)
+		goto reject;
+
+	/* Check cgroup limits */
+	objcg = get_obj_cgroup_from_folio(folio);
+	if (objcg && !obj_cgroup_may_zswap(objcg)) {
+		memcg = get_mem_cgroup_from_objcg(objcg);
+		if (shrink_memcg(memcg)) {
+			mem_cgroup_put(memcg);
+			goto put_objcg;
+		}
+		mem_cgroup_put(memcg);
+	}
+
+	if (zswap_check_limits())
+		goto put_objcg;
+
+	pool = zswap_pool_current_get();
+	if (!pool)
+		goto put_objcg;
+
+	if (objcg) {
+		memcg = get_mem_cgroup_from_objcg(objcg);
+		if (memcg_list_lru_alloc(memcg, &zswap_list_lru, GFP_KERNEL)) {
+			mem_cgroup_put(memcg);
+			goto put_pool;
+		}
+		mem_cgroup_put(memcg);
+	}
+
+	/*
+	 * Store each page of the folio as a separate entry. If we fail to store
+	 * a page, unwind by removing all the previous pages we stored.
+	 */
+	for (index = 0; index < nr_pages; ++index) {
+		if (!zswap_store_page(folio, index, objcg, pool))
+			goto put_pool;
+	}
+
+	ret = true;
+
+put_pool:
+	zswap_pool_put(pool);
+put_objcg:
+	obj_cgroup_put(objcg);
+	if (zswap_pool_reached_full)
+		queue_work(shrink_wq, &zswap_shrink_work);
+reject:
+	if (!ret)
+		zswap_delete_stored_offsets(tree, offset, nr_pages);
+
+	return ret;
 }
 
 bool zswap_load(struct folio *folio)