From patchwork Fri Oct 18 06:48:03 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Sridhar, Kanchana P" <kanchana.p.sridhar@intel.com>
X-Patchwork-Id: 13841264
From: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org,
	yosryahmed@google.com, nphamcs@gmail.com, chengming.zhou@linux.dev,
	usamaarif642@gmail.com, ryan.roberts@arm.com, ying.huang@intel.com,
	21cnbao@gmail.com, akpm@linux-foundation.org, hughd@google.com,
	willy@infradead.org, bfoster@redhat.com, dchinner@redhat.com,
	chrisl@kernel.org, david@redhat.com
Cc: wajdi.k.feghali@intel.com, vinodh.gopal@intel.com,
	kanchana.p.sridhar@intel.com
Subject: [RFC PATCH v1 5/7] mm: swap, zswap: zswap folio_batch processing
 with IAA decompression batching.
Date: Thu, 17 Oct 2024 23:48:03 -0700
Message-Id: <20241018064805.336490-6-kanchana.p.sridhar@intel.com>
X-Mailer: git-send-email 2.27.0
In-Reply-To: <20241018064805.336490-1-kanchana.p.sridhar@intel.com>
References: <20241018064805.336490-1-kanchana.p.sridhar@intel.com>
MIME-Version: 1.0

This patch provides the functionality to process a "zswap_batch" in
which swap_read_folio() has previously stored swap entries found in
zswap, for batched loading. The newly added zswap_finish_load_batch()
API implements the main zswap load batching functionality.

It makes use of the sub-batches of zswap_entry/xarray/page/source-length
arrays readily available from zswap_add_load_batch(). These sub-batch
arrays are processed one at a time until the entire zswap folio_batch
has been loaded. The existing zswap_load() behavior of deleting
zswap_entries for folios found in the swapcache is preserved.

Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
---
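Note: the sketch below illustrates how a swapin path is expected to
drive this batched-load API; it is not part of the patch. The function
example_batched_swapin(), its read_folios[]/nr inputs, and
zswap_load_batch_init() (assumed to be introduced earlier in this
series; the name may differ) are illustrative assumptions.
swap_read_folio() and swap_read_zswap_batch_unplug() are the
interfaces added by this series:

	/* Illustrative sketch only; see assumptions above. */
	static void example_batched_swapin(struct folio **read_folios,
					   unsigned int nr)
	{
		struct zswap_decomp_batch zswap_batch;
		struct folio_batch non_zswap_batch;
		struct swap_iocb *splug = NULL;
		unsigned int i;

		zswap_load_batch_init(&zswap_batch);	/* assumed helper */
		folio_batch_init(&non_zswap_batch);

		/*
		 * Folios whose swap entries are found in zswap accumulate
		 * in zswap_batch rather than being decompressed one at a
		 * time; other folios gather in non_zswap_batch (its
		 * handling is elided here).
		 */
		for (i = 0; i < nr; i++)
			swap_read_folio(read_folios[i], &splug,
					&zswap_batch, &non_zswap_batch);

		/* Decompress the entire zswap batch in one call. */
		swap_read_zswap_batch_unplug(&zswap_batch, &splug);
		swap_read_unplug(splug);
	}
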
 include/linux/zswap.h |  22 ++++++
 mm/page_io.c          |  39 ++++++++++
 mm/swap.h             |  17 +++++
 mm/zswap.c            | 176 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 254 insertions(+)

diff --git a/include/linux/zswap.h b/include/linux/zswap.h
index 1d6de281f243..a0792c2b300a 100644
--- a/include/linux/zswap.h
+++ b/include/linux/zswap.h
@@ -110,6 +110,15 @@ struct zswap_store_pipeline_state {
 	u8 nr_comp_pages;
 };
 
+/* Note: If SWAP_CRYPTO_SUB_BATCH_SIZE exceeds 256, change the u8 to u16. */
+struct zswap_load_sub_batch_state {
+	struct xarray **trees;
+	struct zswap_entry **entries;
+	struct page **pages;
+	unsigned int *slens;
+	u8 nr_decomp;
+};
+
 bool zswap_store_batching_enabled(void);
 void __zswap_store_batch(struct swap_in_memory_cache_cb *simc);
 void __zswap_store_batch_single(struct swap_in_memory_cache_cb *simc);
@@ -136,6 +145,14 @@ static inline bool zswap_add_load_batch(
 	return false;
 }
 
+void __zswap_finish_load_batch(struct zswap_decomp_batch *zd_batch);
+static inline void zswap_finish_load_batch(
+	struct zswap_decomp_batch *zd_batch)
+{
+	if (zswap_load_batching_enabled())
+		__zswap_finish_load_batch(zd_batch);
+}
+
 unsigned long zswap_total_pages(void);
 bool zswap_store(struct folio *folio);
 bool zswap_load(struct folio *folio);
@@ -188,6 +205,11 @@ static inline bool zswap_add_load_batch(
 	return false;
 }
 
+static inline void zswap_finish_load_batch(
+	struct zswap_decomp_batch *zd_batch)
+{
+}
+
 static inline bool zswap_store(struct folio *folio)
 {
 	return false;
diff --git a/mm/page_io.c b/mm/page_io.c
index 9750302d193b..aa83221318ef 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -816,6 +816,45 @@ bool swap_read_folio(struct folio *folio, struct swap_iocb **plug,
 	return true;
 }
 
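+/*
+ * All folios in the zswap batch were locked for swapin; unlock them
+ * now that the batched load has marked them uptodate.
+ */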
+static void __swap_post_process_zswap_load_batch(
+	struct zswap_decomp_batch *zswap_batch)
+{
+	u8 i;
+
+	for (i = 0; i < folio_batch_count(&zswap_batch->fbatch); ++i) {
+		struct folio *folio = zswap_batch->fbatch.folios[i];
+		folio_unlock(folio);
+	}
+}
+
+/*
+ * The swapin_readahead batching interface makes sure that the
+ * input zswap_batch consists of folios belonging to the same swap
+ * device type.
+ */
+void __swap_read_zswap_batch_unplug(struct zswap_decomp_batch *zswap_batch,
+				    struct swap_iocb **splug)
+{
+	unsigned long pflags;
+
+	if (!folio_batch_count(&zswap_batch->fbatch))
+		return;
+
+	psi_memstall_enter(&pflags);
+	delayacct_swapin_start();
+
+	/* Load the zswap batch. */
+	zswap_finish_load_batch(zswap_batch);
+	__swap_post_process_zswap_load_batch(zswap_batch);
+
+	psi_memstall_leave(&pflags);
+	delayacct_swapin_end();
+}
+
 void __swap_read_unplug(struct swap_iocb *sio)
 {
 	struct iov_iter from;
diff --git a/mm/swap.h b/mm/swap.h
index 310f99007fe6..2b82c8ed765c 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -125,6 +125,16 @@ struct swap_iocb;
 bool swap_read_folio(struct folio *folio, struct swap_iocb **plug,
 		     struct zswap_decomp_batch *zswap_batch,
 		     struct folio_batch *non_zswap_batch);
+void __swap_read_zswap_batch_unplug(
+	struct zswap_decomp_batch *zswap_batch,
+	struct swap_iocb **splug);
+static inline void swap_read_zswap_batch_unplug(
+	struct zswap_decomp_batch *zswap_batch,
+	struct swap_iocb **splug)
+{
+	if (likely(zswap_batch))
+		__swap_read_zswap_batch_unplug(zswap_batch, splug);
+}
 void __swap_read_unplug(struct swap_iocb *plug);
 static inline void swap_read_unplug(struct swap_iocb *plug)
 {
@@ -268,6 +278,13 @@ static inline bool swap_read_folio(struct folio *folio, struct swap_iocb **plug,
 {
 	return false;
 }
+
+static inline void swap_read_zswap_batch_unplug(
+	struct zswap_decomp_batch *zswap_batch,
+	struct swap_iocb **splug)
+{
+}
+
 static inline void swap_write_unplug(struct swap_iocb *sio)
 {
 }
diff --git a/mm/zswap.c b/mm/zswap.c
index 1d293f95d525..39bf7d8810e9 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "swap.h"
 #include "internal.h"
@@ -2401,6 +2402,181 @@ bool __zswap_add_load_batch(struct zswap_decomp_batch *zd_batch,
 	return true;
 }
 
+static __always_inline void zswap_load_sub_batch_init(
+	struct zswap_decomp_batch *zd_batch,
+	unsigned int sb,
+	struct zswap_load_sub_batch_state *zls)
+{
+	zls->trees = zd_batch->trees[sb];
+	zls->entries = zd_batch->entries[sb];
+	zls->pages = zd_batch->pages[sb];
+	zls->slens = zd_batch->slens[sb];
+	zls->nr_decomp = zd_batch->nr_decomp[sb];
+}
+
+static void zswap_load_map_sources(
+	struct zswap_load_sub_batch_state *zls,
+	u8 *srcs[])
+{
+	u8 i;
+
+	for (i = 0; i < zls->nr_decomp; ++i) {
+		struct zswap_entry *entry = zls->entries[i];
+		struct zpool *zpool = entry->pool->zpool;
+		u8 *buf = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
+		memcpy(srcs[i], buf, entry->length);
+		zpool_unmap_handle(zpool, entry->handle);
+	}
+}
+
+static void zswap_decompress_batch(
+	struct zswap_load_sub_batch_state *zls,
+	u8 *srcs[],
+	int decomp_errors[])
+{
+	struct crypto_acomp_ctx *acomp_ctx;
+
+	acomp_ctx = raw_cpu_ptr(zls->entries[0]->pool->acomp_ctx);
+
+	swap_crypto_acomp_decompress_batch(
+		srcs,
+		zls->pages,
+		zls->slens,
+		decomp_errors,
+		zls->nr_decomp,
+		acomp_ctx);
+}
+
+static void zswap_load_batch_updates(
+	struct zswap_decomp_batch *zd_batch,
+	unsigned int sb,
+	struct zswap_load_sub_batch_state *zls,
+	int decomp_errors[])
+{
+	unsigned int j;
+	u8 i;
+
+	for (i = 0; i < zls->nr_decomp; ++i) {
+		j = (sb * SWAP_CRYPTO_SUB_BATCH_SIZE) + i;
+		struct folio *folio = zd_batch->fbatch.folios[j];
+		struct zswap_entry *entry = zls->entries[i];
+
+		BUG_ON(decomp_errors[i]);
+		count_vm_event(ZSWPIN);
+		if (entry->objcg)
+			count_objcg_events(entry->objcg, ZSWPIN, 1);
+
+		if (zd_batch->swapcache[j]) {
+			zswap_entry_free(entry);
+			folio_mark_dirty(folio);
+		}
+
+		folio_mark_uptodate(folio);
+	}
+}
+
+static void zswap_load_decomp_batch(
+	struct zswap_decomp_batch *zd_batch,
+	unsigned int sb,
+	struct zswap_load_sub_batch_state *zls)
+{
+	int decomp_errors[SWAP_CRYPTO_SUB_BATCH_SIZE];
+	struct crypto_acomp_ctx *acomp_ctx;
+
+	acomp_ctx = raw_cpu_ptr(zls->entries[0]->pool->acomp_ctx);
+	mutex_lock(&acomp_ctx->mutex);
+
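+	/*
+	 * acomp_ctx->mutex serializes use of the per-CPU acomp_ctx->buffer
+	 * array: compressed sources are first copied out of the zpool into
+	 * these buffers, then the whole sub-batch is decompressed in one
+	 * call.
+	 */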
+	zswap_load_map_sources(zls, acomp_ctx->buffer);
+
+	zswap_decompress_batch(zls, acomp_ctx->buffer, decomp_errors);
+
+	mutex_unlock(&acomp_ctx->mutex);
+
+	zswap_load_batch_updates(zd_batch, sb, zls, decomp_errors);
+}
+
+static void zswap_load_start_accounting(
+	struct zswap_decomp_batch *zd_batch,
+	unsigned int sb,
+	struct zswap_load_sub_batch_state *zls,
+	bool workingset[],
+	bool in_thrashing[])
+{
+	unsigned int j;
+	u8 i;
+
+	for (i = 0; i < zls->nr_decomp; ++i) {
+		j = (sb * SWAP_CRYPTO_SUB_BATCH_SIZE) + i;
+		struct folio *folio = zd_batch->fbatch.folios[j];
+		workingset[i] = folio_test_workingset(folio);
+		if (workingset[i])
+			delayacct_thrashing_start(&in_thrashing[i]);
+	}
+}
+
+static void zswap_load_end_accounting(
+	struct zswap_decomp_batch *zd_batch,
+	struct zswap_load_sub_batch_state *zls,
+	bool workingset[],
+	bool in_thrashing[])
+{
+	u8 i;
+
+	for (i = 0; i < zls->nr_decomp; ++i)
+		if (workingset[i])
+			delayacct_thrashing_end(&in_thrashing[i]);
+}
+
+/*
+ * All entries in a zd_batch belong to the same swap device.
+ */
+void __zswap_finish_load_batch(struct zswap_decomp_batch *zd_batch)
+{
+	struct zswap_load_sub_batch_state zls;
+	unsigned int nr_folios = folio_batch_count(&zd_batch->fbatch);
+	unsigned int nr_sb = DIV_ROUND_UP(nr_folios, SWAP_CRYPTO_SUB_BATCH_SIZE);
+	unsigned int sb;
+
+	/*
+	 * Process the zd_batch in sub-batches of
+	 * SWAP_CRYPTO_SUB_BATCH_SIZE.
+	 */
+	for (sb = 0; sb < nr_sb; ++sb) {
+		bool workingset[SWAP_CRYPTO_SUB_BATCH_SIZE];
+		bool in_thrashing[SWAP_CRYPTO_SUB_BATCH_SIZE];
+
+		zswap_load_sub_batch_init(zd_batch, sb, &zls);
+
+		zswap_load_start_accounting(zd_batch, sb, &zls,
+					    workingset, in_thrashing);
+
+		/* Decompress the batch. */
+		if (zls.nr_decomp)
+			zswap_load_decomp_batch(zd_batch, sb, &zls);
+
+		/*
+		 * Should we free zswap_entries here, as in zswap_load()?
+		 * With the new swapin_readahead batching interface,
+		 * all prefetch entries are read into the swapcache.
+		 * Freeing the zswap entries here causes segfaults,
+		 * most probably because a page fault occurred while
+		 * the buffer was being decompressed.
+		 * Allowing the regular folio_free_swap() sequence
+		 * in do_swap_page() appears to keep things stable
+		 * without duplicated zswap-swapcache memory, as far
+		 * as I can tell from my testing.
+		 */
+
+		zswap_load_end_accounting(zd_batch, &zls,
+					  workingset, in_thrashing);
+	}
+}
+
 void zswap_invalidate(swp_entry_t swp)
 {
 	pgoff_t offset = swp_offset(swp);