From patchwork Tue Oct 17 00:35:18 2023
X-Patchwork-Submitter: Nhat Pham
X-Patchwork-Id: 13424277
From: Nhat Pham
To: akpm@linux-foundation.org
Cc: hannes@cmpxchg.org, cerasuolodomenico@gmail.com, yosryahmed@google.com,
    sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com,
    hughd@google.com, corbet@lwn.net, konrad.wilk@oracle.com,
    senozhatsky@chromium.org, rppt@kernel.org, linux-mm@kvack.org,
    kernel-team@meta.com, linux-kernel@vger.kernel.org, david@ixit.cz
Subject: [PATCH 1/2] swap: allows swap bypassing on zswap store failure
Date: Mon, 16 Oct 2023 17:35:18 -0700
Message-Id: <20231017003519.1426574-2-nphamcs@gmail.com>
In-Reply-To: <20231017003519.1426574-1-nphamcs@gmail.com>
References: <20231017003519.1426574-1-nphamcs@gmail.com>

During our experiments with zswap, we sometimes observe swap IO even though
the zswap pool limit is never hit. This is due to occasional zswap store
failures, in which case the page is written straight to the swap device. This
prevents many users who cannot tolerate swapping from adopting zswap to save
memory where possible.

This patch adds the option to bypass swap when a zswap store fails. The
feature is disabled by default (to preserve the existing behavior) and can be
enabled via a new zswap module parameter. When enabled, swapping is all but
prevented (except when the zswap pool is full and has to write pages back to
swap).

Suggested-by: Johannes Weiner
Signed-off-by: Nhat Pham
---
 Documentation/admin-guide/mm/zswap.rst | 9 +++++++++
 include/linux/zswap.h                  | 9 +++++++++
 mm/page_io.c                           | 6 ++++++
 mm/shmem.c                             | 8 ++++++--
 mm/zswap.c                             | 4 ++++
 5 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/mm/zswap.rst b/Documentation/admin-guide/mm/zswap.rst
index ae8597a67804..82fa8148a65a 100644
--- a/Documentation/admin-guide/mm/zswap.rst
+++ b/Documentation/admin-guide/mm/zswap.rst
@@ -153,6 +153,15 @@ attribute, e. g.::
 
 Setting this parameter to 100 will disable the hysteresis.
 
+Many users cannot tolerate the swapping that comes with zswap store failures,
+due to the IO incurred if these pages are needed later on. In this scenario,
+users can bypass swapping when zswap store attempts fail (and keep the pages
+in memory) as follows:
+
+  echo Y > /sys/module/zswap/parameters/bypass_swap_when_store_fail_enabled
+
+Note that swapping due to writeback is not disabled with this option.
+
 When there is a sizable amount of cold memory residing in the zswap pool, it
 can be advantageous to proactively write these cold pages to swap and reclaim
 the memory for other use cases. By default, the zswap shrinker is disabled.
diff --git a/include/linux/zswap.h b/include/linux/zswap.h
index 04f80b64a09b..c67da5223894 100644
--- a/include/linux/zswap.h
+++ b/include/linux/zswap.h
@@ -7,6 +7,7 @@
 
 extern u64 zswap_pool_total_size;
 extern atomic_t zswap_stored_pages;
+extern bool zswap_bypass_swap_when_store_fail_enabled;
 
 #ifdef CONFIG_ZSWAP
 
@@ -18,6 +19,10 @@ void zswap_swapoff(int type);
 bool zswap_remove_swpentry_from_lru(swp_entry_t swpentry);
 void zswap_insert_swpentry_into_lru(swp_entry_t swpentry);
 
+static inline bool zswap_bypass_swap_when_store_fail(void)
+{
+	return zswap_bypass_swap_when_store_fail_enabled;
+}
 #else
 
 static inline bool zswap_store(struct folio *folio)
@@ -41,6 +46,10 @@ static inline bool zswap_remove_swpentry_from_lru(swp_entry_t swpentry)
 static inline void zswap_insert_swpentry_into_lru(swp_entry_t swpentry)
 {}
 
+static inline bool zswap_bypass_swap_when_store_fail(void)
+{
+	return false;
+}
 #endif
 
 #endif /* _LINUX_ZSWAP_H */
diff --git a/mm/page_io.c b/mm/page_io.c
index cb559ae324c6..482f56d27bcd 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -201,6 +201,12 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
 		folio_end_writeback(folio);
 		return 0;
 	}
+
+	if (zswap_bypass_swap_when_store_fail()) {
+		folio_mark_dirty(folio);
+		return AOP_WRITEPAGE_ACTIVATE;
+	}
+
 	__swap_writepage(&folio->page, wbc);
 	return 0;
 }
diff --git a/mm/shmem.c b/mm/shmem.c
index 6503910b0f54..8614d7fbe18c 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1514,8 +1514,12 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 		mutex_unlock(&shmem_swaplist_mutex);
 		BUG_ON(folio_mapped(folio));
-		swap_writepage(&folio->page, wbc);
-		return 0;
+		/*
+		 * Seeing AOP_WRITEPAGE_ACTIVATE here indicates swapping is disabled on
+		 * zswap store failure. Note that in that case the folio is already
+		 * re-marked dirty by swap_writepage()
+		 */
+		return swap_writepage(&folio->page, wbc);
 	}
 
 	mutex_unlock(&shmem_swaplist_mutex);
diff --git a/mm/zswap.c b/mm/zswap.c
index d545516fb5de..db2674548670 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -138,6 +138,10 @@ static bool zswap_non_same_filled_pages_enabled = true;
 module_param_named(non_same_filled_pages_enabled,
 		   zswap_non_same_filled_pages_enabled, bool, 0644);
 
+bool zswap_bypass_swap_when_store_fail_enabled;
+module_param_named(bypass_swap_when_store_fail_enabled,
+		   zswap_bypass_swap_when_store_fail_enabled, bool, 0644);
+
 static bool zswap_exclusive_loads_enabled = IS_ENABLED(
 	CONFIG_ZSWAP_EXCLUSIVE_LOADS_DEFAULT_ON);
 module_param_named(exclusive_loads, zswap_exclusive_loads_enabled, bool, 0644);
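
The reason returning AOP_WRITEPAGE_ACTIVATE keeps the page resident is the
reclaim-side contract for ->writepage(): pageout() treats that return value
as "skip the writeout and re-activate this folio". Below is a rough sketch of
that caller-side handling, paraphrased from pageout() in mm/vmscan.c. It is
illustrative only and not part of this patch; pageout_sketch is a made-up
name, and the real function differs in detail between kernel versions.

	/*
	 * Illustrative paraphrase of the reclaim-side handling of
	 * AOP_WRITEPAGE_ACTIVATE (see pageout() in mm/vmscan.c).
	 * For anonymous folios, ->writepage() here is swap_writepage().
	 */
	static pageout_t pageout_sketch(struct folio *folio,
					struct address_space *mapping,
					struct writeback_control *wbc)
	{
		int res;

		folio_set_reclaim(folio);
		res = mapping->a_ops->writepage(&folio->page, wbc);
		if (res == AOP_WRITEPAGE_ACTIVATE) {
			/* e.g. zswap store failed with bypass_swap_when_store_fail on */
			folio_clear_reclaim(folio);
			return PAGE_ACTIVATE;	/* reclaim keeps the folio in memory */
		}
		/* otherwise the folio has been submitted to the swap device */
		return PAGE_SUCCESS;
	}
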
From patchwork Tue Oct 17 00:35:19 2023
X-Patchwork-Submitter: Nhat Pham
X-Patchwork-Id: 13424278
From: Nhat Pham
To: akpm@linux-foundation.org
Cc: hannes@cmpxchg.org, cerasuolodomenico@gmail.com, yosryahmed@google.com,
    sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com,
    hughd@google.com, corbet@lwn.net, konrad.wilk@oracle.com,
    senozhatsky@chromium.org, rppt@kernel.org, linux-mm@kvack.org,
    kernel-team@meta.com, linux-kernel@vger.kernel.org, david@ixit.cz
Subject: [PATCH 2/2] zswap: store uncompressed pages when compression algorithm fails
Date: Mon, 16 Oct 2023 17:35:19 -0700
Message-Id: <20231017003519.1426574-3-nphamcs@gmail.com>
In-Reply-To: <20231017003519.1426574-1-nphamcs@gmail.com>
References: <20231017003519.1426574-1-nphamcs@gmail.com>

In a zswap store, the compression step can fail, leaving the pages out of the
zswap pool. This can happen, for instance, when the pages hold random data
(such as reads from /dev/urandom). This behavior can create unnecessary LRU
inversion: warm but incompressible pages are swapped out, while cold but
compressible pages reside in the zswap pool. This can incur IO later, when
the warmer pages are needed again.

We can instead store the original page in the zswap pool. Storing PAGE_SIZE
objects is already supported by zsmalloc, zswap's default backend allocator.
When these pages are needed (either for a zswap load or for writeback), we
simply copy the original content into the desired buffer.

This patch allows zswap to store uncompressed pages. The feature can be
enabled at runtime via a new parameter.
In addition, we add a counter to track the number of times the compression
step fails, for diagnostics.

Signed-off-by: Nhat Pham
---
 Documentation/admin-guide/mm/zswap.rst |  7 +++
 mm/zswap.c                             | 60 +++++++++++++++++++++++---
 2 files changed, 61 insertions(+), 6 deletions(-)

diff --git a/Documentation/admin-guide/mm/zswap.rst b/Documentation/admin-guide/mm/zswap.rst
index 82fa8148a65a..ac1018cb5373 100644
--- a/Documentation/admin-guide/mm/zswap.rst
+++ b/Documentation/admin-guide/mm/zswap.rst
@@ -162,6 +162,13 @@ in memory) as follows:
 
 Note that swapping due to writeback is not disabled with this option.
 
+Compression could also fail during a zswap store attempt. In many cases, it is
+nevertheless beneficial to store the page in the zswap pool (in its
+uncompressed form) for the sake of maintaining the LRU ordering, which will
+be useful for reclaim. This can be enabled as follows:
+
+  echo Y > /sys/module/zswap/parameters/uncompressed_pages_enabled
+
 When there is a sizable amount of cold memory residing in the zswap pool, it
 can be advantageous to proactively write these cold pages to swap and reclaim
 the memory for other use cases. By default, the zswap shrinker is disabled.
diff --git a/mm/zswap.c b/mm/zswap.c
index db2674548670..096266d35602 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -62,6 +62,8 @@ static u64 zswap_pool_limit_hit;
 static u64 zswap_written_back_pages;
 /* Store failed due to a reclaim failure after pool limit was reached */
 static u64 zswap_reject_reclaim_fail;
+/* Compression step fails during store attempt */
+static u64 zswap_reject_compress_fail;
 /* Compressed page was too big for the allocator to (optimally) store */
 static u64 zswap_reject_compress_poor;
 /* Store failed because underlying allocator could not get memory */
@@ -142,6 +144,10 @@ bool zswap_bypass_swap_when_store_fail_enabled;
 module_param_named(bypass_swap_when_store_fail_enabled,
 		   zswap_bypass_swap_when_store_fail_enabled, bool, 0644);
 
+static bool zswap_uncompressed_pages_enabled;
+module_param_named(uncompressed_pages_enabled,
+		   zswap_uncompressed_pages_enabled, bool, 0644);
+
 static bool zswap_exclusive_loads_enabled = IS_ENABLED(
 	CONFIG_ZSWAP_EXCLUSIVE_LOADS_DEFAULT_ON);
 module_param_named(exclusive_loads, zswap_exclusive_loads_enabled, bool, 0644);
@@ -224,6 +230,7 @@ struct zswap_pool {
  * value - value of the same-value filled pages which have same content
  * objcg - the obj_cgroup that the compressed memory is charged to
  * lru - handle to the pool's lru used to evict pages.
+ * is_uncompressed - whether the page is stored in its uncompressed form.
  */
 struct zswap_entry {
 	struct rb_node rbnode;
@@ -238,6 +245,7 @@ struct zswap_entry {
 	struct obj_cgroup *objcg;
 	int nid;
 	struct list_head lru;
+	bool is_uncompressed;
 };
 
 /*
@@ -1307,7 +1315,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 	struct crypto_acomp_ctx *acomp_ctx;
 	struct zpool *pool = zswap_find_zpool(entry);
 	bool page_was_allocated;
-	u8 *src, *tmp = NULL;
+	u8 *src, *dst, *tmp = NULL;
 	unsigned int dlen;
 	int ret;
 	struct writeback_control wbc = {
@@ -1356,6 +1364,19 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 	dlen = PAGE_SIZE;
 	src = zpool_map_handle(pool, entry->handle, ZPOOL_MM_RO);
 
+	if (entry->is_uncompressed) {
+		if (!zpool_can_sleep_mapped(pool))
+			kfree(tmp);
+
+		dst = kmap_local_page(page);
+		copy_page(dst, src);
+		kunmap_local(dst);
+		zpool_unmap_handle(pool, entry->handle);
+
+		ret = 0;
+		goto success;
+	}
+
 	if (!zpool_can_sleep_mapped(pool)) {
 		memcpy(tmp, src, entry->length);
 		src = tmp;
@@ -1376,6 +1397,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 	else
 		zpool_unmap_handle(pool, entry->handle);
 
+success:
 	BUG_ON(ret);
 	BUG_ON(dlen != PAGE_SIZE);
 
@@ -1454,7 +1476,7 @@ bool zswap_store(struct folio *folio)
 	char *buf;
 	u8 *src, *dst;
 	gfp_t gfp;
-	int ret;
+	int ret, compress_ret;
 
 	VM_WARN_ON_ONCE(!folio_test_locked(folio));
 	VM_WARN_ON_ONCE(!folio_test_swapcache(folio));
@@ -1569,11 +1591,15 @@ bool zswap_store(struct folio *folio)
 	 * but in different threads running on different cpu, we have different
 	 * acomp instance, so multiple threads can do (de)compression in parallel.
 	 */
-	ret = crypto_wait_req(crypto_acomp_compress(acomp_ctx->req), &acomp_ctx->wait);
+	compress_ret = crypto_wait_req(crypto_acomp_compress(acomp_ctx->req),
+				       &acomp_ctx->wait);
 	dlen = acomp_ctx->req->dlen;
 
-	if (ret)
-		goto put_dstmem;
+	if (compress_ret) {
+		zswap_reject_compress_fail++;
+		if (!zswap_uncompressed_pages_enabled)
+			goto put_dstmem;
+	}
 
 	/* store */
 	zpool = zswap_find_zpool(entry);
@@ -1590,7 +1616,15 @@ bool zswap_store(struct folio *folio)
 		goto put_dstmem;
 	}
 	buf = zpool_map_handle(zpool, handle, ZPOOL_MM_WO);
-	memcpy(buf, dst, dlen);
+
+	/* Compressor failed. Store the page in its uncompressed form. */
+	if (compress_ret) {
+		dlen = PAGE_SIZE;
+		src = kmap_local_page(page);
+		copy_page(buf, src);
+		kunmap_local(src);
+	} else
+		memcpy(buf, dst, dlen);
 	zpool_unmap_handle(zpool, handle);
 	mutex_unlock(acomp_ctx->mutex);
 
@@ -1598,6 +1632,7 @@ bool zswap_store(struct folio *folio)
 	entry->swpentry = swp_entry(type, offset);
 	entry->handle = handle;
 	entry->length = dlen;
+	entry->is_uncompressed = compress_ret;
 
 insert_entry:
 	entry->objcg = objcg;
@@ -1687,6 +1722,17 @@ bool zswap_load(struct folio *folio)
 	}
 
 	zpool = zswap_find_zpool(entry);
+
+	if (entry->is_uncompressed) {
+		src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
+		dst = kmap_local_page(page);
+		copy_page(dst, src);
+		kunmap_local(dst);
+		zpool_unmap_handle(zpool, entry->handle);
+		ret = true;
+		goto stats;
+	}
+
 	if (!zpool_can_sleep_mapped(zpool)) {
 		tmp = kmalloc(entry->length, GFP_KERNEL);
 		if (!tmp) {
@@ -1855,6 +1901,8 @@ static int zswap_debugfs_init(void)
 			    zswap_debugfs_root, &zswap_reject_alloc_fail);
 	debugfs_create_u64("reject_kmemcache_fail", 0444,
 			   zswap_debugfs_root, &zswap_reject_kmemcache_fail);
+	debugfs_create_u64("reject_compress_fail", 0444,
+			   zswap_debugfs_root, &zswap_reject_compress_fail);
 	debugfs_create_u64("reject_compress_poor", 0444,
 			   zswap_debugfs_root, &zswap_reject_compress_poor);
 	debugfs_create_u64("written_back_pages", 0444,
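
As a quick way to exercise both new knobs from userspace, a small helper
along the following lines can be used. This is an illustrative sketch only,
not part of the series: the sysfs and debugfs paths are taken from the
patches above, and it assumes CONFIG_ZSWAP, root privileges, and debugfs
mounted at /sys/kernel/debug.

	/* Enable the two new zswap parameters and dump the new
	 * reject_compress_fail counter added in patch 2. */
	#include <stdio.h>

	static int write_str(const char *path, const char *val)
	{
		FILE *f = fopen(path, "w");

		if (!f) {
			perror(path);
			return -1;
		}
		fputs(val, f);
		return fclose(f);
	}

	int main(void)
	{
		char buf[64];
		FILE *f;

		write_str("/sys/module/zswap/parameters/bypass_swap_when_store_fail_enabled", "Y");
		write_str("/sys/module/zswap/parameters/uncompressed_pages_enabled", "Y");

		f = fopen("/sys/kernel/debug/zswap/reject_compress_fail", "r");
		if (f) {
			if (fgets(buf, sizeof(buf), f))
				printf("reject_compress_fail: %s", buf);
			fclose(f);
		} else {
			perror("reject_compress_fail");
		}
		return 0;
	}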