From patchwork Wed Oct 25 09:52:48 2023
X-Patchwork-Submitter: Zhongkun He
X-Patchwork-Id: 13435886
From: Zhongkun He
To: akpm@linux-foundation.org
Cc: hannes@cmpxchg.org, yosryahmed@google.com, nphamcs@gmail.com,
 sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, Zhongkun He
Subject: [PATCH v2] zswap: add writeback_time_threshold interface to shrink zswap pool
Date: Wed, 25 Oct 2023 17:52:48 +0800
Message-Id: <20231025095248.458789-1-hezhongkun.hzk@bytedance.com>

zswap has no suitable method for selecting objects that have not been
accessed for a long time; it only shrinks the pool when the limit is
hit. If the limit is set too high, zswap is likely to waste memory.

This patch adds a new interface, writeback_time_threshold, to shrink the
zswap pool proactively based on a time threshold in seconds, e.g.::

    echo 600 > /sys/module/zswap/parameters/writeback_time_threshold

Any zswap_entry that has not been accessed for more than 600 seconds is
written back to swap. If the threshold is set to 0, all entries are
written back.

This patch provides more control by specifying the time at which to
start writing pages out.

Signed-off-by: Zhongkun He
---
v2:
   - rename sto_time to last_ac_time (suggested by Nhat Pham)
   - update the access time when a page is read
     (reported by Yosry Ahmed and Nhat Pham)
   - add config option (suggested by Yosry Ahmed)
---
 Documentation/admin-guide/mm/zswap.rst |   9 +++
 mm/Kconfig                             |  11 +++
 mm/zswap.c                             | 104 +++++++++++++++++++++++++
 3 files changed, 124 insertions(+)

diff --git a/Documentation/admin-guide/mm/zswap.rst b/Documentation/admin-guide/mm/zswap.rst
index 45b98390e938..7aec245f89b4 100644
--- a/Documentation/admin-guide/mm/zswap.rst
+++ b/Documentation/admin-guide/mm/zswap.rst
@@ -153,6 +153,15 @@ attribute, e. g.::
 
 Setting this parameter to 100 will disable the hysteresis.
 
+When zswap holds a lot of cold memory, judged by last accessed time, it can be
+written back to swap proactively to free memory. Write a time threshold, in
+seconds, to writeback_time_threshold to trigger this, e.g.::
+
+	echo 600 > /sys/module/zswap/parameters/writeback_time_threshold
+
+Any zswap_entry that has not been accessed for more than 600 seconds is
+written back. If the threshold is set to 0, all entries are written back.
+
 A debugfs interface is provided for various statistic about pool size, number
 of pages stored, same-value filled pages and various counters for the reasons
 pages are rejected.
diff --git a/mm/Kconfig b/mm/Kconfig
index 89971a894b60..426358d2050b 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -61,6 +61,17 @@ config ZSWAP_EXCLUSIVE_LOADS_DEFAULT_ON
 	  The cost is that if the page was never dirtied and needs to be
 	  swapped out again, it will be re-compressed.
 
+config ZSWAP_WRITEBACK_TIME_ON
+	bool "Write back zswap entries based on the last accessed time"
+	depends on ZSWAP
+	default n
+	help
+	  If selected, zswap tracks the last accessed time of each entry and
+	  provides the writeback_time_threshold module parameter.
+
+	  Writing a threshold, in seconds, to writeback_time_threshold writes
+	  cold zswap entries back to swap proactively to free memory.
+
 choice
 	prompt "Default compressor"
 	depends on ZSWAP
diff --git a/mm/zswap.c b/mm/zswap.c
index 0c5ca896edf2..331ee276afbd 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -144,6 +144,19 @@ static bool zswap_exclusive_loads_enabled = IS_ENABLED(
 		CONFIG_ZSWAP_EXCLUSIVE_LOADS_DEFAULT_ON);
 module_param_named(exclusive_loads, zswap_exclusive_loads_enabled, bool, 0644);
 
+
+#ifdef CONFIG_ZSWAP_WRITEBACK_TIME_ON
+/* zswap writeback time threshold in seconds */
+static unsigned int zswap_writeback_time_thr;
+static int zswap_writeback_time_thr_param_set(const char *, const struct kernel_param *);
+static const struct kernel_param_ops zswap_writeback_param_ops = {
+	.set =		zswap_writeback_time_thr_param_set,
+	.get =		param_get_uint,
+};
+module_param_cb(writeback_time_threshold, &zswap_writeback_param_ops,
+		&zswap_writeback_time_thr, 0644);
+#endif
+
 /* Number of zpools in zswap_pool (empirically determined for scalability) */
 #define ZSWAP_NR_ZPOOLS 32
 
@@ -200,6 +213,7 @@ struct zswap_pool {
  * value - value of the same-value filled pages which have same content
  * objcg - the obj_cgroup that the compressed memory is charged to
  * lru - handle to the pool's lru used to evict pages.
+ * last_ac_time - the last accessed time of zswap_entry.
  */
 struct zswap_entry {
 	struct rb_node rbnode;
@@ -213,6 +227,9 @@ struct zswap_entry {
 	};
 	struct obj_cgroup *objcg;
 	struct list_head lru;
+#ifdef CONFIG_ZSWAP_WRITEBACK_TIME_ON
+	ktime_t last_ac_time;
+#endif
 };
 
 /*
@@ -291,6 +308,27 @@ static void zswap_update_total_size(void)
 	zswap_pool_total_size = total;
 }
 
+#ifdef CONFIG_ZSWAP_WRITEBACK_TIME_ON
+static void zswap_set_access_time(struct zswap_entry *entry)
+{
+	entry->last_ac_time = ktime_get_boottime();
+}
+
+static void zswap_clear_access_time(struct zswap_entry *entry)
+{
+	entry->last_ac_time = 0;
+}
+#else
+static void zswap_set_access_time(struct zswap_entry *entry)
+{
+}
+
+static void zswap_clear_access_time(struct zswap_entry *entry)
+{
+}
+#endif
+
+
 /*********************************
 * zswap entry functions
 **********************************/
@@ -398,6 +436,7 @@ static void zswap_free_entry(struct zswap_entry *entry)
 	else {
 		spin_lock(&entry->pool->lru_lock);
 		list_del(&entry->lru);
+		zswap_clear_access_time(entry);
 		spin_unlock(&entry->pool->lru_lock);
 		zpool_free(zswap_find_zpool(entry), entry->handle);
 		zswap_pool_put(entry->pool);
@@ -712,6 +751,52 @@ static void shrink_worker(struct work_struct *w)
 	zswap_pool_put(pool);
 }
 
+#ifdef CONFIG_ZSWAP_WRITEBACK_TIME_ON
+static bool zswap_reach_timethr(struct zswap_pool *pool)
+{
+	struct zswap_entry *entry;
+	ktime_t expire_time = 0;
+	bool ret = false;
+
+	spin_lock(&pool->lru_lock);
+
+	if (list_empty(&pool->lru))
+		goto out;
+
+	entry = list_last_entry(&pool->lru, struct zswap_entry, lru);
+	expire_time = ktime_add(entry->last_ac_time,
+			ns_to_ktime(zswap_writeback_time_thr * NSEC_PER_SEC));
+
+	if (ktime_after(ktime_get_boottime(), expire_time))
+		ret = true;
+out:
+	spin_unlock(&pool->lru_lock);
+	return ret;
+}
+
+static void zswap_reclaim_entry_by_timethr(void)
+{
+	struct zswap_pool *pool = zswap_pool_current_get();
+	int ret, failures = 0;
+
+	if (!pool)
+		return;
+
+	while (zswap_reach_timethr(pool)) {
+		ret = zswap_reclaim_entry(pool);
+		if (ret) {
+			zswap_reject_reclaim_fail++;
+			if (ret != -EAGAIN)
+				break;
+			if (++failures == MAX_RECLAIM_RETRIES)
+				break;
+		}
+		cond_resched();
+	}
+	zswap_pool_put(pool);
+}
+#endif
+
 static struct zswap_pool *zswap_pool_create(char *type, char *compressor)
 {
 	int i;
@@ -1040,6 +1125,23 @@ static int zswap_enabled_param_set(const char *val,
 	return ret;
 }
 
+#ifdef CONFIG_ZSWAP_WRITEBACK_TIME_ON
+static int zswap_writeback_time_thr_param_set(const char *val,
+				const struct kernel_param *kp)
+{
+	int ret = -ENODEV;
+
+	/* if this is load-time (pre-init) param setting, just return. */
+	if (system_state != SYSTEM_RUNNING)
+		return ret;
+
+	ret = param_set_uint(val, kp);
+	if (!ret)
+		zswap_reclaim_entry_by_timethr();
+	return ret;
+}
+#endif
+
 /*********************************
 * writeback code
 **********************************/
@@ -1372,6 +1474,7 @@ bool zswap_store(struct folio *folio)
 	if (entry->length) {
 		spin_lock(&entry->pool->lru_lock);
 		list_add(&entry->lru, &entry->pool->lru);
+		zswap_set_access_time(entry);
 		spin_unlock(&entry->pool->lru_lock);
 	}
 	spin_unlock(&tree->lock);
@@ -1484,6 +1587,7 @@ bool zswap_load(struct folio *folio)
 		folio_mark_dirty(folio);
 	} else if (entry->length) {
 		spin_lock(&entry->pool->lru_lock);
+		zswap_set_access_time(entry);
 		list_move(&entry->lru, &entry->pool->lru);
 		spin_unlock(&entry->pool->lru_lock);
 	}
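
For readers who want to check the threshold semantics outside the kernel,
the expiry test and the tail-first reclaim loop can be modelled in plain
userspace C. This is only a rough sketch under stated assumptions, not
the patch's implementation: struct entry, entry_expired() and
reclaim_by_threshold() are hypothetical names, timestamps are plain
nanosecond counters instead of ktime_t, the LRU is an array ordered most
recent first, and locking and writeback failures are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical stand-in for a zswap entry: only the timestamp matters. */
struct entry {
	uint64_t last_ac_time_ns;	/* last access, in ns since "boot" */
};

/*
 * Model of the expiry test: an entry counts as expired once "now" is
 * strictly later than its last access time plus the threshold. The
 * product is computed in 64 bits so it cannot wrap for any 32-bit
 * threshold value.
 */
static bool entry_expired(const struct entry *e, uint64_t now_ns,
			  unsigned int thr_sec)
{
	uint64_t expire_ns = e->last_ac_time_ns +
			     (uint64_t)thr_sec * NSEC_PER_SEC;

	return now_ns > expire_ns;
}

/*
 * Model of the reclaim loop: entries[] is ordered most recent first, so
 * the last element is the coldest. Keep dropping the tail while it is
 * expired and return how many entries would have been written back.
 */
static size_t reclaim_by_threshold(const struct entry *entries, size_t n,
				   uint64_t now_ns, unsigned int thr_sec)
{
	size_t reclaimed = 0;

	while (n > 0 && entry_expired(&entries[n - 1], now_ns, thr_sec)) {
		n--;
		reclaimed++;
	}
	return reclaimed;
}
```

With three entries last accessed at 900 s, 500 s and 100 s and the
current time at 1000 s, a threshold of 600 reclaims only the entry from
100 s, while a threshold of 0 reclaims all three. Because the loop stops
at the first tail entry that is not expired, the model relies on the list
staying ordered by access time, which is why the sketch assumes the
timestamp is refreshed whenever an entry is moved to the head.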