From patchwork Tue Mar 12 15:34:11 2024
X-Patchwork-Submitter: Johannes Weiner
X-Patchwork-Id: 13590149
From: Johannes Weiner
To: Andrew Morton
Cc: Yosry Ahmed, Nhat Pham, Chengming Zhou, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Chengming Zhou
Subject: [PATCH V2 1/2] mm: zswap: optimize zswap pool size tracking
Date: Tue, 12 Mar 2024 11:34:11 -0400
Message-ID: <20240312153901.3441-1-hannes@cmpxchg.org>

Profiling the munmap() of a zswapped memory region shows 60% of the
total cycles currently going into updating the zswap_pool_total_size.

There are three consumers of this counter:
- store, to enforce the globally configured pool limit
- meminfo & debugfs, to report the size to the user
- shrink, to determine the batch size for each cycle

Instead of aggregating every time an entry enters or exits the zswap
pool, aggregate the value from the zpools on-demand:

- Stores aggregate the counter anyway upon success. Aggregating to
  check the limit instead is the same amount of work.

- Meminfo & debugfs might benefit somewhat from a pre-aggregated
  counter, but aren't exactly hot paths.

- Shrinking can aggregate once for every cycle instead of doing it for
  every freed entry. As the shrinker might work on tens or hundreds of
  objects per scan cycle, this is a large reduction in aggregations.

The paths that benefit dramatically are swapin, swapoff, and
unmaps. There could be millions of pages being processed until
somebody asks for the pool size again. This eliminates the pool size
updates from those paths entirely.
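
For illustration only, here is a minimal userspace sketch of eager
versus on-demand aggregation. It is not the kernel code: pool_bytes,
nr_aggregations, NR_POOLS, SKETCH_PAGE_SHIFT and the free_entries_*()
helpers are hypothetical names, and 4 KiB pages are assumed. It counts
how often the per-pool sizes get summed while a batch of entries is
freed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_POOLS		4	/* hypothetical stand-in for ZSWAP_NR_ZPOOLS */
#define SKETCH_PAGE_SHIFT	12	/* assumption: 4 KiB pages */

/* Per-backend byte counts, standing in for what the zpools report. */
static uint64_t pool_bytes[NR_POOLS];

/* Number of full aggregation passes, to compare the two schemes. */
static unsigned long nr_aggregations;

/* Walk every pool and sum its size; this is the expensive step. */
static unsigned long total_pages(void)
{
	uint64_t total = 0;
	size_t i;

	nr_aggregations++;
	for (i = 0; i < NR_POOLS; i++)
		total += pool_bytes[i];

	return (unsigned long)(total >> SKETCH_PAGE_SHIFT);
}

/*
 * Eager scheme: re-aggregate after every freed entry and publish the
 * result. Returns the last published total, in pages.
 */
static unsigned long free_entries_eager(size_t pool, unsigned long n,
					uint64_t entry_bytes)
{
	unsigned long published = 0;
	unsigned long i;

	assert(pool < NR_POOLS);
	for (i = 0; i < n; i++) {
		assert(pool_bytes[pool] >= entry_bytes);
		pool_bytes[pool] -= entry_bytes;
		published = total_pages();
	}

	return published;
}

/* Lazy scheme: only the backend counter changes; nobody aggregates. */
static void free_entries_lazy(size_t pool, unsigned long n,
			      uint64_t entry_bytes)
{
	unsigned long i;

	assert(pool < NR_POOLS);
	for (i = 0; i < n; i++) {
		assert(pool_bytes[pool] >= entry_bytes);
		pool_bytes[pool] -= entry_bytes;
	}
}
```

In this sketch, freeing 1000 entries costs 1000 aggregation passes in
the eager variant and none in the lazy one; the next reader that calls
total_pages() pays for a single pass.
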

Top profile entries for a 24G range munmap(), before:

    38.54%  zswap-unmap  [kernel.kallsyms]  [k] zs_zpool_total_size
    12.51%  zswap-unmap  [kernel.kallsyms]  [k] zpool_get_total_size
     9.10%  zswap-unmap  [kernel.kallsyms]  [k] zswap_update_total_size
     2.95%  zswap-unmap  [kernel.kallsyms]  [k] obj_cgroup_uncharge_zswap
     2.88%  zswap-unmap  [kernel.kallsyms]  [k] __slab_free
     2.86%  zswap-unmap  [kernel.kallsyms]  [k] xas_store

and after:

     7.70%  zswap-unmap  [kernel.kallsyms]  [k] __slab_free
     7.16%  zswap-unmap  [kernel.kallsyms]  [k] obj_cgroup_uncharge_zswap
     6.74%  zswap-unmap  [kernel.kallsyms]  [k] xas_store

It was also briefly considered to move to a single atomic in zswap
that is updated by the backends, since zswap only cares about the sum
of all pools anyway. However, zram directly needs per-pool information
out of zsmalloc. To keep the backend from having to update two atomics
every time, I opted for the lazy aggregation instead for now.

Signed-off-by: Johannes Weiner
Acked-by: Yosry Ahmed
Reviewed-by: Chengming Zhou
Reviewed-by: Nhat Pham
---
 fs/proc/meminfo.c     |   3 +-
 include/linux/zswap.h |   2 +-
 mm/zswap.c            | 101 +++++++++++++++++++++---------------------
 3 files changed, 52 insertions(+), 54 deletions(-)

v2:
- added profile info (Yosry). Counter footprint is actually 60%, I had
  missed the third line in perf's graphed output previously.
- zswap_accept_thr_pages() helper (Yosry)
- fixed debugfs file missing newline (Yosry)
- added changelog note on a single zswap atomic for the backend size (Yosry)
- collected acks and reviews

diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 45af9a989d40..245171d9164b 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -89,8 +89,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 	show_val_kb(m, "SwapTotal:      ", i.totalswap);
 	show_val_kb(m, "SwapFree:       ", i.freeswap);
 #ifdef CONFIG_ZSWAP
-	seq_printf(m, "Zswap:          %8lu kB\n",
-		   (unsigned long)(zswap_pool_total_size >> 10));
+	show_val_kb(m, "Zswap:          ", zswap_total_pages());
 	seq_printf(m, "Zswapped:       %8lu kB\n",
 		   (unsigned long)atomic_read(&zswap_stored_pages) <<
 		   (PAGE_SHIFT - 10));
diff --git a/include/linux/zswap.h b/include/linux/zswap.h
index 341aea490070..2a85b941db97 100644
--- a/include/linux/zswap.h
+++ b/include/linux/zswap.h
@@ -7,7 +7,6 @@
 
 struct lruvec;
 
-extern u64 zswap_pool_total_size;
 extern atomic_t zswap_stored_pages;
 
 #ifdef CONFIG_ZSWAP
@@ -27,6 +26,7 @@ struct zswap_lruvec_state {
 	atomic_long_t nr_zswap_protected;
 };
 
+unsigned long zswap_total_pages(void);
 bool zswap_store(struct folio *folio);
 bool zswap_load(struct folio *folio);
 void zswap_invalidate(swp_entry_t swp);
diff --git a/mm/zswap.c b/mm/zswap.c
index 9a3237752082..1a5cc7298306 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -43,8 +43,6 @@
 /*********************************
 * statistics
 **********************************/
-/* Total bytes used by the compressed storage */
-u64 zswap_pool_total_size;
 /* The number of compressed pages currently stored in zswap */
 atomic_t zswap_stored_pages = ATOMIC_INIT(0);
 /* The number of same-value filled pages currently stored in zswap */
@@ -264,45 +262,6 @@ static inline struct zswap_tree *swap_zswap_tree(swp_entry_t swp)
 	pr_debug("%s pool %s/%s\n", msg, (p)->tfm_name,		\
 		 zpool_get_type((p)->zpools[0]))
 
-static bool zswap_is_full(void)
-{
-	return totalram_pages() * zswap_max_pool_percent / 100 <
-			DIV_ROUND_UP(zswap_pool_total_size, PAGE_SIZE);
-}
-
-static bool zswap_can_accept(void)
-{
-	return totalram_pages() * zswap_accept_thr_percent / 100 *
-				zswap_max_pool_percent / 100 >
-			DIV_ROUND_UP(zswap_pool_total_size, PAGE_SIZE);
-}
-
-static u64 get_zswap_pool_size(struct zswap_pool *pool)
-{
-	u64 pool_size = 0;
-	int i;
-
-	for (i = 0; i < ZSWAP_NR_ZPOOLS; i++)
-		pool_size += zpool_get_total_size(pool->zpools[i]);
-
-	return pool_size;
-}
-
-static void zswap_update_total_size(void)
-{
-	struct zswap_pool *pool;
-	u64 total = 0;
-
-	rcu_read_lock();
-
-	list_for_each_entry_rcu(pool, &zswap_pools, list)
-		total += get_zswap_pool_size(pool);
-
-	rcu_read_unlock();
-
-	zswap_pool_total_size = total;
-}
-
 /*********************************
 * pool functions
 **********************************/
@@ -540,6 +499,33 @@ static struct zswap_pool *zswap_pool_find_get(char *type, char *compressor)
 	return NULL;
 }
 
+static unsigned long zswap_max_pages(void)
+{
+	return totalram_pages() * zswap_max_pool_percent / 100;
+}
+
+static unsigned long zswap_accept_thr_pages(void)
+{
+	return zswap_max_pages() * zswap_accept_thr_percent / 100;
+}
+
+unsigned long zswap_total_pages(void)
+{
+	struct zswap_pool *pool;
+	u64 total = 0;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(pool, &zswap_pools, list) {
+		int i;
+
+		for (i = 0; i < ZSWAP_NR_ZPOOLS; i++)
+			total += zpool_get_total_size(pool->zpools[i]);
+	}
+	rcu_read_unlock();
+
+	return total >> PAGE_SHIFT;
+}
+
 /*********************************
 * param callbacks
 **********************************/
@@ -912,7 +898,6 @@ static void zswap_entry_free(struct zswap_entry *entry)
 	}
 	zswap_entry_cache_free(entry);
 	atomic_dec(&zswap_stored_pages);
-	zswap_update_total_size();
 }
 
 /*
@@ -1317,7 +1302,7 @@ static unsigned long zswap_shrinker_count(struct shrinker *shrinker,
 	nr_stored = memcg_page_state(memcg, MEMCG_ZSWAPPED);
 #else
 	/* use pool stats instead of memcg stats */
-	nr_backing = zswap_pool_total_size >> PAGE_SHIFT;
+	nr_backing = zswap_total_pages();
 	nr_stored = atomic_read(&zswap_nr_stored);
 #endif
 
@@ -1385,6 +1370,10 @@ static void shrink_worker(struct work_struct *w)
 {
 	struct mem_cgroup *memcg;
 	int ret, failures = 0;
+	unsigned long thr;
+
+	/* Reclaim down to the accept threshold */
+	thr = zswap_accept_thr_pages();
 
 	/* global reclaim will select cgroup in a round-robin fashion. */
 	do {
@@ -1432,10 +1421,9 @@ static void shrink_worker(struct work_struct *w)
 			break;
 		if (ret && ++failures == MAX_RECLAIM_RETRIES)
 			break;
-
 resched:
 		cond_resched();
-	} while (!zswap_can_accept());
+	} while (zswap_total_pages() > thr);
 }
 
 static int zswap_is_page_same_filled(void *ptr, unsigned long *value)
@@ -1476,6 +1464,7 @@ bool zswap_store(struct folio *folio)
 	struct zswap_entry *entry, *dupentry;
 	struct obj_cgroup *objcg = NULL;
 	struct mem_cgroup *memcg = NULL;
+	unsigned long max_pages, cur_pages;
 
 	VM_WARN_ON_ONCE(!folio_test_locked(folio));
 	VM_WARN_ON_ONCE(!folio_test_swapcache(folio));
@@ -1487,6 +1476,7 @@ bool zswap_store(struct folio *folio)
 	if (!zswap_enabled)
 		goto check_old;
 
+	/* Check cgroup limits */
 	objcg = get_obj_cgroup_from_folio(folio);
 	if (objcg && !obj_cgroup_may_zswap(objcg)) {
 		memcg = get_mem_cgroup_from_objcg(objcg);
@@ -1497,15 +1487,18 @@ bool zswap_store(struct folio *folio)
 		mem_cgroup_put(memcg);
 	}
 
-	/* reclaim space if needed */
-	if (zswap_is_full()) {
+	/* Check global limits */
+	cur_pages = zswap_total_pages();
+	max_pages = zswap_max_pages();
+
+	if (cur_pages >= max_pages) {
 		zswap_pool_limit_hit++;
 		zswap_pool_reached_full = true;
 		goto shrink;
 	}
 
 	if (zswap_pool_reached_full) {
-		if (!zswap_can_accept())
+		if (cur_pages > zswap_accept_thr_pages())
 			goto shrink;
 		else
 			zswap_pool_reached_full = false;
@@ -1581,7 +1574,6 @@ bool zswap_store(struct folio *folio)
 
 	/* update stats */
 	atomic_inc(&zswap_stored_pages);
-	zswap_update_total_size();
 	count_vm_event(ZSWPOUT);
 
 	return true;
@@ -1711,6 +1703,13 @@ void zswap_swapoff(int type)
 
 static struct dentry *zswap_debugfs_root;
 
+static int debugfs_get_total_size(void *data, u64 *val)
+{
+	*val = zswap_total_pages() * PAGE_SIZE;
+	return 0;
+}
+DEFINE_DEBUGFS_ATTRIBUTE(total_size_fops, debugfs_get_total_size, NULL, "%llu\n");
+
 static int zswap_debugfs_init(void)
 {
 	if (!debugfs_initialized())
@@ -1732,8 +1731,8 @@ static int zswap_debugfs_init(void)
 			   zswap_debugfs_root, &zswap_reject_compress_poor);
 	debugfs_create_u64("written_back_pages", 0444,
 			   zswap_debugfs_root, &zswap_written_back_pages);
-	debugfs_create_u64("pool_total_size", 0444,
-			   zswap_debugfs_root, &zswap_pool_total_size);
+	debugfs_create_file("pool_total_size", 0444,
+			    zswap_debugfs_root, NULL, &total_size_fops);
 	debugfs_create_atomic_t("stored_pages", 0444,
 				zswap_debugfs_root, &zswap_stored_pages);
 	debugfs_create_atomic_t("same_filled_pages", 0444,