From patchwork Tue Sep 19 17:14:45 2023
X-Patchwork-Submitter: Nhat Pham
X-Patchwork-Id: 13391618
From: Nhat Pham <nphamcs@gmail.com>
To: akpm@linux-foundation.org
Cc: hannes@cmpxchg.org, cerasuolodomenico@gmail.com, yosryahmed@google.com,
 sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com,
 mhocko@kernel.org, roman.gushchin@linux.dev, shakeelb@google.com,
 muchun.song@linux.dev, linux-mm@kvack.org, kernel-team@meta.com,
 linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: [PATCH v2 0/2] workload-specific and memory pressure-driven zswap writeback
Date: Tue, 19 Sep 2023 10:14:45 -0700
Message-Id: <20230919171447.2712746-1-nphamcs@gmail.com>

Changelog:
v2:
   * Fix loongarch compiler errors
   * Use pool stats instead of memcg stats when !CONFIG_MEMCG_KMEM

There are currently several issues with zswap writeback:

1. There is only a single global LRU for zswap.
   This makes it impossible to perform workload-specific shrinking: a
   memcg under memory pressure cannot determine which pages in the pool
   it owns, and often ends up writing back pages from other memcgs. This
   issue has been previously observed in practice and mitigated by
   simply disabling memcg-initiated shrinking:

   https://lore.kernel.org/all/20230530232435.3097106-1-nphamcs@gmail.com/T/#u

   But this solution leaves a lot to be desired, as we still do not have
   an avenue for a memcg to free up its own memory locked up in zswap.

2. We only shrink the zswap pool when the user-defined limit is hit.
   This means that if we set the limit too high, cold data that are
   unlikely to be used again will reside in the pool, wasting precious
   memory. It is hard to predict how much zswap space will be needed
   ahead of time, as this depends on the workload (specifically, on
   factors such as memory access patterns and compressibility of the
   memory pages).

This patch series solves these issues by separating the global zswap LRU
into per-memcg and per-NUMA LRUs, and performs workload-specific (i.e.
memcg- and NUMA-aware) zswap writeback under memory pressure. The new
shrinker does not have any parameter that must be tuned by the user, and
can be opted in or out on a per-memcg basis.

On a benchmark that we have run:

(without the shrinker)
real -- mean: 153.27s, median: 153.199s
sys -- mean: 541.652s, median: 541.903s
user -- mean: 4384.9674s, median: 4385.471s

(with the shrinker)
real -- mean: 151.4956s, median: 151.456s
sys -- mean: 461.1464s, median: 465.656s
user -- mean: 4384.7118s, median: 4384.675s

We observed a 14-15% reduction in kernel CPU time, which translated to
over 1% reduction in real time.

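The percentages quoted for this first benchmark can be re-derived from
the timings. The snippet below is only an illustrative check, not part
of the series; the helper name reduction_pct is made up for this
example.

```python
def reduction_pct(before: float, after: float) -> float:
    """Percentage reduction when going from `before` to `after`."""
    return (before - after) / before * 100.0


# Timings in seconds from the first benchmark: (without, with) the shrinker.
timings = {
    "sys mean": (541.652, 461.1464),
    "sys median": (541.903, 465.656),
    "real mean": (153.27, 151.4956),
    "real median": (153.199, 151.456),
}

for name, (without, with_shrinker) in timings.items():
    print(f"{name}: {reduction_pct(without, with_shrinker):.2f}% reduction")
```

Both sys figures land between 14% and 15%, and both real figures land
slightly above 1%, matching the summary.
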
On another benchmark, where there was a lot more cold memory residing in
zswap, we observed even more pronounced gains:

(without the shrinker)
real -- mean: 157.5252s, median: 157.281s
sys -- mean: 769.3082s, median: 780.545s
user -- mean: 4378.1622s, median: 4378.286s

(with the shrinker)
real -- mean: 152.9608s, median: 152.845s
sys -- mean: 517.4446s, median: 506.749s
user -- mean: 4387.694s, median: 4387.935s

Here, we saw around 32-35% reduction in kernel CPU time, which
translated to 2.8% reduction in real time. These results confirm our
hypothesis that the shrinker is more helpful the more cold memory we
have.

Domenico Cerasuolo (1):
  zswap: make shrinking memcg-aware

Nhat Pham (1):
  zswap: shrinks zswap pool based on memory pressure

 Documentation/admin-guide/mm/zswap.rst |  12 +
 include/linux/list_lru.h               |  39 +++
 include/linux/memcontrol.h             |   6 +
 include/linux/mmzone.h                 |  14 +
 include/linux/zswap.h                  |   9 +
 mm/list_lru.c                          |  46 ++-
 mm/memcontrol.c                        |  33 ++
 mm/swap_state.c                        |  50 +++-
 mm/zswap.c                             | 397 ++++++++++++++++++++++---
 9 files changed, 548 insertions(+), 58 deletions(-)
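
For readers unfamiliar with the first issue in the cover letter, here is
a toy user-space model of why a single global LRU cannot do
workload-specific shrinking. It is a sketch under simplifying
assumptions, not the kernel implementation: the class names GlobalLRU
and PerMemcgLRU, the store/shrink methods and the string memcg names are
all hypothetical, and real writeback is far more involved.

```python
from collections import OrderedDict, defaultdict


class GlobalLRU:
    """Toy model: one LRU shared by every memcg."""

    def __init__(self):
        self.entries = OrderedDict()  # page -> owning memcg, oldest first

    def store(self, page, memcg, node=0):
        self.entries[page] = memcg

    def shrink(self, memcg, nr, node=0):
        # The requester is ignored: the oldest entries are written back
        # no matter which memcg owns them.
        count = min(nr, len(self.entries))
        return [self.entries.popitem(last=False) for _ in range(count)]


class PerMemcgLRU:
    """Toy model: one LRU per (memcg, NUMA node) pair."""

    def __init__(self):
        self.lrus = defaultdict(OrderedDict)  # (memcg, node) -> pages

    def store(self, page, memcg, node=0):
        self.lrus[(memcg, node)][page] = memcg

    def shrink(self, memcg, nr, node=0):
        # Only the requesting memcg's own list is scanned.
        lru = self.lrus[(memcg, node)]
        count = min(nr, len(lru))
        return [lru.popitem(last=False) for _ in range(count)]


def demo(lru):
    """Memcg B stores two old pages, then A stores two; A then shrinks."""
    for page in ("b0", "b1"):
        lru.store(page, "B")
    for page in ("a0", "a1"):
        lru.store(page, "A")
    return [owner for _page, owner in lru.shrink("A", 2)]


print("global LRU writes back pages owned by:", demo(GlobalLRU()))
print("per-memcg LRU writes back pages owned by:", demo(PerMemcgLRU()))
```

In the global model, memcg A's shrink request writes back B's pages
because they are the oldest; in the per-memcg model, A only writes back
its own pages.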