From patchwork Wed Apr 27 16:00:12 2022
X-Patchwork-Submitter: Johannes Weiner
X-Patchwork-Id: 12829011
From: Johannes Weiner
To: Andrew Morton
Cc: Michal Hocko, Roman Gushchin, Shakeel Butt, Seth Jennings, Dan Streetman,
 linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
 kernel-team@fb.com
Subject: [PATCH 1/5] mm: Kconfig: move swap and slab config options to the MM section
Date: Wed, 27 Apr 2022 12:00:12 -0400
Message-Id: <20220427160016.144237-2-hannes@cmpxchg.org>
In-Reply-To: <20220427160016.144237-1-hannes@cmpxchg.org>
References: <20220427160016.144237-1-hannes@cmpxchg.org>

These are currently under General Setup. MM seems like a better fit.

Signed-off-by: Johannes Weiner

---
 init/Kconfig | 123 ---------------------------------------------------
 mm/Kconfig   | 123 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 123 insertions(+), 123 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 4489416f1e5c..468fe27cec0b 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -375,23 +375,6 @@ config DEFAULT_HOSTNAME
 	  but you may wish to use a different default here to make a minimal
 	  system more usable with less configuration.
 
-#
-# For some reason microblaze and nios2 hard code SWAP=n. Hopefully we can
-# add proper SWAP support to them, in which case this can be remove.
-#
-config ARCH_NO_SWAP
-	bool
-
-config SWAP
-	bool "Support for paging of anonymous memory (swap)"
-	depends on MMU && BLOCK && !ARCH_NO_SWAP
-	default y
-	help
-	  This option allows you to choose whether you want to have support
-	  for so called swap devices or swap files in your kernel that are
-	  used to provide more virtual memory than the actual RAM present
-	  in your computer. If unsure say Y.
-
 config SYSVIPC
 	bool "System V IPC"
 	help
@@ -1909,112 +1892,6 @@ config COMPAT_BRK
 
 	  On non-ancient distros (post-2000 ones) N is usually a safe choice.
 
-choice
-	prompt "Choose SLAB allocator"
-	default SLUB
-	help
-	   This option allows to select a slab allocator.
-
-config SLAB
-	bool "SLAB"
-	depends on !PREEMPT_RT
-	select HAVE_HARDENED_USERCOPY_ALLOCATOR
-	help
-	  The regular slab allocator that is established and known to work
-	  well in all environments. It organizes cache hot objects in
-	  per cpu and per node queues.
-
-config SLUB
-	bool "SLUB (Unqueued Allocator)"
-	select HAVE_HARDENED_USERCOPY_ALLOCATOR
-	help
-	   SLUB is a slab allocator that minimizes cache line usage
-	   instead of managing queues of cached objects (SLAB approach).
-	   Per cpu caching is realized using slabs of objects instead
-	   of queues of objects. SLUB can use memory efficiently
-	   and has enhanced diagnostics. SLUB is the default choice for
-	   a slab allocator.
-
-config SLOB
-	depends on EXPERT
-	bool "SLOB (Simple Allocator)"
-	depends on !PREEMPT_RT
-	help
-	   SLOB replaces the stock allocator with a drastically simpler
-	   allocator. SLOB is generally more space efficient but
-	   does not perform as well on large systems.
- -endchoice - -config SLAB_MERGE_DEFAULT - bool "Allow slab caches to be merged" - default y - depends on SLAB || SLUB - help - For reduced kernel memory fragmentation, slab caches can be - merged when they share the same size and other characteristics. - This carries a risk of kernel heap overflows being able to - overwrite objects from merged caches (and more easily control - cache layout), which makes such heap attacks easier to exploit - by attackers. By keeping caches unmerged, these kinds of exploits - can usually only damage objects in the same cache. To disable - merging at runtime, "slab_nomerge" can be passed on the kernel - command line. - -config SLAB_FREELIST_RANDOM - bool "Randomize slab freelist" - depends on SLAB || SLUB - help - Randomizes the freelist order used on creating new pages. This - security feature reduces the predictability of the kernel slab - allocator against heap overflows. - -config SLAB_FREELIST_HARDENED - bool "Harden slab freelist metadata" - depends on SLAB || SLUB - help - Many kernel heap attacks try to target slab cache metadata and - other infrastructure. This options makes minor performance - sacrifices to harden the kernel slab allocator against common - freelist exploit methods. Some slab implementations have more - sanity-checking than others. This option is most effective with - CONFIG_SLUB. - -config SHUFFLE_PAGE_ALLOCATOR - bool "Page allocator randomization" - default SLAB_FREELIST_RANDOM && ACPI_NUMA - help - Randomization of the page allocator improves the average - utilization of a direct-mapped memory-side-cache. See section - 5.2.27 Heterogeneous Memory Attribute Table (HMAT) in the ACPI - 6.2a specification for an example of how a platform advertises - the presence of a memory-side-cache. There are also incidental - security benefits as it reduces the predictability of page - allocations to compliment SLAB_FREELIST_RANDOM, but the - default granularity of shuffling on the "MAX_ORDER - 1" i.e, - 10th order of pages is selected based on cache utilization - benefits on x86. - - While the randomization improves cache utilization it may - negatively impact workloads on platforms without a cache. For - this reason, by default, the randomization is enabled only - after runtime detection of a direct-mapped memory-side-cache. - Otherwise, the randomization may be force enabled with the - 'page_alloc.shuffle' kernel command line parameter. - - Say Y if unsure. - -config SLUB_CPU_PARTIAL - default y - depends on SLUB && SMP - bool "SLUB per cpu partial cache" - help - Per cpu partial caches accelerate objects allocation and freeing - that is local to a processor at the price of more indeterminism - in the latency of the free. On overflow these caches will be cleared - which requires the taking of locks that may cause latency spikes. - Typically one would choose no for a realtime system. - config MMAP_ALLOW_UNINITIALIZED bool "Allow mmapped anonymous memory to be uninitialized" depends on EXPERT && !MMU diff --git a/mm/Kconfig b/mm/Kconfig index c2141dd639e3..675a6be43739 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -2,6 +2,129 @@ menu "Memory Management options" +# +# For some reason microblaze and nios2 hard code SWAP=n. Hopefully we can +# add proper SWAP support to them, in which case this can be remove. 
+# +config ARCH_NO_SWAP + bool + +config SWAP + bool "Support for paging of anonymous memory (swap)" + depends on MMU && BLOCK && !ARCH_NO_SWAP + default y + help + This option allows you to choose whether you want to have support + for so called swap devices or swap files in your kernel that are + used to provide more virtual memory than the actual RAM present + in your computer. If unsure say Y. + +choice + prompt "Choose SLAB allocator" + default SLUB + help + This option allows to select a slab allocator. + +config SLAB + bool "SLAB" + depends on !PREEMPT_RT + select HAVE_HARDENED_USERCOPY_ALLOCATOR + help + The regular slab allocator that is established and known to work + well in all environments. It organizes cache hot objects in + per cpu and per node queues. + +config SLUB + bool "SLUB (Unqueued Allocator)" + select HAVE_HARDENED_USERCOPY_ALLOCATOR + help + SLUB is a slab allocator that minimizes cache line usage + instead of managing queues of cached objects (SLAB approach). + Per cpu caching is realized using slabs of objects instead + of queues of objects. SLUB can use memory efficiently + and has enhanced diagnostics. SLUB is the default choice for + a slab allocator. + +config SLOB + depends on EXPERT + bool "SLOB (Simple Allocator)" + depends on !PREEMPT_RT + help + SLOB replaces the stock allocator with a drastically simpler + allocator. SLOB is generally more space efficient but + does not perform as well on large systems. + +endchoice + +config SLAB_MERGE_DEFAULT + bool "Allow slab caches to be merged" + default y + depends on SLAB || SLUB + help + For reduced kernel memory fragmentation, slab caches can be + merged when they share the same size and other characteristics. + This carries a risk of kernel heap overflows being able to + overwrite objects from merged caches (and more easily control + cache layout), which makes such heap attacks easier to exploit + by attackers. By keeping caches unmerged, these kinds of exploits + can usually only damage objects in the same cache. To disable + merging at runtime, "slab_nomerge" can be passed on the kernel + command line. + +config SLAB_FREELIST_RANDOM + bool "Randomize slab freelist" + depends on SLAB || SLUB + help + Randomizes the freelist order used on creating new pages. This + security feature reduces the predictability of the kernel slab + allocator against heap overflows. + +config SLAB_FREELIST_HARDENED + bool "Harden slab freelist metadata" + depends on SLAB || SLUB + help + Many kernel heap attacks try to target slab cache metadata and + other infrastructure. This options makes minor performance + sacrifices to harden the kernel slab allocator against common + freelist exploit methods. Some slab implementations have more + sanity-checking than others. This option is most effective with + CONFIG_SLUB. + +config SHUFFLE_PAGE_ALLOCATOR + bool "Page allocator randomization" + default SLAB_FREELIST_RANDOM && ACPI_NUMA + help + Randomization of the page allocator improves the average + utilization of a direct-mapped memory-side-cache. See section + 5.2.27 Heterogeneous Memory Attribute Table (HMAT) in the ACPI + 6.2a specification for an example of how a platform advertises + the presence of a memory-side-cache. There are also incidental + security benefits as it reduces the predictability of page + allocations to compliment SLAB_FREELIST_RANDOM, but the + default granularity of shuffling on the "MAX_ORDER - 1" i.e, + 10th order of pages is selected based on cache utilization + benefits on x86. 
+
+	  While the randomization improves cache utilization it may
+	  negatively impact workloads on platforms without a cache. For
+	  this reason, by default, the randomization is enabled only
+	  after runtime detection of a direct-mapped memory-side-cache.
+	  Otherwise, the randomization may be force enabled with the
+	  'page_alloc.shuffle' kernel command line parameter.
+
+	  Say Y if unsure.
+
+config SLUB_CPU_PARTIAL
+	default y
+	depends on SLUB && SMP
+	bool "SLUB per cpu partial cache"
+	help
+	  Per cpu partial caches accelerate objects allocation and freeing
+	  that is local to a processor at the price of more indeterminism
+	  in the latency of the free. On overflow these caches will be cleared
+	  which requires the taking of locks that may cause latency spikes.
+	  Typically one would choose no for a realtime system.
+
 config SELECT_MEMORY_MODEL
 	def_bool y
 	depends on ARCH_SELECT_MEMORY_MODEL
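How the options moved here are consumed once the kernel is configured: a
minimal illustrative C sketch, not part of this series (swap_setup() is a
real symbol from mm/swap.c; the rest is made up for illustration). Kconfig
symbols become CONFIG_* preprocessor defines, which kernel code tests
either with the preprocessor or with IS_ENABLED():

	#include <linux/kconfig.h>

	#ifdef CONFIG_SWAP
	extern void swap_setup(void);		/* provided by mm/swap.c */
	#else
	static inline void swap_setup(void) { }	/* compiled out when !CONFIG_SWAP */
	#endif

	static inline bool cpu_partial_enabled(void)
	{
		/* IS_ENABLED() folds to a constant 0/1 in C expressions */
		return IS_ENABLED(CONFIG_SLUB_CPU_PARTIAL);
	}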
From patchwork Wed Apr 27 16:00:13 2022
X-Patchwork-Submitter: Johannes Weiner
X-Patchwork-Id: 12829054
From: Johannes Weiner
To: Andrew Morton
Cc: Michal Hocko, Roman Gushchin, Shakeel Butt, Seth Jennings, Dan Streetman,
 linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
 kernel-team@fb.com
Subject: [PATCH 2/5] mm: Kconfig: group swap, slab, hotplug and thp options into submenus
Date: Wed, 27 Apr 2022 12:00:13 -0400
Message-Id: <20220427160016.144237-3-hannes@cmpxchg.org>
In-Reply-To: <20220427160016.144237-1-hannes@cmpxchg.org>
References: <20220427160016.144237-1-hannes@cmpxchg.org>

There are several clusters of related config options spread throughout
the mostly flat MM submenu. Group them together and put specialization
options into further subdirectories to make the MM submenu a bit more
organized and easier to navigate.

Signed-off-by: Johannes Weiner

---
 mm/Kconfig | 429 +++++++++++++++++++++++++++--------------------------
 1 file changed, 221 insertions(+), 208 deletions(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index 675a6be43739..2c5935a28edf 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -9,7 +9,7 @@ menu "Memory Management options"
 config ARCH_NO_SWAP
 	bool
 
-config SWAP
+menuconfig SWAP
 	bool "Support for paging of anonymous memory (swap)"
 	depends on MMU && BLOCK && !ARCH_NO_SWAP
 	default y
@@ -19,6 +19,191 @@ config SWAP
 	  used to provide more virtual memory than the actual RAM present
 	  in your computer. If unsure say Y.
 
+config ZSWAP
+	bool "Compressed cache for swap pages (EXPERIMENTAL)"
+	depends on SWAP && CRYPTO=y
+	select FRONTSWAP
+	select ZPOOL
+	help
+	  A lightweight compressed cache for swap pages. It takes
+	  pages that are in the process of being swapped out and attempts to
+	  compress them into a dynamically allocated RAM-based memory pool.
+	  This can result in a significant I/O reduction on swap device and,
+	  in the case where decompressing from RAM is faster that swap device
+	  reads, can also improve workload performance.
+
+	  This is marked experimental because it is a new feature (as of
+	  v3.11) that interacts heavily with memory reclaim.
While these + interactions don't cause any known issues on simple memory setups, + they have not be fully explored on the large set of potential + configurations and workloads that exist. + +choice + prompt "Compressed cache for swap pages default compressor" + depends on ZSWAP + default ZSWAP_COMPRESSOR_DEFAULT_LZO + help + Selects the default compression algorithm for the compressed cache + for swap pages. + + For an overview what kind of performance can be expected from + a particular compression algorithm please refer to the benchmarks + available at the following LWN page: + https://lwn.net/Articles/751795/ + + If in doubt, select 'LZO'. + + The selection made here can be overridden by using the kernel + command line 'zswap.compressor=' option. + +config ZSWAP_COMPRESSOR_DEFAULT_DEFLATE + bool "Deflate" + select CRYPTO_DEFLATE + help + Use the Deflate algorithm as the default compression algorithm. + +config ZSWAP_COMPRESSOR_DEFAULT_LZO + bool "LZO" + select CRYPTO_LZO + help + Use the LZO algorithm as the default compression algorithm. + +config ZSWAP_COMPRESSOR_DEFAULT_842 + bool "842" + select CRYPTO_842 + help + Use the 842 algorithm as the default compression algorithm. + +config ZSWAP_COMPRESSOR_DEFAULT_LZ4 + bool "LZ4" + select CRYPTO_LZ4 + help + Use the LZ4 algorithm as the default compression algorithm. + +config ZSWAP_COMPRESSOR_DEFAULT_LZ4HC + bool "LZ4HC" + select CRYPTO_LZ4HC + help + Use the LZ4HC algorithm as the default compression algorithm. + +config ZSWAP_COMPRESSOR_DEFAULT_ZSTD + bool "zstd" + select CRYPTO_ZSTD + help + Use the zstd algorithm as the default compression algorithm. +endchoice + +config ZSWAP_COMPRESSOR_DEFAULT + string + depends on ZSWAP + default "deflate" if ZSWAP_COMPRESSOR_DEFAULT_DEFLATE + default "lzo" if ZSWAP_COMPRESSOR_DEFAULT_LZO + default "842" if ZSWAP_COMPRESSOR_DEFAULT_842 + default "lz4" if ZSWAP_COMPRESSOR_DEFAULT_LZ4 + default "lz4hc" if ZSWAP_COMPRESSOR_DEFAULT_LZ4HC + default "zstd" if ZSWAP_COMPRESSOR_DEFAULT_ZSTD + default "" + +choice + prompt "Compressed cache for swap pages default allocator" + depends on ZSWAP + default ZSWAP_ZPOOL_DEFAULT_ZBUD + help + Selects the default allocator for the compressed cache for + swap pages. + The default is 'zbud' for compatibility, however please do + read the description of each of the allocators below before + making a right choice. + + The selection made here can be overridden by using the kernel + command line 'zswap.zpool=' option. + +config ZSWAP_ZPOOL_DEFAULT_ZBUD + bool "zbud" + select ZBUD + help + Use the zbud allocator as the default allocator. + +config ZSWAP_ZPOOL_DEFAULT_Z3FOLD + bool "z3fold" + select Z3FOLD + help + Use the z3fold allocator as the default allocator. + +config ZSWAP_ZPOOL_DEFAULT_ZSMALLOC + bool "zsmalloc" + select ZSMALLOC + help + Use the zsmalloc allocator as the default allocator. +endchoice + +config ZSWAP_ZPOOL_DEFAULT + string + depends on ZSWAP + default "zbud" if ZSWAP_ZPOOL_DEFAULT_ZBUD + default "z3fold" if ZSWAP_ZPOOL_DEFAULT_Z3FOLD + default "zsmalloc" if ZSWAP_ZPOOL_DEFAULT_ZSMALLOC + default "" + +config ZSWAP_DEFAULT_ON + bool "Enable the compressed cache for swap pages by default" + depends on ZSWAP + help + If selected, the compressed cache for swap pages will be enabled + at boot, otherwise it will be disabled. + + The selection made here can be overridden by using the kernel + command line 'zswap.enabled=' option. 
+ +config ZPOOL + tristate "Common API for compressed memory storage" + depends on ZSWAP + help + Compressed memory storage API. This allows using either zbud or + zsmalloc. + +config ZBUD + tristate "Low (Up to 2x) density storage for compressed pages" + depends on ZPOOL + help + A special purpose allocator for storing compressed pages. + It is designed to store up to two compressed pages per physical + page. While this design limits storage density, it has simple and + deterministic reclaim properties that make it preferable to a higher + density approach when reclaim will be used. + +config Z3FOLD + tristate "Up to 3x density storage for compressed pages" + depends on ZPOOL + help + A special purpose allocator for storing compressed pages. + It is designed to store up to three compressed pages per physical + page. It is a ZBUD derivative so the simplicity and determinism are + still there. + +config ZSMALLOC + tristate "Memory allocator for compressed pages" + depends on MMU + help + zsmalloc is a slab-based memory allocator designed to store + compressed RAM pages. zsmalloc uses virtual memory mapping + in order to reduce fragmentation. However, this results in a + non-standard allocator interface where a handle, not a pointer, is + returned by an alloc(). This handle must be mapped in order to + access the allocated space. + +config ZSMALLOC_STAT + bool "Export zsmalloc statistics" + depends on ZSMALLOC + select DEBUG_FS + help + This option enables code in the zsmalloc to collect various + statistics about what's happening in zsmalloc and exports that + information to userspace via debugfs. + If unsure, say N. + +menu "SLAB allocator options" + choice prompt "Choose SLAB allocator" default SLUB @@ -90,6 +275,19 @@ config SLAB_FREELIST_HARDENED sanity-checking than others. This option is most effective with CONFIG_SLUB. +config SLUB_CPU_PARTIAL + default y + depends on SLUB && SMP + bool "SLUB per cpu partial cache" + help + Per cpu partial caches accelerate objects allocation and freeing + that is local to a processor at the price of more indeterminism + in the latency of the free. On overflow these caches will be cleared + which requires the taking of locks that may cause latency spikes. + Typically one would choose no for a realtime system. + +endmenu # SLAB allocator options + config SHUFFLE_PAGE_ALLOCATOR bool "Page allocator randomization" default SLAB_FREELIST_RANDOM && ACPI_NUMA @@ -114,17 +312,6 @@ config SHUFFLE_PAGE_ALLOCATOR Say Y if unsure. -config SLUB_CPU_PARTIAL - default y - depends on SLUB && SMP - bool "SLUB per cpu partial cache" - help - Per cpu partial caches accelerate objects allocation and freeing - that is local to a processor at the price of more indeterminism - in the latency of the free. On overflow these caches will be cleared - which requires the taking of locks that may cause latency spikes. - Typically one would choose no for a realtime system. 
- config SELECT_MEMORY_MODEL def_bool y depends on ARCH_SELECT_MEMORY_MODEL @@ -250,14 +437,16 @@ config ARCH_ENABLE_MEMORY_HOTPLUG bool # eventually, we can have this option just 'select SPARSEMEM' -config MEMORY_HOTPLUG - bool "Allow for memory hot-add" +menuconfig MEMORY_HOTPLUG + bool "Memory hotplug" select MEMORY_ISOLATION depends on SPARSEMEM depends on ARCH_ENABLE_MEMORY_HOTPLUG depends on 64BIT select NUMA_KEEP_MEMINFO if NUMA +if MEMORY_HOTPLUG + config MEMORY_HOTPLUG_DEFAULT_ONLINE bool "Online the newly added memory blocks by default" depends on MEMORY_HOTPLUG @@ -287,6 +476,8 @@ config MHP_MEMMAP_ON_MEMORY depends on MEMORY_HOTPLUG && SPARSEMEM_VMEMMAP depends on ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE +endif # MEMORY_HOTPLUG + # Heavily threaded applications may benefit from splitting the mm-wide # page_table_lock, so that faults on different parts of the user address # space can be handled with less contention: split it at this NR_CPUS. @@ -501,7 +692,7 @@ config NOMMU_INITIAL_TRIM_EXCESS See Documentation/admin-guide/mm/nommu-mmap.rst for more information. -config TRANSPARENT_HUGEPAGE +menuconfig TRANSPARENT_HUGEPAGE bool "Transparent Hugepage Support" depends on HAVE_ARCH_TRANSPARENT_HUGEPAGE && !PREEMPT_RT select COMPACTION @@ -516,6 +707,8 @@ config TRANSPARENT_HUGEPAGE If memory constrained on embedded, you may want to say N. +if TRANSPARENT_HUGEPAGE + choice prompt "Transparent Hugepage Support sysfs defaults" depends on TRANSPARENT_HUGEPAGE @@ -556,6 +749,19 @@ config THP_SWAP For selection by architectures with reasonable THP sizes. +config READ_ONLY_THP_FOR_FS + bool "Read-only THP for filesystems (EXPERIMENTAL)" + depends on TRANSPARENT_HUGEPAGE && SHMEM + + help + Allow khugepaged to put read-only file-backed pages in THP. + + This is marked experimental because it is a new feature. Write + support of file THPs will be developed in the next few release + cycles. + +endif # TRANSPARENT_HUGEPAGE + # # UP and nommu archs use km based percpu allocator # @@ -640,188 +846,6 @@ config MEM_SOFT_DIRTY See Documentation/admin-guide/mm/soft-dirty.rst for more details. -config ZSWAP - bool "Compressed cache for swap pages (EXPERIMENTAL)" - depends on SWAP && CRYPTO=y - select FRONTSWAP - select ZPOOL - help - A lightweight compressed cache for swap pages. It takes - pages that are in the process of being swapped out and attempts to - compress them into a dynamically allocated RAM-based memory pool. - This can result in a significant I/O reduction on swap device and, - in the case where decompressing from RAM is faster that swap device - reads, can also improve workload performance. - - This is marked experimental because it is a new feature (as of - v3.11) that interacts heavily with memory reclaim. While these - interactions don't cause any known issues on simple memory setups, - they have not be fully explored on the large set of potential - configurations and workloads that exist. - -choice - prompt "Compressed cache for swap pages default compressor" - depends on ZSWAP - default ZSWAP_COMPRESSOR_DEFAULT_LZO - help - Selects the default compression algorithm for the compressed cache - for swap pages. - - For an overview what kind of performance can be expected from - a particular compression algorithm please refer to the benchmarks - available at the following LWN page: - https://lwn.net/Articles/751795/ - - If in doubt, select 'LZO'. - - The selection made here can be overridden by using the kernel - command line 'zswap.compressor=' option. 
- -config ZSWAP_COMPRESSOR_DEFAULT_DEFLATE - bool "Deflate" - select CRYPTO_DEFLATE - help - Use the Deflate algorithm as the default compression algorithm. - -config ZSWAP_COMPRESSOR_DEFAULT_LZO - bool "LZO" - select CRYPTO_LZO - help - Use the LZO algorithm as the default compression algorithm. - -config ZSWAP_COMPRESSOR_DEFAULT_842 - bool "842" - select CRYPTO_842 - help - Use the 842 algorithm as the default compression algorithm. - -config ZSWAP_COMPRESSOR_DEFAULT_LZ4 - bool "LZ4" - select CRYPTO_LZ4 - help - Use the LZ4 algorithm as the default compression algorithm. - -config ZSWAP_COMPRESSOR_DEFAULT_LZ4HC - bool "LZ4HC" - select CRYPTO_LZ4HC - help - Use the LZ4HC algorithm as the default compression algorithm. - -config ZSWAP_COMPRESSOR_DEFAULT_ZSTD - bool "zstd" - select CRYPTO_ZSTD - help - Use the zstd algorithm as the default compression algorithm. -endchoice - -config ZSWAP_COMPRESSOR_DEFAULT - string - depends on ZSWAP - default "deflate" if ZSWAP_COMPRESSOR_DEFAULT_DEFLATE - default "lzo" if ZSWAP_COMPRESSOR_DEFAULT_LZO - default "842" if ZSWAP_COMPRESSOR_DEFAULT_842 - default "lz4" if ZSWAP_COMPRESSOR_DEFAULT_LZ4 - default "lz4hc" if ZSWAP_COMPRESSOR_DEFAULT_LZ4HC - default "zstd" if ZSWAP_COMPRESSOR_DEFAULT_ZSTD - default "" - -choice - prompt "Compressed cache for swap pages default allocator" - depends on ZSWAP - default ZSWAP_ZPOOL_DEFAULT_ZBUD - help - Selects the default allocator for the compressed cache for - swap pages. - The default is 'zbud' for compatibility, however please do - read the description of each of the allocators below before - making a right choice. - - The selection made here can be overridden by using the kernel - command line 'zswap.zpool=' option. - -config ZSWAP_ZPOOL_DEFAULT_ZBUD - bool "zbud" - select ZBUD - help - Use the zbud allocator as the default allocator. - -config ZSWAP_ZPOOL_DEFAULT_Z3FOLD - bool "z3fold" - select Z3FOLD - help - Use the z3fold allocator as the default allocator. - -config ZSWAP_ZPOOL_DEFAULT_ZSMALLOC - bool "zsmalloc" - select ZSMALLOC - help - Use the zsmalloc allocator as the default allocator. -endchoice - -config ZSWAP_ZPOOL_DEFAULT - string - depends on ZSWAP - default "zbud" if ZSWAP_ZPOOL_DEFAULT_ZBUD - default "z3fold" if ZSWAP_ZPOOL_DEFAULT_Z3FOLD - default "zsmalloc" if ZSWAP_ZPOOL_DEFAULT_ZSMALLOC - default "" - -config ZSWAP_DEFAULT_ON - bool "Enable the compressed cache for swap pages by default" - depends on ZSWAP - help - If selected, the compressed cache for swap pages will be enabled - at boot, otherwise it will be disabled. - - The selection made here can be overridden by using the kernel - command line 'zswap.enabled=' option. - -config ZPOOL - tristate "Common API for compressed memory storage" - help - Compressed memory storage API. This allows using either zbud or - zsmalloc. - -config ZBUD - tristate "Low (Up to 2x) density storage for compressed pages" - depends on ZPOOL - help - A special purpose allocator for storing compressed pages. - It is designed to store up to two compressed pages per physical - page. While this design limits storage density, it has simple and - deterministic reclaim properties that make it preferable to a higher - density approach when reclaim will be used. - -config Z3FOLD - tristate "Up to 3x density storage for compressed pages" - depends on ZPOOL - help - A special purpose allocator for storing compressed pages. - It is designed to store up to three compressed pages per physical - page. 
It is a ZBUD derivative so the simplicity and determinism are
-	  still there.
-
-config ZSMALLOC
-	tristate "Memory allocator for compressed pages"
-	depends on MMU
-	help
-	  zsmalloc is a slab-based memory allocator designed to store
-	  compressed RAM pages. zsmalloc uses virtual memory mapping
-	  in order to reduce fragmentation. However, this results in a
-	  non-standard allocator interface where a handle, not a pointer, is
-	  returned by an alloc(). This handle must be mapped in order to
-	  access the allocated space.
-
-config ZSMALLOC_STAT
-	bool "Export zsmalloc statistics"
-	depends on ZSMALLOC
-	select DEBUG_FS
-	help
-	  This option enables code in the zsmalloc to collect various
-	  statistics about what's happening in zsmalloc and exports that
-	  information to userspace via debugfs.
-	  If unsure, say N.
-
 config GENERIC_EARLY_IOREMAP
 	bool
 
@@ -978,17 +1002,6 @@ comment "GUP_TEST needs to have DEBUG_FS enabled"
 config GUP_GET_PTE_LOW_HIGH
 	bool
 
-config READ_ONLY_THP_FOR_FS
-	bool "Read-only THP for filesystems (EXPERIMENTAL)"
-	depends on TRANSPARENT_HUGEPAGE && SHMEM
-
-	help
-	  Allow khugepaged to put read-only file-backed pages in THP.
-
-	  This is marked experimental because it is a new feature. Write
-	  support of file THPs will be developed in the next few release
-	  cycles.
-
 config ARCH_HAS_PTE_SPECIAL
 	bool
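Several help texts above note that the Kconfig default "can be overridden
by using the kernel command line" (e.g. 'zswap.enabled=' or
'zswap.compressor='). That works because zswap wires its Kconfig defaults
into module parameters; for built-in code, parameters are settable at boot
as zswap.<param>=. A hedged sketch of the pattern, with illustrative names
rather than zswap's actual code:

	#include <linux/kconfig.h>
	#include <linux/module.h>

	/*
	 * The default comes from Kconfig; "example.enabled=0" on the
	 * kernel command line (or /sys/module/example/parameters/enabled
	 * at runtime, given the 0644 mode) overrides it.
	 */
	static bool example_enabled = IS_ENABLED(CONFIG_ZSWAP_DEFAULT_ON);
	module_param_named(enabled, example_enabled, bool, 0644);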
From patchwork Wed Apr 27 16:00:14 2022
X-Patchwork-Submitter: Johannes Weiner
X-Patchwork-Id: 12829008
From: Johannes Weiner
To: Andrew Morton
Cc: Michal Hocko, Roman Gushchin, Shakeel Butt, Seth Jennings, Dan Streetman,
 linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
 kernel-team@fb.com
Subject: [PATCH 3/5] mm: Kconfig: simplify zswap configuration
Date: Wed, 27 Apr 2022 12:00:14 -0400
Message-Id: <20220427160016.144237-4-hannes@cmpxchg.org>
In-Reply-To: <20220427160016.144237-1-hannes@cmpxchg.org>
References: <20220427160016.144237-1-hannes@cmpxchg.org>

- CONFIG_ZRAM: Zram is a user-facing feature, whereas zsmalloc is
  not. Don't make the user chase down a technical dependency like
  that, just select it automatically when zram is requested. The
  CONFIG_CRYPTO dependency is redundant due to more specific deps.

- CONFIG_ZPOOL: This is not a user-facing feature. Hide the symbol
  and have it selected in as needed.

- CONFIG_ZSWAP: Select CRYPTO instead of depend. Common pattern.

- Make the ZSWAP suboptions and their descriptions (compression,
  allocation backend) a bit more straightforward for the user.
Signed-off-by: Johannes Weiner --- drivers/block/zram/Kconfig | 3 ++- mm/Kconfig | 55 +++++++++++++++++--------------------- 2 files changed, 27 insertions(+), 31 deletions(-) diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig index 668c6bf2554d..e4163d4b936b 100644 --- a/drivers/block/zram/Kconfig +++ b/drivers/block/zram/Kconfig @@ -1,8 +1,9 @@ # SPDX-License-Identifier: GPL-2.0 config ZRAM tristate "Compressed RAM block device support" - depends on BLOCK && SYSFS && ZSMALLOC && CRYPTO + depends on BLOCK && SYSFS depends on CRYPTO_LZO || CRYPTO_ZSTD || CRYPTO_LZ4 || CRYPTO_LZ4HC || CRYPTO_842 + select ZSMALLOC help Creates virtual block devices called /dev/zramX (X = 0, 1, ...). Pages written to these disks are compressed and stored in memory diff --git a/mm/Kconfig b/mm/Kconfig index 2c5935a28edf..c87ffd0d98b3 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -9,6 +9,9 @@ menu "Memory Management options" config ARCH_NO_SWAP bool +config ZPOOL + bool + menuconfig SWAP bool "Support for paging of anonymous memory (swap)" depends on MMU && BLOCK && !ARCH_NO_SWAP @@ -21,8 +24,9 @@ menuconfig SWAP config ZSWAP bool "Compressed cache for swap pages (EXPERIMENTAL)" - depends on SWAP && CRYPTO=y + depends on SWAP select FRONTSWAP + select CRYPTO select ZPOOL help A lightweight compressed cache for swap pages. It takes @@ -38,8 +42,18 @@ config ZSWAP they have not be fully explored on the large set of potential configurations and workloads that exist. +config ZSWAP_DEFAULT_ON + bool "Enable the compressed cache for swap pages by default" + depends on ZSWAP + help + If selected, the compressed cache for swap pages will be enabled + at boot, otherwise it will be disabled. + + The selection made here can be overridden by using the kernel + command line 'zswap.enabled=' option. + choice - prompt "Compressed cache for swap pages default compressor" + prompt "Default compressor" depends on ZSWAP default ZSWAP_COMPRESSOR_DEFAULT_LZO help @@ -105,7 +119,7 @@ config ZSWAP_COMPRESSOR_DEFAULT default "" choice - prompt "Compressed cache for swap pages default allocator" + prompt "Default allocator" depends on ZSWAP default ZSWAP_ZPOOL_DEFAULT_ZBUD help @@ -145,26 +159,9 @@ config ZSWAP_ZPOOL_DEFAULT default "zsmalloc" if ZSWAP_ZPOOL_DEFAULT_ZSMALLOC default "" -config ZSWAP_DEFAULT_ON - bool "Enable the compressed cache for swap pages by default" - depends on ZSWAP - help - If selected, the compressed cache for swap pages will be enabled - at boot, otherwise it will be disabled. - - The selection made here can be overridden by using the kernel - command line 'zswap.enabled=' option. - -config ZPOOL - tristate "Common API for compressed memory storage" - depends on ZSWAP - help - Compressed memory storage API. This allows using either zbud or - zsmalloc. - config ZBUD - tristate "Low (Up to 2x) density storage for compressed pages" - depends on ZPOOL + tristate "2:1 compression allocator (zbud)" + depends on ZSWAP help A special purpose allocator for storing compressed pages. It is designed to store up to two compressed pages per physical @@ -173,8 +170,8 @@ config ZBUD density approach when reclaim will be used. config Z3FOLD - tristate "Up to 3x density storage for compressed pages" - depends on ZPOOL + tristate "3:1 compression allocator (z3fold)" + depends on ZSWAP help A special purpose allocator for storing compressed pages. It is designed to store up to three compressed pages per physical @@ -182,15 +179,13 @@ config Z3FOLD still there. 
 config ZSMALLOC
-	tristate "Memory allocator for compressed pages"
+	tristate
+	prompt "N:1 compression allocator (zsmalloc)" if ZSWAP
 	depends on MMU
 	help
 	  zsmalloc is a slab-based memory allocator designed to store
-	  compressed RAM pages. zsmalloc uses virtual memory mapping
-	  in order to reduce fragmentation. However, this results in a
-	  non-standard allocator interface where a handle, not a pointer, is
-	  returned by an alloc(). This handle must be mapped in order to
-	  access the allocated space.
+	  pages of various compression levels efficiently. It achieves
+	  the highest storage density with the least amount of fragmentation.
 
 config ZSMALLOC_STAT
 	bool "Export zsmalloc statistics"
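The reworked prompts above ("2:1", "3:1", "N:1 compression allocator") all
sit behind the shared zpool interface, which is why ZPOOL itself can become
a hidden symbol in this patch. Below is a hedged sketch of how a zpool
client stores a buffer, approximating the 5.18-era <linux/zpool.h> API
(zswap's real code differs in detail): allocations come back as opaque
handles that must be mapped before use, which is what the old zsmalloc help
text meant by "a handle, not a pointer".

	#include <linux/gfp.h>
	#include <linux/string.h>
	#include <linux/zpool.h>

	static int store_compressed(struct zpool *pool, const void *buf, size_t len)
	{
		unsigned long handle;
		void *dst;
		int ret;

		/* allocate backend storage; 'handle' is opaque, not a pointer */
		ret = zpool_malloc(pool, len, GFP_KERNEL, &handle);
		if (ret)
			return ret;

		/* map write-only, copy the payload in, unmap again */
		dst = zpool_map_handle(pool, handle, ZPOOL_MM_WO);
		memcpy(dst, buf, len);
		zpool_unmap_handle(pool, handle);
		return 0;
	}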
From patchwork Wed Apr 27 16:00:15 2022
X-Patchwork-Submitter: Johannes Weiner
X-Patchwork-Id: 12829007
From: Johannes Weiner
To: Andrew Morton
Cc: Michal Hocko, Roman Gushchin, Shakeel Butt, Seth Jennings, Dan Streetman,
 linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
 kernel-team@fb.com
Subject: [PATCH 4/5] mm: zswap: add basic meminfo and vmstat coverage
Date: Wed, 27 Apr 2022 12:00:15 -0400
Message-Id: <20220427160016.144237-5-hannes@cmpxchg.org>
In-Reply-To: <20220427160016.144237-1-hannes@cmpxchg.org>
References: <20220427160016.144237-1-hannes@cmpxchg.org>

Currently it requires poking at debugfs to figure out the size and
population of the zswap cache on a host. There are no counters for
reads and writes against the cache. As a result, it's difficult to
understand zswap behavior on production systems.

Print zswap memory consumption and how many pages are zswapped out in
/proc/meminfo. Count zswapouts and zswapins in /proc/vmstat.
Signed-off-by: Johannes Weiner

---
 fs/proc/meminfo.c             |  7 +++++++
 include/linux/swap.h          |  5 +++++
 include/linux/vm_event_item.h |  4 ++++
 mm/vmstat.c                   |  4 ++++
 mm/zswap.c                    | 13 ++++++-------
 5 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 6fa761c9cc78..6e89f0e2fd20 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -86,6 +86,13 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 	show_val_kb(m, "SwapTotal:      ", i.totalswap);
 	show_val_kb(m, "SwapFree:       ", i.freeswap);
+#ifdef CONFIG_ZSWAP
+	seq_printf(m,  "Zswap:          %8lu kB\n",
+		   (unsigned long)(zswap_pool_total_size >> 10));
+	seq_printf(m,  "Zswapped:       %8lu kB\n",
+		   (unsigned long)atomic_read(&zswap_stored_pages) <<
+		   (PAGE_SHIFT - 10));
+#endif
 	show_val_kb(m, "Dirty:          ",
 		    global_node_page_state(NR_FILE_DIRTY));
 	show_val_kb(m, "Writeback:      ",
diff --git a/include/linux/swap.h b/include/linux/swap.h
index b82c196d8867..07074afa79a7 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -632,6 +632,11 @@ static inline int mem_cgroup_swappiness(struct mem_cgroup *mem)
 }
 #endif
 
+#ifdef CONFIG_ZSWAP
+extern u64 zswap_pool_total_size;
+extern atomic_t zswap_stored_pages;
+#endif
+
 #if defined(CONFIG_SWAP) && defined(CONFIG_MEMCG) && defined(CONFIG_BLK_CGROUP)
 extern void __cgroup_throttle_swaprate(struct page *page, gfp_t gfp_mask);
 static inline void cgroup_throttle_swaprate(struct page *page, gfp_t gfp_mask)
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 5e80138ce624..1ce8fadb2b1c 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -132,6 +132,10 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 #ifdef CONFIG_KSM
 		COW_KSM,
 #endif
+#ifdef CONFIG_ZSWAP
+		ZSWPIN,
+		ZSWPOUT,
+#endif
 #ifdef CONFIG_X86
 		DIRECT_MAP_LEVEL2_SPLIT,
 		DIRECT_MAP_LEVEL3_SPLIT,
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 4a2aa2fa88db..da7e389cf33c 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1392,6 +1392,10 @@ const char * const vmstat_text[] = {
 #ifdef CONFIG_KSM
 	"cow_ksm",
 #endif
+#ifdef CONFIG_ZSWAP
+	"zswpin",
+	"zswpout",
+#endif
 #ifdef CONFIG_X86
 	"direct_map_level2_splits",
 	"direct_map_level3_splits",
diff --git a/mm/zswap.c b/mm/zswap.c
index 2c5db4cbedea..e3c16a70f533 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -42,9 +42,9 @@
 * statistics
 **********************************/
 /* Total bytes used by the compressed storage */
-static u64 zswap_pool_total_size;
+u64 zswap_pool_total_size;
 /* The number of compressed pages currently stored in zswap */
-static atomic_t zswap_stored_pages = ATOMIC_INIT(0);
+atomic_t zswap_stored_pages = ATOMIC_INIT(0);
 /* The number of same-value filled pages currently stored in zswap */
 static atomic_t zswap_same_filled_pages = ATOMIC_INIT(0);
@@ -1243,6 +1243,7 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
 	/* update stats */
 	atomic_inc(&zswap_stored_pages);
 	zswap_update_total_size();
+	count_vm_event(ZSWPOUT);
 
 	return 0;
 
@@ -1285,11 +1286,10 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
 		zswap_fill_page(dst, entry->value);
 		kunmap_atomic(dst);
 		ret = 0;
-		goto freeentry;
+		goto stats;
 	}
 
 	if (!zpool_can_sleep_mapped(entry->pool->zpool)) {
-
 		tmp = kmalloc(entry->length, GFP_ATOMIC);
 		if (!tmp) {
 			ret = -ENOMEM;
@@ -1304,10 +1304,8 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
 	src += sizeof(struct zswap_header);
 	if (!zpool_can_sleep_mapped(entry->pool->zpool)) {
-
 		memcpy(tmp, src, entry->length);
 		src = tmp;
-
 		zpool_unmap_handle(entry->pool->zpool, entry->handle);
 	}
@@ -1326,7 +1324,8 @@
 		kfree(tmp);
 
 	BUG_ON(ret);
-
+stats:
+	count_vm_event(ZSWPIN);
 freeentry:
 	spin_lock(&tree->lock);
 	zswap_entry_put(tree, entry);
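A small userspace companion to the patch above (not part of the series):
once CONFIG_ZSWAP is enabled, the new "Zswap:"/"Zswapped:" fields show up
in /proc/meminfo and the "zswpin"/"zswpout" counters in /proc/vmstat, so
the numbers can be read with plain C:

	#include <stdio.h>
	#include <string.h>

	static void print_matching(const char *path, const char *const *keys, int n)
	{
		char line[256];
		FILE *f = fopen(path, "r");

		if (!f)
			return;
		while (fgets(line, sizeof(line), f)) {
			for (int i = 0; i < n; i++) {
				if (!strncmp(line, keys[i], strlen(keys[i])))
					fputs(line, stdout);
			}
		}
		fclose(f);
	}

	int main(void)
	{
		static const char *const meminfo[] = { "Zswap:", "Zswapped:" };
		static const char *const vmstat[] = { "zswpin ", "zswpout " };

		print_matching("/proc/meminfo", meminfo, 2);
		print_matching("/proc/vmstat", vmstat, 2);
		return 0;
	}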
From: Johannes Weiner To: Andrew Morton Cc: Michal Hocko , Roman Gushchin , Shakeel Butt , Seth Jennings , Dan Streetman , linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com Subject: [PATCH 5/5] zswap: memcg accounting Date: Wed, 27 Apr 2022 12:00:16 -0400 Message-Id: <20220427160016.144237-6-hannes@cmpxchg.org> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20220427160016.144237-1-hannes@cmpxchg.org> References: <20220427160016.144237-1-hannes@cmpxchg.org> MIME-Version: 1.0 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Applications can currently escape their cgroup memory containment when zswap is enabled. This patch adds per-cgroup tracking and limiting of zswap backend memory to rectify this. The existing cgroup2 memory.stat file is extended to show zswap statistics analogous to what's in meminfo and vmstat. Furthermore, two new control files, memory.zswap.current and memory.zswap.max, are added to allow tuning zswap usage on a per-workload basis. This is important since not all workloads benefit from zswap equally; some even suffer compared to disk swap when memory contents don't compress well. The optimal size of the zswap pool, and the threshold for writeback, also depend on the size of the workload's warm set. The implementation doesn't use a traditional page_counter transaction. zswap is unconventional as a memory consumer in that we only know the amount of memory to charge once expensive compression has occurred. If zswap is disabled or the limit is already exceeded, we obviously don't want to compress page upon page only to reject them all. Instead, the limit is checked against current usage, then we compress and charge. This allows some limit overrun, but not enough to matter in practice. (A simplified sketch of this check-then-charge flow follows the last diff at the end of this message.) Signed-off-by: Johannes Weiner --- Documentation/admin-guide/cgroup-v2.rst | 21 +++ include/linux/memcontrol.h | 54 +++++++ mm/memcontrol.c | 196 +++++++++++++++++++++++- mm/zswap.c | 37 ++++- 4 files changed, 293 insertions(+), 15 deletions(-) diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst index 19bcd73cad03..b4c262e99b5f 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -1347,6 +1347,12 @@ PAGE_SIZE multiple when read back. Amount of cached filesystem data that is swap-backed, such as tmpfs, shm segments, shared anonymous mmap()s + zswap + Amount of memory consumed by the zswap compression backend. + + zswapped + Amount of application memory swapped out to zswap. + file_mapped Amount of cached filesystem data mapped with mmap() @@ -1537,6 +1543,21 @@ PAGE_SIZE multiple when read back. 
higher than the limit for an extended period of time. This reduces the impact on the workload and memory management. + memory.zswap.current + A read-only single value file which exists on non-root + cgroups. + + The total amount of memory consumed by the zswap compression + backend. + + memory.zswap.max + A read-write single value file which exists on non-root + cgroups. The default is "max". + + Zswap usage hard limit. If a cgroup's zswap pool reaches this + limit, it will refuse to take any more stores before existing + entries fault back in or are written out to disk. + memory.pressure A read-only nested-keyed file. diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index fe580cb96683..3385ce81ecf3 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -35,6 +35,8 @@ enum memcg_stat_item { MEMCG_PERCPU_B, MEMCG_VMALLOC, MEMCG_KMEM, + MEMCG_ZSWAP_B, + MEMCG_ZSWAPPED, MEMCG_NR_STAT, }; @@ -252,6 +254,10 @@ struct mem_cgroup { /* Range enforcement for interrupt charges */ struct work_struct high_work; +#ifdef CONFIG_ZSWAP + unsigned long zswap_max; +#endif + unsigned long soft_limit; /* vmpressure notifications */ @@ -1264,6 +1270,10 @@ struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *css) return NULL; } +static inline void obj_cgroup_put(struct obj_cgroup *objcg) +{ +} + static inline void mem_cgroup_put(struct mem_cgroup *memcg) { } @@ -1680,6 +1690,7 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order); void __memcg_kmem_uncharge_page(struct page *page, int order); struct obj_cgroup *get_obj_cgroup_from_current(void); +struct obj_cgroup *get_obj_cgroup_from_page(struct page *page); int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size); void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size); @@ -1716,6 +1727,20 @@ static inline int memcg_kmem_id(struct mem_cgroup *memcg) struct mem_cgroup *mem_cgroup_from_obj(void *p); +static inline void count_objcg_event(struct obj_cgroup *objcg, + enum vm_event_item idx) +{ + struct mem_cgroup *memcg; + + if (mem_cgroup_kmem_disabled()) + return; + + rcu_read_lock(); + memcg = obj_cgroup_memcg(objcg); + count_memcg_events(memcg, idx, 1); + rcu_read_unlock(); +} + #else static inline bool mem_cgroup_kmem_disabled(void) { @@ -1742,6 +1767,11 @@ static inline void __memcg_kmem_uncharge_page(struct page *page, int order) { } +static inline struct obj_cgroup *get_obj_cgroup_from_page(struct page *page) +{ + return NULL; +} + static inline bool memcg_kmem_enabled(void) { return false; @@ -1757,6 +1787,30 @@ static inline struct mem_cgroup *mem_cgroup_from_obj(void *p) return NULL; } +static inline void count_objcg_event(struct obj_cgroup *objcg, + enum vm_event_item idx) +{ +} + #endif /* CONFIG_MEMCG_KMEM */ +#if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_ZSWAP) +bool obj_cgroup_may_zswap(struct obj_cgroup *objcg); +void obj_cgroup_charge_zswap(struct obj_cgroup *objcg, size_t size); +void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg, size_t size); +#else +static inline bool obj_cgroup_may_zswap(struct obj_cgroup *objcg) +{ + return true; +} +static inline void obj_cgroup_charge_zswap(struct obj_cgroup *objcg, + size_t size) +{ +} +static inline void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg, + size_t size) +{ +} +#endif + #endif /* _LINUX_MEMCONTROL_H */ diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 04cea4fa362a..cbb9b43bdb80 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1398,6 +1398,10 @@ static const struct memory_stat 
memory_stats[] = { { "sock", MEMCG_SOCK }, { "vmalloc", MEMCG_VMALLOC }, { "shmem", NR_SHMEM }, +#ifdef CONFIG_ZSWAP + { "zswap", MEMCG_ZSWAP_B }, + { "zswapped", MEMCG_ZSWAPPED }, +#endif { "file_mapped", NR_FILE_MAPPED }, { "file_dirty", NR_FILE_DIRTY }, { "file_writeback", NR_WRITEBACK }, @@ -1432,6 +1436,7 @@ static int memcg_page_state_unit(int item) { switch (item) { case MEMCG_PERCPU_B: + case MEMCG_ZSWAP_B: case NR_SLAB_RECLAIMABLE_B: case NR_SLAB_UNRECLAIMABLE_B: case WORKINGSET_REFAULT_ANON: @@ -1512,6 +1517,13 @@ static char *memory_stat_format(struct mem_cgroup *memcg) seq_buf_printf(&s, "%s %lu\n", vm_event_name(PGLAZYFREED), memcg_events(memcg, PGLAZYFREED)); +#ifdef CONFIG_ZSWAP + seq_buf_printf(&s, "%s %lu\n", vm_event_name(ZSWPIN), + memcg_events(memcg, ZSWPIN)); + seq_buf_printf(&s, "%s %lu\n", vm_event_name(ZSWPOUT), + memcg_events(memcg, ZSWPOUT)); +#endif + #ifdef CONFIG_TRANSPARENT_HUGEPAGE seq_buf_printf(&s, "%s %lu\n", vm_event_name(THP_FAULT_ALLOC), memcg_events(memcg, THP_FAULT_ALLOC)); @@ -2883,6 +2895,19 @@ struct mem_cgroup *mem_cgroup_from_obj(void *p) return page_memcg_check(folio_page(folio, 0)); } +static struct obj_cgroup *__get_obj_cgroup_from_memcg(struct mem_cgroup *memcg) +{ + struct obj_cgroup *objcg = NULL; + + for (; memcg != root_mem_cgroup; memcg = parent_mem_cgroup(memcg)) { + objcg = rcu_dereference(memcg->objcg); + if (objcg && obj_cgroup_tryget(objcg)) + break; + objcg = NULL; + } + return objcg; +} + __always_inline struct obj_cgroup *get_obj_cgroup_from_current(void) { struct obj_cgroup *objcg = NULL; @@ -2896,15 +2921,32 @@ __always_inline struct obj_cgroup *get_obj_cgroup_from_current(void) memcg = active_memcg(); else memcg = mem_cgroup_from_task(current); - - for (; memcg != root_mem_cgroup; memcg = parent_mem_cgroup(memcg)) { - objcg = rcu_dereference(memcg->objcg); - if (objcg && obj_cgroup_tryget(objcg)) - break; - objcg = NULL; - } + objcg = __get_obj_cgroup_from_memcg(memcg); rcu_read_unlock(); + return objcg; +} + +struct obj_cgroup *get_obj_cgroup_from_page(struct page *page) +{ + struct obj_cgroup *objcg; + + if (!memcg_kmem_enabled() || memcg_kmem_bypass()) + return NULL; + if (PageMemcgKmem(page)) { + objcg = __folio_objcg(page_folio(page)); + obj_cgroup_get(objcg); + } else { + struct mem_cgroup *memcg; + + rcu_read_lock(); + memcg = __folio_memcg(page_folio(page)); + if (memcg) + objcg = __get_obj_cgroup_from_memcg(memcg); + else + objcg = NULL; + rcu_read_unlock(); + } return objcg; } @@ -5142,6 +5184,9 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css) page_counter_set_high(&memcg->memory, PAGE_COUNTER_MAX); memcg->soft_limit = PAGE_COUNTER_MAX; +#ifdef CONFIG_ZSWAP + memcg->zswap_max = PAGE_COUNTER_MAX; +#endif page_counter_set_high(&memcg->swap, PAGE_COUNTER_MAX); if (parent) { memcg->swappiness = mem_cgroup_swappiness(parent); @@ -7406,6 +7451,139 @@ static struct cftype memsw_files[] = { { }, /* terminate */ }; +#ifdef CONFIG_ZSWAP +/** + * obj_cgroup_may_zswap - check if this cgroup can zswap + * @objcg: the object cgroup + * + * Check if the hierarchical zswap limit has been reached. + * + * This doesn't check for specific headroom, and it is not atomic + * either. But with zswap, the size of the allocation is only known + * once compression has occurred, and this optimistic pre-check avoids + * spending cycles on compression when there is already no room left + * or zswap is disabled altogether somewhere in the hierarchy. 
+ */ +bool obj_cgroup_may_zswap(struct obj_cgroup *objcg) +{ + struct mem_cgroup *memcg, *original_memcg; + bool ret = true; + + original_memcg = get_mem_cgroup_from_objcg(objcg); + for (memcg = original_memcg; memcg != root_mem_cgroup; + memcg = parent_mem_cgroup(memcg)) { + unsigned long max = READ_ONCE(memcg->zswap_max); + unsigned long pages; + + if (max == PAGE_COUNTER_MAX) + continue; + if (max == 0) { + ret = false; + break; + } + + cgroup_rstat_flush(memcg->css.cgroup); + pages = memcg_page_state(memcg, MEMCG_ZSWAP_B) / PAGE_SIZE; + if (pages < max) + continue; + ret = false; + break; + } + mem_cgroup_put(original_memcg); + return ret; +} + +/** + * obj_cgroup_charge_zswap - charge compression backend memory + * @objcg: the object cgroup + * @size: size of compressed object + * + * This forces the charge after obj_cgroup_may_zswap() allowed + * compression and storage in zswap for this cgroup to go ahead. + */ +void obj_cgroup_charge_zswap(struct obj_cgroup *objcg, size_t size) +{ + struct mem_cgroup *memcg; + + VM_WARN_ON_ONCE(!(current->flags & PF_MEMALLOC)); + + /* PF_MEMALLOC context, charging must succeed */ + if (obj_cgroup_charge(objcg, GFP_KERNEL, size)) + VM_WARN_ON_ONCE(1); + + rcu_read_lock(); + memcg = obj_cgroup_memcg(objcg); + mod_memcg_state(memcg, MEMCG_ZSWAP_B, size); + mod_memcg_state(memcg, MEMCG_ZSWAPPED, 1); + rcu_read_unlock(); +} + +/** + * obj_cgroup_uncharge_zswap - uncharge compression backend memory + * @objcg: the object cgroup + * @size: size of compressed object + * + * Uncharges zswap memory on page in. + */ +void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg, size_t size) +{ + struct mem_cgroup *memcg; + + obj_cgroup_uncharge(objcg, size); + + rcu_read_lock(); + memcg = obj_cgroup_memcg(objcg); + mod_memcg_state(memcg, MEMCG_ZSWAP_B, -size); + mod_memcg_state(memcg, MEMCG_ZSWAPPED, -1); + rcu_read_unlock(); +} + +static u64 zswap_current_read(struct cgroup_subsys_state *css, + struct cftype *cft) +{ + cgroup_rstat_flush(css->cgroup); + return memcg_page_state(mem_cgroup_from_css(css), MEMCG_ZSWAP_B); +} + +static int zswap_max_show(struct seq_file *m, void *v) +{ + return seq_puts_memcg_tunable(m, + READ_ONCE(mem_cgroup_from_seq(m)->zswap_max)); +} + +static ssize_t zswap_max_write(struct kernfs_open_file *of, + char *buf, size_t nbytes, loff_t off) +{ + struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of)); + unsigned long max; + int err; + + buf = strstrip(buf); + err = page_counter_memparse(buf, "max", &max); + if (err) + return err; + + xchg(&memcg->zswap_max, max); + + return nbytes; +} + +static struct cftype zswap_files[] = { + { + .name = "zswap.current", + .flags = CFTYPE_NOT_ON_ROOT, + .read_u64 = zswap_current_read, + }, + { + .name = "zswap.max", + .flags = CFTYPE_NOT_ON_ROOT, + .seq_show = zswap_max_show, + .write = zswap_max_write, + }, + { } /* terminate */ +}; +#endif /* CONFIG_ZSWAP */ + /* * If mem_cgroup_swap_init() is implemented as a subsys_initcall() * instead of a core_initcall(), this could mean cgroup_memory_noswap still @@ -7424,7 +7602,9 @@ static int __init mem_cgroup_swap_init(void) WARN_ON(cgroup_add_dfl_cftypes(&memory_cgrp_subsys, swap_files)); WARN_ON(cgroup_add_legacy_cftypes(&memory_cgrp_subsys, memsw_files)); - +#ifdef CONFIG_ZSWAP + WARN_ON(cgroup_add_dfl_cftypes(&memory_cgrp_subsys, zswap_files)); +#endif return 0; } core_initcall(mem_cgroup_swap_init); diff --git a/mm/zswap.c b/mm/zswap.c index e3c16a70f533..104835b379ec 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -188,6 +188,7 @@ struct zswap_entry { 
unsigned long handle; unsigned long value; }; + struct obj_cgroup *objcg; }; struct zswap_header { @@ -359,6 +360,10 @@ static void zswap_rb_erase(struct rb_root *root, struct zswap_entry *entry) */ static void zswap_free_entry(struct zswap_entry *entry) { + if (entry->objcg) { + obj_cgroup_uncharge_zswap(entry->objcg, entry->length); + obj_cgroup_put(entry->objcg); + } if (!entry->length) atomic_dec(&zswap_same_filled_pages); else { @@ -1096,6 +1101,8 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, struct zswap_entry *entry, *dupentry; struct scatterlist input, output; struct crypto_acomp_ctx *acomp_ctx; + struct obj_cgroup *objcg = NULL; + struct zswap_pool *pool; int ret; unsigned int hlen, dlen = PAGE_SIZE; unsigned long handle, value; @@ -1115,17 +1122,15 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, goto reject; } + objcg = get_obj_cgroup_from_page(page); + if (objcg && !obj_cgroup_may_zswap(objcg)) + goto shrink; + /* reclaim space if needed */ if (zswap_is_full()) { - struct zswap_pool *pool; - zswap_pool_limit_hit++; zswap_pool_reached_full = true; - pool = zswap_pool_last_get(); - if (pool) - queue_work(shrink_wq, &pool->shrink_work); - ret = -ENOMEM; - goto reject; + goto shrink; } if (zswap_pool_reached_full) { @@ -1227,6 +1232,13 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, entry->length = dlen; insert_entry: + entry->objcg = objcg; + if (objcg) { + obj_cgroup_charge_zswap(objcg, entry->length); + /* Account before objcg ref is moved to tree */ + count_objcg_event(objcg, ZSWPOUT); + } + /* map */ spin_lock(&tree->lock); do { @@ -1253,7 +1265,16 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, freepage: zswap_entry_cache_free(entry); reject: + if (objcg) + obj_cgroup_put(objcg); return ret; + +shrink: + pool = zswap_pool_last_get(); + if (pool) + queue_work(shrink_wq, &pool->shrink_work); + ret = -ENOMEM; + goto reject; } /* @@ -1326,6 +1347,8 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset, BUG_ON(ret); stats: count_vm_event(ZSWPIN); + if (entry->objcg) + count_objcg_event(entry->objcg, ZSWPIN); freeentry: spin_lock(&tree->lock); zswap_entry_put(tree, entry);
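
As referenced in the 5/5 commit message, here is the check-then-charge flow boiled down to a standalone userspace sketch. This is an illustration only, not kernel code: toy_memcg, toy_may_zswap(), toy_compress() and toy_charge() are hypothetical stand-ins, and a flat byte counter stands in for the hierarchical, rstat-backed accounting. Only the ordering mirrors the patch: a cheap limit check first, then the expensive compression, then an unconditional charge.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Toy stand-in for a memcg with a zswap limit (cf. memory.zswap.max
 * and memory.zswap.current). */
struct toy_memcg {
	size_t zswap_max;	/* limit in bytes */
	size_t zswap_bytes;	/* current compressed usage in bytes */
};

/* Optimistic pre-check, done before paying for compression. Like
 * obj_cgroup_may_zswap(), it compares current usage against the
 * limit without reserving headroom for the incoming object. */
static bool toy_may_zswap(const struct toy_memcg *cg)
{
	return cg->zswap_bytes < cg->zswap_max;
}

/* Stand-in for the expensive compression step; only after this runs
 * is the size of the allocation actually known. */
static size_t toy_compress(size_t page_size)
{
	return page_size / 2;	/* pretend a 2:1 compression ratio */
}

/* Post-compression charge: unconditional, so usage may overrun the
 * limit by up to one compressed object per in-flight store. */
static void toy_charge(struct toy_memcg *cg, size_t size)
{
	cg->zswap_bytes += size;
}

int main(void)
{
	struct toy_memcg cg = { .zswap_max = 4096, .zswap_bytes = 3072 };

	if (!toy_may_zswap(&cg)) {
		puts("over limit: rejected before compressing");
		return 0;
	}
	toy_charge(&cg, toy_compress(4096));
	printf("stored, usage now %zu/%zu bytes\n",
	       cg.zswap_bytes, cg.zswap_max);
	return 0;
}

The bounded overrun is the price of not compressing speculatively: a conventional page_counter transaction would need the object size up front, which zswap only learns after compressing. From userspace, the new knobs behave like the other cgroup2 memory.* interfaces described in the documentation hunk above: memory.zswap.max accepts "max" or a byte value, and memory.zswap.current reports the backend footprint that the charge above models.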