From patchwork Fri Jul 5 10:28:48 2024
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 13724914
From: Ryan Roberts
To: Andrew Morton, Jonathan Corbet, David Hildenbrand, Barry Song,
    Baolin Wang, Lance Yang, Yang Shi
Cc: Ryan Roberts, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    stable@vger.kernel.org
Subject: [PATCH v3] mm: Fix khugepaged activation policy
Date: Fri, 5 Jul 2024 11:28:48 +0100
Message-ID: <20240705102849.2479686-1-ryan.roberts@arm.com>

Since the introduction of mTHP, the documentation has stated that
khugepaged would be enabled when any mTHP size is enabled, and disabled
when all mTHP sizes are disabled. There are 2 problems with this: 1. this
is not what was implemented by the code and 2. this is not the desirable
behavior.

Desirable behavior is for khugepaged to be enabled when any PMD-sized THP
is enabled, anon or file. (Note that file THP is still controlled by the
top-level control so we must always consider that, as well as the
PMD-size mTHP control for anon).
khugepaged only supports collapsing to PMD-sized THP so there is no
value in enabling it when PMD-sized THP is disabled. So let's change the
code and documentation to reflect this policy.

Further, per-size enabled control modification events were not
previously forwarded to khugepaged to give it an opportunity to start or
stop. Consequently, the following resulted in khugepaged erroneously not
being activated:

  echo never > /sys/kernel/mm/transparent_hugepage/enabled
  echo always > /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled

Signed-off-by: Ryan Roberts
Fixes: 3485b88390b0 ("mm: thp: introduce multi-size THP sysfs interface")
Closes: https://lore.kernel.org/linux-mm/7a0bbe69-1e3d-4263-b206-da007791a5c4@redhat.com/
Cc: stable@vger.kernel.org
Acked-by: David Hildenbrand
---

Hi All,

Applies on top of mm-unstable from a couple of days ago (9bb8753acdd8).
No regressions observed in mm selftests.

When fixing this I also noticed that khugepaged isn't (and never has
been) activated/deactivated by `shmem_enabled=`. I've concluded that this
is definitely a (separate) bug. But I'm waiting for the conclusion of the
conversation at [3] before fixing it, so will send that separately.

Changes since v1 [1]
====================

- hugepage_pmd_enabled() now considers CONFIG_READ_ONLY_THP_FOR_FS as
  part of the decision; this means that for kernels without this config,
  khugepaged will not be started when only the top-level control is
  enabled.

Changes since v2 [2]
====================

- Make hugepage_pmd_enabled() out-of-line static in khugepaged.c (per Andrew)
- Refactor hugepage_pmd_enabled() for better readability (per Andrew)

[1] https://lore.kernel.org/linux-mm/20240702144617.2291480-1-ryan.roberts@arm.com/
[2] https://lore.kernel.org/linux-mm/20240704091051.2411934-1-ryan.roberts@arm.com/
[3] https://lore.kernel.org/linux-mm/65c37315-2741-481f-b433-cec35ef1af35@arm.com/

Thanks,
Ryan

 Documentation/admin-guide/mm/transhuge.rst | 11 ++++----
 include/linux/huge_mm.h                    | 12 --------
 mm/huge_memory.c                           |  7 +++++
 mm/khugepaged.c                            | 33 +++++++++++++++++-----
 4 files changed, 38 insertions(+), 25 deletions(-)

--
2.43.0

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 709fe10b60f4..fc321d40b8ac 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -202,12 +202,11 @@ PMD-mappable transparent hugepage::
 
 	cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
 
-khugepaged will be automatically started when one or more hugepage
-sizes are enabled (either by directly setting "always" or "madvise",
-or by setting "inherit" while the top-level enabled is set to "always"
-or "madvise"), and it'll be automatically shutdown when the last
-hugepage size is disabled (either by directly setting "never", or by
-setting "inherit" while the top-level enabled is set to "never").
+khugepaged will be automatically started when PMD-sized THP is enabled
+(either of the per-size anon control or the top-level control are set
+to "always" or "madvise"), and it'll be automatically shutdown when
+PMD-sized THP is disabled (when both the per-size anon control and the
+top-level control are "never")
 
 Khugepaged controls
 -------------------
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 4d155c7a4792..107da5c4eba4 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -128,18 +128,6 @@ static inline bool hugepage_global_always(void)
(1< 0) {
+		int err;
+
+		err = start_stop_khugepaged();
+		if (err)
+			ret = err;
+	}
 	return ret;
 }
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 409f67a817f1..a5ec03ef8722 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -413,6 +413,26 @@ static inline int hpage_collapse_test_exit_or_disable(struct mm_struct *mm)
 	       test_bit(MMF_DISABLE_THP, &mm->flags);
 }
 
+static bool hugepage_pmd_enabled(void)
+{
+	/*
+	 * We cover both the anon and the file-backed case here; file-backed
+	 * hugepages, when configured in, are determined by the global control.
+	 * Anon pmd-sized hugepages are determined by the pmd-size control.
+	 */
+	if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
+	    hugepage_global_enabled())
+		return true;
+	if (test_bit(PMD_ORDER, &huge_anon_orders_always))
+		return true;
+	if (test_bit(PMD_ORDER, &huge_anon_orders_madvise))
+		return true;
+	if (test_bit(PMD_ORDER, &huge_anon_orders_inherit) &&
+	    hugepage_global_enabled())
+		return true;
+	return false;
+}
+
 void __khugepaged_enter(struct mm_struct *mm)
 {
 	struct khugepaged_mm_slot *mm_slot;
@@ -449,7 +469,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
 			  unsigned long vm_flags)
 {
 	if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
-	    hugepage_flags_enabled()) {
+	    hugepage_pmd_enabled()) {
 		if (thp_vma_allowable_order(vma, vm_flags, TVA_ENFORCE_SYSFS,
 					    PMD_ORDER))
 			__khugepaged_enter(vma->vm_mm);
@@ -2462,8 +2482,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
 
 static int khugepaged_has_work(void)
 {
-	return !list_empty(&khugepaged_scan.mm_head) &&
-		hugepage_flags_enabled();
+	return !list_empty(&khugepaged_scan.mm_head) && hugepage_pmd_enabled();
 }
 
 static int khugepaged_wait_event(void)
@@ -2536,7 +2555,7 @@ static void khugepaged_wait_work(void)
 		return;
 	}
 
-	if (hugepage_flags_enabled())
+	if (hugepage_pmd_enabled())
 		wait_event_freezable(khugepaged_wait, khugepaged_wait_event());
 }
 
@@ -2567,7 +2586,7 @@ static void set_recommended_min_free_kbytes(void)
 	int nr_zones = 0;
 	unsigned long recommended_min;
 
-	if (!hugepage_flags_enabled()) {
+	if (!hugepage_pmd_enabled()) {
 		calculate_min_free_kbytes();
 		goto update_wmarks;
 	}
@@ -2617,7 +2636,7 @@ int start_stop_khugepaged(void)
 	int err = 0;
 
 	mutex_lock(&khugepaged_mutex);
-	if (hugepage_flags_enabled()) {
+	if (hugepage_pmd_enabled()) {
 		if (!khugepaged_thread)
 			khugepaged_thread = kthread_run(khugepaged, NULL,
 							"khugepaged");
@@ -2643,7 +2662,7 @@ int start_stop_khugepaged(void)
 void khugepaged_min_free_kbytes_update(void)
 {
 	mutex_lock(&khugepaged_mutex);
-	if (hugepage_flags_enabled() && khugepaged_thread)
+	if (hugepage_pmd_enabled() && khugepaged_thread)
 		set_recommended_min_free_kbytes();
 	mutex_unlock(&khugepaged_mutex);
 }