From patchwork Mon Apr 14 22:24:52 2025
X-Patchwork-Submitter: Nico Pache
X-Patchwork-Id: 14051093
From: Nico Pache
To: linux-mm@kvack.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org
Cc: akpm@linux-foundation.org, corbet@lwn.net, shuah@kernel.org,
 david@redhat.com, baohua@kernel.org, baolin.wang@linux.alibaba.com,
 ryan.roberts@arm.com, willy@infradead.org, peterx@redhat.com,
 ioworker0@gmail.com, ziy@nvidia.com, wangkefeng.wang@huawei.com,
 dev.jain@arm.com, mhocko@suse.com, rientjes@google.com,
 hannes@cmpxchg.org, zokeefe@google.com, surenb@google.com,
 jglisse@google.com, cl@gentwo.org, jack@suse.cz,
 dave.hansen@linux.intel.com, will@kernel.org, tiwai@suse.de,
 catalin.marinas@arm.com, anshuman.khandual@arm.com, raquini@redhat.com,
 aarcange@redhat.com, kirill.shutemov@linux.intel.com,
 yang@os.amperecomputing.com, thomas.hellstrom@linux.intel.com,
 vishal.moola@gmail.com, sunnanyong@huawei.com, usamaarif642@gmail.com,
 mathieu.desnoyers@efficios.com, mhiramat@kernel.org, rostedt@goodmis.org
Subject: [PATCH v3 0/4] mm: introduce THP deferred setting
Date: Mon, 14 Apr 2025 16:24:52 -0600
Message-ID: <20250414222456.43212-1-npache@redhat.com>

This series is a follow-up to [1], which adds mTHP support to khugepaged.
mTHP khugepaged support is a "loose" dependency for the sysfs/sysctl
configs to make sense. Without it, the global="defer" and mTHP="inherit"
case is "undefined" behavior.

We've seen cases where customers switching from RHEL7 to RHEL8 see a
significant increase in the memory footprint for the same workloads.

Through our investigations we found that a large contributing factor to
the increase in RSS was an increase in THP usage.

For workloads like MySQL, or when using allocators like jemalloc, it is
often recommended to set /transparent_hugepages/enabled=never. This is
in part due to performance degradations and increased memory waste.

This series introduces enabled=defer. This setting acts as a middle
ground between always and madvise. If the mapping is MADV_HUGEPAGE, the
page fault handler will act normally, making a hugepage if possible. If
the mapping is not MADV_HUGEPAGE, then the page fault handler will
default to the base size allocation. The caveat is that khugepaged can
still operate on pages that are not MADV_HUGEPAGE.

This allows for three things. First, applications specifically designed
to use hugepages will get them. Second, applications that don't use
hugepages can still benefit from them without aggressively inserting
THPs at every possible chance. This curbs the memory waste and defers
the use of hugepages to khugepaged, which can then scan the memory for
regions eligible for collapse. Lastly, there is the added benefit for
those who want THPs but experience higher-latency page faults (PFs): you
now get base page performance in the PF handler and hugepage performance
for those mappings after they collapse.

Admins may want to lower max_ptes_none; otherwise khugepaged may
aggressively collapse single allocations into hugepages.

TESTING:
- Built for x86_64, aarch64, ppc64le, and s390x
- selftests mm
- In [1] I provided a script [2] that has multiple access patterns
- lots of general use. These changes have been running in my VM for
  some time
- redis testing. This test was my original case for the defer mode.
  What I was able to prove was that THP=always leads to increased
  max_latency cases, which is why it is recommended to disable THPs for
  redis servers. However, with 'defer' we don't have the max_latency
  spikes and can still get the system to utilize THPs. I further tested
  this with the mTHP defer setting and found that redis (and probably
  other jemalloc users) can utilize THPs via defer (+mTHP defer) without
  a large latency penalty and with some potential gains. I uploaded some
  mmtest results here [3] which compare:
      stock+thp=never
      stock+(m)thp=always
      khugepaged-mthp + defer (max_ptes_none=64)

  The results show that (m)THPs can cause some throughput regression in
  some cases, but also have gains in other cases. The mTHP+defer results
  have more gains and fewer losses than the (m)THP=always case.

V3 Changes:
- moved some Documentation to the other series and merged the remaining
  Documentation updates into one

V2 Changes:
- base changes on mTHP khugepaged support
- Fix selftests parsing issue
- add mTHP defer option
- add mTHP defer Documentation

[1] - https://lore.kernel.org/lkml/20250414220557.35388-1-npache@redhat.com/
[2] - https://gitlab.com/npache/khugepaged_mthp_test
[3] - https://people.redhat.com/npache/mthp_khugepaged_defer/testoutput2/output.html

Nico Pache (4):
  mm: defer THP insertion to khugepaged
  mm: document (m)THP defer usage
  khugepaged: add defer option to mTHP options
  selftests: mm: add defer to thp setting parser

 Documentation/admin-guide/mm/transhuge.rst | 31 +++++++---
 include/linux/huge_mm.h                    | 18 +++++-
 mm/huge_memory.c                           | 69 +++++++++++++++++++---
 mm/khugepaged.c                            | 10 ++--
 tools/testing/selftests/mm/thp_settings.c  |  1 +
 tools/testing/selftests/mm/thp_settings.h  |  1 +
 6 files changed, 107 insertions(+), 23 deletions(-)
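
For readers comparing the modes, the enabled=defer behavior described in
this cover letter can be sketched as a small userspace model. This is an
illustration only, not code from the series: the enum and both function
names are hypothetical, and the model ignores MADV_NOHUGEPAGE, per-size
mTHP settings, and max_ptes_none. It only encodes which path (page fault
handler or khugepaged) may produce a hugepage for a mapping.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the global THP "enabled" setting. These names
 * are illustrative and are not kernel symbols. */
enum thp_mode { THP_ALWAYS, THP_DEFER, THP_MADVISE, THP_NEVER };

/* May the page fault handler install a hugepage directly? */
static bool fault_may_use_thp(enum thp_mode mode, bool madv_hugepage)
{
    switch (mode) {
    case THP_ALWAYS:
        return true;
    case THP_DEFER:   /* defer: only MADV_HUGEPAGE mappings at fault time */
    case THP_MADVISE:
        return madv_hugepage;
    case THP_NEVER:
        return false;
    }
    assert(!"unknown thp_mode");
    return false;
}

/* May khugepaged later collapse base pages in this mapping? */
static bool khugepaged_may_collapse(enum thp_mode mode, bool madv_hugepage)
{
    switch (mode) {
    case THP_ALWAYS:
    case THP_DEFER:   /* defer: khugepaged also covers non-MADV_HUGEPAGE mappings */
        return true;
    case THP_MADVISE:
        return madv_hugepage;
    case THP_NEVER:
        return false;
    }
    assert(!"unknown thp_mode");
    return false;
}
```

In this model, defer matches madvise at fault time and matches always
for khugepaged, which is the "middle ground" the series describes.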