From patchwork Thu May 19 12:46:31 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Muchun Song
X-Patchwork-Id: 12854878
From: Muchun Song
To: corbet@lwn.net, akpm@linux-foundation.org, paulmck@kernel.org,
 mike.kravetz@oracle.com, osalvador@suse.de, david@redhat.com
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, duanxiongchun@bytedance.com, smuchun@gmail.com,
 Muchun Song
Subject: [PATCH 1/2] mm: memory_hotplug: enumerate all supported section flags
Date: Thu, 19 May 2022 20:46:31 +0800
Message-Id: <20220519124632.92091-2-songmuchun@bytedance.com>
X-Mailer: git-send-email 2.32.1 (Apple Git-133)
In-Reply-To: <20220519124632.92091-1-songmuchun@bytedance.com>
References: <20220519124632.92091-1-songmuchun@bytedance.com>

We are almost running out of free slots: only one bit is available in
the worst case (powerpc with 256k pages). However, there are still some
free slots on other architectures (e.g. x86_64 has 10 bits available,
arm64 has 8 bits available with the worst case of 64K pages). Because
those numbers are hard-coded, it is inconvenient to use the extra bits
on architectures other than powerpc. So convert the section flags to an
enumeration to make it easy to add new section flags in the future.
Also, move SECTION_TAINT_ZONE_DEVICE into the scope of
CONFIG_ZONE_DEVICE to save a bit in the non-zone-device case.

Signed-off-by: Muchun Song
Reported-by: kernel test robot
---
 include/linux/kconfig.h |  1 +
 include/linux/mmzone.h  | 47 +++++++++++++++++++++++++++++++++++++++--------
 mm/memory_hotplug.c     |  6 ++++++
 3 files changed, 46 insertions(+), 8 deletions(-)

diff --git a/include/linux/kconfig.h b/include/linux/kconfig.h
index 20d1079e92b4..7044032b9f42 100644
--- a/include/linux/kconfig.h
+++ b/include/linux/kconfig.h
@@ -10,6 +10,7 @@
 #define __LITTLE_ENDIAN 1234
 #endif
 
+#define __ARG_PLACEHOLDER_ 0,
 #define __ARG_PLACEHOLDER_1 0,
 #define __take_second_arg(__ignored, val, ...) val
 
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 299259cfe462..e0b7618d7212 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1422,16 +1422,47 @@ extern size_t mem_section_usage_size(void);
  *      (equal SECTION_SIZE_BITS - PAGE_SHIFT), and the
  *      worst combination is powerpc with 256k pages,
  *      which results in PFN_SECTION_SHIFT equal 6.
- * To sum it up, at least 6 bits are available.
+ * To sum it up, at least 6 bits are available on all architectures.
+ * However, we can exceed 6 bits on some other architectures except
+ * powerpc (e.g. 15 bits are available on x86_64, 13 bits are available
+ * with the worst case of 64K pages on arm64) if we make sure the
+ * exceeded bit is not applicable to powerpc.
  */
-#define SECTION_MARKED_PRESENT		(1UL<<0)
-#define SECTION_HAS_MEM_MAP		(1UL<<1)
-#define SECTION_IS_ONLINE		(1UL<<2)
-#define SECTION_IS_EARLY		(1UL<<3)
-#define SECTION_TAINT_ZONE_DEVICE	(1UL<<4)
-#define SECTION_MAP_LAST_BIT		(1UL<<5)
+#define ENUM_SECTION_FLAG(MAPPER)						\
+	MAPPER(MARKED_PRESENT)							\
+	MAPPER(HAS_MEM_MAP)							\
+	MAPPER(IS_ONLINE)							\
+	MAPPER(IS_EARLY)							\
+	MAPPER(TAINT_ZONE_DEVICE, CONFIG_ZONE_DEVICE)				\
+	MAPPER(MAP_LAST_BIT)
+
+#define __SECTION_SHIFT_FLAG_MAPPER_0(x)
+#define __SECTION_SHIFT_FLAG_MAPPER_1(x)	SECTION_##x##_SHIFT,
+#define __SECTION_SHIFT_FLAG_MAPPER(x, ...)	\
+	__PASTE(__SECTION_SHIFT_FLAG_MAPPER_, IS_ENABLED(__VA_ARGS__))(x)
+
+#define __SECTION_FLAG_MAPPER_0(x)
+#define __SECTION_FLAG_MAPPER_1(x)	SECTION_##x = BIT(SECTION_##x##_SHIFT),
+#define __SECTION_FLAG_MAPPER(x, ...)	\
+	__PASTE(__SECTION_FLAG_MAPPER_, IS_ENABLED(__VA_ARGS__))(x)
+
+enum {
+	/*
+	 * Generate a series of enumeration flags like SECTION_$name_SHIFT.
+	 * Each entry in ENUM_SECTION_FLAG() macro will be generated to one
+	 * enumeration iff the 2nd parameter of MAPPER() is defined or absent.
+	 * The $name comes from the 1st parameter of MAPPER() macro.
+	 */
+	ENUM_SECTION_FLAG(__SECTION_SHIFT_FLAG_MAPPER)
+	/*
+	 * Generate a series of enumeration flags like:
+	 *   SECTION_$name = BIT(SECTION_$name_SHIFT)
+	 */
+	ENUM_SECTION_FLAG(__SECTION_FLAG_MAPPER)
+};
+
 #define SECTION_MAP_MASK		(~(SECTION_MAP_LAST_BIT-1))
-#define SECTION_NID_SHIFT		6
+#define SECTION_NID_SHIFT		SECTION_MAP_LAST_BIT_SHIFT
 
 static inline struct page *__section_mem_map_addr(struct mem_section *section)
 {
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 1213d0c67a53..3b360eda933f 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -672,12 +672,18 @@ static void __meminit resize_pgdat_range(struct pglist_data *pgdat, unsigned lon
 
 }
 
+#ifdef CONFIG_ZONE_DEVICE
 static void section_taint_zone_device(unsigned long pfn)
 {
 	struct mem_section *ms = __pfn_to_section(pfn);
 
 	ms->section_mem_map |= SECTION_TAINT_ZONE_DEVICE;
 }
+#else
+static inline void section_taint_zone_device(unsigned long pfn)
+{
+}
+#endif
 
 /*
  * Associate the pfn range with the given zone, initializing the memmaps

From patchwork Thu May 19 12:46:32 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Muchun Song
X-Patchwork-Id: 12854879
From: Muchun Song
To: corbet@lwn.net, akpm@linux-foundation.org, paulmck@kernel.org,
 mike.kravetz@oracle.com, osalvador@suse.de, david@redhat.com
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, duanxiongchun@bytedance.com, smuchun@gmail.com,
 Muchun Song
Subject: [PATCH 2/2] mm: memory_hotplug: introduce SECTION_CANNOT_OPTIMIZE_VMEMMAP
Date: Thu, 19 May 2022 20:46:32 +0800
Message-Id: <20220519124632.92091-3-songmuchun@bytedance.com>
X-Mailer: git-send-email 2.32.1 (Apple Git-133)
In-Reply-To: <20220519124632.92091-1-songmuchun@bytedance.com>
References: <20220519124632.92091-1-songmuchun@bytedance.com>

For now, the feature of hugetlb_free_vmemmap is not compatible with the
feature of memory_hotplug.memmap_on_memory, and hugetlb_free_vmemmap
takes precedence over
memory_hotplug.memmap_on_memory. However, someone wants to make
memory_hotplug.memmap_on_memory take precedence over
hugetlb_free_vmemmap, since memmap_on_memory makes memory hotplug more
likely to succeed in close-to-OOM situations. So the decision to make
hugetlb_free_vmemmap take precedence is neither wise nor elegant. The
proper approach is to have hugetlb_vmemmap.c check whether the section
that the HugeTLB pages belong to can be optimized. If the section's
vmemmap pages are allocated from the added memory block itself,
hugetlb_free_vmemmap should refuse to optimize the vmemmap; otherwise,
it does the optimization. Then both kernel parameters are compatible.
So this patch introduces SECTION_CANNOT_OPTIMIZE_VMEMMAP to indicate
whether the section can be optimized.

Signed-off-by: Muchun Song
---
 Documentation/admin-guide/kernel-parameters.txt | 22 +++++++++----------
 Documentation/admin-guide/sysctl/vm.rst         |  5 ++---
 include/linux/memory_hotplug.h                  |  9 --------
 include/linux/mmzone.h                          | 17 +++++++++++++++
 mm/hugetlb_vmemmap.c                            | 28 ++++++++++++++++++-------
 mm/memory_hotplug.c                             | 22 +++++++------------
 mm/sparse.c                                     |  8 +++++++
 7 files changed, 66 insertions(+), 45 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index c087f578d9d8..5359ffb04a84 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1730,9 +1730,11 @@
 			Built with CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON=y,
 			the default is on.
 
-			This is not compatible with memory_hotplug.memmap_on_memory.
-			If both parameters are enabled, hugetlb_free_vmemmap takes
-			precedence over memory_hotplug.memmap_on_memory.
+			Note that the vmemmap pages may be allocated from the added
+			memory block itself when memory_hotplug.memmap_on_memory is
+			enabled, those vmemmap pages cannot be optimized even if this
+			feature is enabled.  Other vmemmap pages not allocated from
+			the added memory block itself do not be affected.
 
 	hung_task_panic=
 			[KNL] Should the hung task detector generate panics.
@@ -3077,10 +3079,12 @@
 			[KNL,X86,ARM] Boolean flag to enable this feature.
 			Format: {on | off (default)}
 			When enabled, runtime hotplugged memory will
-			allocate its internal metadata (struct pages)
-			from the hotadded memory which will allow to
-			hotadd a lot of memory without requiring
-			additional memory to do so.
+			allocate its internal metadata (struct pages,
+			those vmemmap pages cannot be optimized even
+			if hugetlb_free_vmemmap is enabled) from the
+			hotadded memory which will allow to hotadd a
+			lot of memory without requiring additional
+			memory to do so.
 			This feature is disabled by default because it
 			has some implication on large (e.g. GB)
 			allocations in some configurations (e.g. small
@@ -3090,10 +3094,6 @@
 			Note that even when enabled, there are a few cases where
 			the feature is not effective.
 
-			This is not compatible with hugetlb_free_vmemmap. If
-			both parameters are enabled, hugetlb_free_vmemmap takes
-			precedence over memory_hotplug.memmap_on_memory.
-
 	memtest=	[KNL,X86,ARM,M68K,PPC,RISCV] Enable memtest
 			Format:
 			default : 0
diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 5c9aa171a0d3..d7374a1e8ac9 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -565,9 +565,8 @@ See Documentation/admin-guide/mm/hugetlbpage.rst
 hugetlb_optimize_vmemmap
 ========================
 
-This knob is not available when memory_hotplug.memmap_on_memory (kernel parameter)
-is configured or the size of 'struct page' (a structure defined in
-include/linux/mm_types.h) is not power of two (an unusual system config could
+This knob is not available when the size of 'struct page' (a structure defined
+in include/linux/mm_types.h) is not power of two (an unusual system config could
 result in this).
 
 Enable (set to 1) or disable (set to 0) the feature of optimizing vmemmap pages
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 20d7edf62a6a..e0b2209ab71c 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -351,13 +351,4 @@ void arch_remove_linear_mapping(u64 start, u64 size);
 extern bool mhp_supports_memmap_on_memory(unsigned long size);
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
-#ifdef CONFIG_MHP_MEMMAP_ON_MEMORY
-bool mhp_memmap_on_memory(void);
-#else
-static inline bool mhp_memmap_on_memory(void)
-{
-	return false;
-}
-#endif
-
 #endif /* __LINUX_MEMORY_HOTPLUG_H */
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index e0b7618d7212..3f73f24f394c 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1434,6 +1434,7 @@ extern size_t mem_section_usage_size(void);
 	MAPPER(IS_ONLINE)							\
 	MAPPER(IS_EARLY)							\
 	MAPPER(TAINT_ZONE_DEVICE, CONFIG_ZONE_DEVICE)				\
+	MAPPER(CANNOT_OPTIMIZE_VMEMMAP, CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP)	\
 	MAPPER(MAP_LAST_BIT)
 
 #define __SECTION_SHIFT_FLAG_MAPPER_0(x)
@@ -1471,6 +1472,22 @@ static inline struct page *__section_mem_map_addr(struct mem_section *section)
 	return (struct page *)map;
 }
 
+#ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
+static inline void section_mark_cannot_optimize_vmemmap(struct mem_section *ms)
+{
+	ms->section_mem_map |= SECTION_CANNOT_OPTIMIZE_VMEMMAP;
+}
+
+static inline int section_cannot_optimize_vmemmap(struct mem_section *ms)
+{
+	return (ms && (ms->section_mem_map & SECTION_CANNOT_OPTIMIZE_VMEMMAP));
+}
+#else
+static inline void section_mark_cannot_optimize_vmemmap(struct mem_section *ms)
+{
+}
+#endif
+
 static inline int present_section(struct mem_section *section)
 {
 	return (section && (section->section_mem_map & SECTION_MARKED_PRESENT));
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index fcd9f7872064..f12170520337 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -97,18 +97,32 @@ int hugetlb_vmemmap_alloc(struct hstate *h, struct page *head)
 	return ret;
 }
 
+static unsigned int optimizable_vmemmap_pages(struct hstate *h,
+					      struct page *head)
+{
+	unsigned long pfn = page_to_pfn(head);
+	unsigned long end = pfn + pages_per_huge_page(h);
+
+	if (READ_ONCE(vmemmap_optimize_mode) == VMEMMAP_OPTIMIZE_OFF)
+		return 0;
+
+	for (; pfn < end; pfn += PAGES_PER_SECTION) {
+		if (section_cannot_optimize_vmemmap(__pfn_to_section(pfn)))
+			return 0;
+	}
+
+	return hugetlb_optimize_vmemmap_pages(h);
+}
+
 void hugetlb_vmemmap_free(struct hstate *h, struct page *head)
 {
 	unsigned long vmemmap_addr = (unsigned long)head;
 	unsigned long vmemmap_end, vmemmap_reuse, vmemmap_pages;
 
-	vmemmap_pages = hugetlb_optimize_vmemmap_pages(h);
+	vmemmap_pages = optimizable_vmemmap_pages(h, head);
 	if (!vmemmap_pages)
 		return;
 
-	if (READ_ONCE(vmemmap_optimize_mode) == VMEMMAP_OPTIMIZE_OFF)
-		return;
-
 	static_branch_inc(&hugetlb_optimize_vmemmap_key);
 
 	vmemmap_addr += RESERVE_VMEMMAP_SIZE;
@@ -199,10 +213,10 @@ static struct ctl_table hugetlb_vmemmap_sysctls[] = {
 static __init int hugetlb_vmemmap_sysctls_init(void)
 {
 	/*
-	 * If "memory_hotplug.memmap_on_memory" is enabled or "struct page"
-	 * crosses page boundaries, the vmemmap pages cannot be optimized.
+	 * If "struct page" crosses page boundaries, the vmemmap pages cannot
+	 * be optimized.
 	 */
-	if (!mhp_memmap_on_memory() && is_power_of_2(sizeof(struct page)))
+	if (is_power_of_2(sizeof(struct page)))
 		register_sysctl_init("vm", hugetlb_vmemmap_sysctls);
 
 	return 0;
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 3b360eda933f..7309694c4dee 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -43,30 +43,22 @@
 #include "shuffle.h"
 
 #ifdef CONFIG_MHP_MEMMAP_ON_MEMORY
-static int memmap_on_memory_set(const char *val, const struct kernel_param *kp)
-{
-	if (hugetlb_optimize_vmemmap_enabled())
-		return 0;
-	return param_set_bool(val, kp);
-}
-
-static const struct kernel_param_ops memmap_on_memory_ops = {
-	.flags	= KERNEL_PARAM_OPS_FL_NOARG,
-	.set	= memmap_on_memory_set,
-	.get	= param_get_bool,
-};
-
 /*
  * memory_hotplug.memmap_on_memory parameter
  */
 static bool memmap_on_memory __ro_after_init;
-module_param_cb(memmap_on_memory, &memmap_on_memory_ops, &memmap_on_memory, 0444);
+module_param(memmap_on_memory, bool, 0444);
 MODULE_PARM_DESC(memmap_on_memory, "Enable memmap on memory for memory hotplug");
 
-bool mhp_memmap_on_memory(void)
+static inline bool mhp_memmap_on_memory(void)
 {
 	return memmap_on_memory;
 }
+#else
+static inline bool mhp_memmap_on_memory(void)
+{
+	return false;
+}
 #endif
 
 enum {
diff --git a/mm/sparse.c b/mm/sparse.c
index cb3bfae64036..1f353bf9ea6b 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -913,6 +913,14 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
 	ms = __nr_to_section(section_nr);
 	set_section_nid(section_nr, nid);
 	__section_mark_present(ms, section_nr);
+	/*
+	 * Mark whole section as non-optimizable once there is a subsection
+	 * whose vmemmap pages are allocated from alternative allocator. The
+	 * early section is always optimizable since the early section's
+	 * vmemmap pages do not consider partially being populated.
+	 */
+	if (!early_section(ms) && altmap)
+		section_mark_cannot_optimize_vmemmap(ms);
 
 	/* Align memmap to section boundary in the subsection case */
 	if (section_nr_to_pfn(section_nr) != start_pfn)
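
Illustration (not part of either patch): the enumeration technique that
patch 1 describes can be tried outside the kernel with a minimal
user-space sketch. Everything named here is a hypothetical stand-in:
DEMO_* replaces SECTION_*, CONFIG_DEMO_ZONE_DEVICE replaces
CONFIG_ZONE_DEVICE, IS_ON() is a simplified substitute for IS_ENABLED(),
and the flag list is shortened. The explicit trailing comma in
MAPPER(NAME, ) is an assumption made to keep the sketch within standard
C; the patch itself simply omits the second argument.

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins for the kernel's kconfig.h helpers. */
#define ARG_PLACEHOLDER_	0,	/* no option named: always enabled */
#define ARG_PLACEHOLDER_1	0,	/* option defined as 1: enabled */
#define TAKE_SECOND_ARG(ignored, val, ...) val
#define IS_ON__(arg1_or_junk)	TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define IS_ON_(val)		IS_ON__(ARG_PLACEHOLDER_##val)
#define IS_ON(...)		IS_ON_(__VA_ARGS__)

/* Two-level paste so that IS_ON(...) is expanded to 0 or 1 first. */
#define PASTE_(a, b)	a##b
#define PASTE(a, b)	PASTE_(a, b)

/* One entry per flag; the 2nd argument optionally names a config option. */
#define DEMO_SECTION_FLAGS(MAPPER)				\
	MAPPER(MARKED_PRESENT, )				\
	MAPPER(HAS_MEM_MAP, )					\
	MAPPER(TAINT_ZONE_DEVICE, CONFIG_DEMO_ZONE_DEVICE)	\
	MAPPER(MAP_LAST_BIT, )

/* Pass 1: emit DEMO_<name>_SHIFT, or nothing if the option is off. */
#define SHIFT_MAPPER_0(x)
#define SHIFT_MAPPER_1(x)	DEMO_##x##_SHIFT,
#define SHIFT_MAPPER(x, ...)	PASTE(SHIFT_MAPPER_, IS_ON(__VA_ARGS__))(x)

/* Pass 2: emit DEMO_<name> = 1UL << DEMO_<name>_SHIFT, or nothing. */
#define FLAG_MAPPER_0(x)
#define FLAG_MAPPER_1(x)	DEMO_##x = (1UL << DEMO_##x##_SHIFT),
#define FLAG_MAPPER(x, ...)	PASTE(FLAG_MAPPER_, IS_ON(__VA_ARGS__))(x)

enum {
	DEMO_SECTION_FLAGS(SHIFT_MAPPER)	/* shifts numbered 0, 1, 2, ... */
	DEMO_SECTION_FLAGS(FLAG_MAPPER)		/* masks derived from the shifts */
};

/* A disabled entry consumes no bit, so the last shift moves down by one. */
#ifdef CONFIG_DEMO_ZONE_DEVICE
_Static_assert(DEMO_MAP_LAST_BIT_SHIFT == 3, "all four entries generated");
#else
_Static_assert(DEMO_MAP_LAST_BIT_SHIFT == 2, "disabled flag takes no bit");
#endif

/* Mirrors SECTION_NID_SHIFT: the first bit above all generated flags. */
static inline int demo_nid_shift(void)
{
	return DEMO_MAP_LAST_BIT_SHIFT;
}

/* Runtime self-check: each generated flag is exactly the bit at its shift. */
static inline void demo_self_check(void)
{
	assert(DEMO_MARKED_PRESENT == (1UL << DEMO_MARKED_PRESENT_SHIFT));
	assert(DEMO_MAP_LAST_BIT == (1UL << demo_nid_shift()));
}
```

Building the same file with -DCONFIG_DEMO_ZONE_DEVICE=1 should add
DEMO_TAINT_ZONE_DEVICE at bit 2 and move DEMO_MAP_LAST_BIT_SHIFT to 3,
which mirrors how the patch saves a bit when CONFIG_ZONE_DEVICE is off.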