From patchwork Thu Mar 20 01:55:39 2025
X-Patchwork-Submitter: Changyuan Lyu
X-Patchwork-Id: 14023304
Date: Wed, 19 Mar 2025 18:55:39 -0700
In-Reply-To: <20250320015551.2157511-1-changyuanl@google.com>
References: <20250320015551.2157511-1-changyuanl@google.com>
Message-ID: <20250320015551.2157511-5-changyuanl@google.com>
Subject: [PATCH v5 04/16] memblock: Add support for scratch memory
From: Changyuan Lyu
To: linux-kernel@vger.kernel.org
Cc: graf@amazon.com, akpm@linux-foundation.org, luto@kernel.org,
 anthony.yznaga@oracle.com, arnd@arndb.de, ashish.kalra@amd.com,
 benh@kernel.crashing.org, bp@alien8.de, catalin.marinas@arm.com,
 dave.hansen@linux.intel.com, dwmw2@infradead.org, ebiederm@xmission.com,
 mingo@redhat.com, jgowans@amazon.com, corbet@lwn.net, krzk@kernel.org,
 rppt@kernel.org, mark.rutland@arm.com, pbonzini@redhat.com,
 pasha.tatashin@soleen.com, hpa@zytor.com, peterz@infradead.org,
 ptyadav@amazon.de, robh+dt@kernel.org, robh@kernel.org,
 saravanak@google.com, skinsburskii@linux.microsoft.com,
 rostedt@goodmis.org, tglx@linutronix.de, thomas.lendacky@amd.com,
 usama.arif@bytedance.com, will@kernel.org, devicetree@vger.kernel.org,
 kexec@lists.infradead.org, linux-arm-kernel@lists.infradead.org,
 linux-doc@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org
From: Alexander Graf

With KHO (Kexec HandOver), we need a way to ensure that the new kernel
does not allocate memory on top of any memory regions that the previous
kernel was handing over. But to know where those regions are, we need to
include them in the memblock.reserved array, which may not be big enough
to hold all the ranges that need to be persisted across kexec. To resize
the array, we need to allocate memory. That brings us into a catch-22
situation.

The solution is to limit memblock allocations to the scratch regions:
safe regions to operate in when there is memory that should remain
intact across kexec.

KHO provides several "scratch regions" as part of its metadata. These
scratch regions are contiguous memory blocks that are known not to
contain any memory that should be persisted across kexec. These regions
should be large enough to accommodate all memblock allocations done by
the kexeced kernel.

We introduce a new memblock_set_kho_scratch_only() function that allows
KHO to indicate that any memblock allocation must happen from the
scratch regions.

Later, we may want to perform another KHO kexec. For that, we reuse the
same scratch regions. To ensure that no data that will eventually be
handed over is allocated inside a scratch region, we flip the semantics
of the scratch regions with memblock_clear_kho_scratch_only(): after
that call, no allocations may happen from scratch memblock regions. We
will lift that restriction in the next patch.
Signed-off-by: Alexander Graf
Co-developed-by: Mike Rapoport (Microsoft)
Signed-off-by: Mike Rapoport (Microsoft)
---
 include/linux/memblock.h | 20 +++++++++++++
 mm/Kconfig               |  4 +++
 mm/memblock.c            | 61 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 85 insertions(+)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 1037fd7aabf4..a83738b7218b 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -45,6 +45,11 @@ extern unsigned long long max_possible_pfn;
  * @MEMBLOCK_RSRV_KERN: memory region that is reserved for kernel use,
  * either explictitly with memblock_reserve_kern() or via memblock
  * allocation APIs. All memblock allocations set this flag.
+ * @MEMBLOCK_KHO_SCRATCH: memory region that kexec can pass to the next
+ * kernel in handover mode. During early boot, we do not know about all
+ * memory reservations yet, so we get scratch memory from the previous
+ * kernel that we know is good to use. It is the only memory that
+ * allocations may happen from in this phase.
  */
 enum memblock_flags {
 	MEMBLOCK_NONE		= 0x0,	/* No special request */
@@ -54,6 +59,7 @@ enum memblock_flags {
 	MEMBLOCK_DRIVER_MANAGED = 0x8,	/* always detected via a driver */
 	MEMBLOCK_RSRV_NOINIT	= 0x10,	/* don't initialize struct pages */
 	MEMBLOCK_RSRV_KERN	= 0x20,	/* memory reserved for kernel use */
+	MEMBLOCK_KHO_SCRATCH	= 0x40,	/* scratch memory for kexec handover */
 };
 
 /**
@@ -148,6 +154,8 @@ int memblock_mark_mirror(phys_addr_t base, phys_addr_t size);
 int memblock_mark_nomap(phys_addr_t base, phys_addr_t size);
 int memblock_clear_nomap(phys_addr_t base, phys_addr_t size);
 int memblock_reserved_mark_noinit(phys_addr_t base, phys_addr_t size);
+int memblock_mark_kho_scratch(phys_addr_t base, phys_addr_t size);
+int memblock_clear_kho_scratch(phys_addr_t base, phys_addr_t size);
 
 void memblock_free_all(void);
 void memblock_free(void *ptr, size_t size);
@@ -292,6 +300,11 @@ static inline bool memblock_is_driver_managed(struct memblock_region *m)
 	return m->flags & MEMBLOCK_DRIVER_MANAGED;
 }
 
+static inline bool memblock_is_kho_scratch(struct memblock_region *m)
+{
+	return m->flags & MEMBLOCK_KHO_SCRATCH;
+}
+
 int memblock_search_pfn_nid(unsigned long pfn, unsigned long *start_pfn,
 			    unsigned long *end_pfn);
 void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
@@ -620,5 +633,12 @@ static inline void early_memtest(phys_addr_t start, phys_addr_t end) { }
 static inline void memtest_report_meminfo(struct seq_file *m) { }
 #endif
 
+#ifdef CONFIG_MEMBLOCK_KHO_SCRATCH
+void memblock_set_kho_scratch_only(void);
+void memblock_clear_kho_scratch_only(void);
+#else
+static inline void memblock_set_kho_scratch_only(void) { }
+static inline void memblock_clear_kho_scratch_only(void) { }
+#endif
+
 #endif /* _LINUX_MEMBLOCK_H */
diff --git a/mm/Kconfig b/mm/Kconfig
index 1b501db06417..550bbafe5c0b 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -506,6 +506,10 @@ config HAVE_GUP_FAST
 	depends on MMU
 	bool
 
+# Enable memblock support for scratch memory which is needed for kexec handover
+config MEMBLOCK_KHO_SCRATCH
+	bool
+
 # Don't discard allocated memory used to track "memory" and "reserved" memblocks
 # after early boot, so it can still be used to test for validity of memory.
 # Also, memblocks are updated with memory hot(un)plug.
diff --git a/mm/memblock.c b/mm/memblock.c
index e704e3270b32..c0f7da7dff47 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -106,6 +106,13 @@ unsigned long min_low_pfn;
 unsigned long max_pfn;
 unsigned long long max_possible_pfn;
 
+#ifdef CONFIG_MEMBLOCK_KHO_SCRATCH
+/* When set to true, only allocate from MEMBLOCK_KHO_SCRATCH ranges */
+static bool kho_scratch_only;
+#else
+#define kho_scratch_only false
+#endif
+
 static struct memblock_region memblock_memory_init_regions[INIT_MEMBLOCK_MEMORY_REGIONS] __initdata_memblock;
 static struct memblock_region memblock_reserved_init_regions[INIT_MEMBLOCK_RESERVED_REGIONS] __initdata_memblock;
 #ifdef CONFIG_HAVE_MEMBLOCK_PHYS_MAP
@@ -165,6 +172,10 @@ bool __init_memblock memblock_has_mirror(void)
 
 static enum memblock_flags __init_memblock choose_memblock_flags(void)
 {
+	/* skip non-scratch memory for kho early boot allocations */
+	if (kho_scratch_only)
+		return MEMBLOCK_KHO_SCRATCH;
+
 	return system_has_some_mirror ? MEMBLOCK_MIRROR : MEMBLOCK_NONE;
 }
 
@@ -924,6 +935,18 @@ int __init_memblock memblock_physmem_add(phys_addr_t base, phys_addr_t size)
 }
 #endif
 
+#ifdef CONFIG_MEMBLOCK_KHO_SCRATCH
+__init_memblock void memblock_set_kho_scratch_only(void)
+{
+	kho_scratch_only = true;
+}
+
+__init_memblock void memblock_clear_kho_scratch_only(void)
+{
+	kho_scratch_only = false;
+}
+#endif
+
 /**
  * memblock_setclr_flag - set or clear flag for a memory region
  * @type: memblock type to set/clear flag for
@@ -1049,6 +1072,36 @@ int __init_memblock memblock_reserved_mark_noinit(phys_addr_t base, phys_addr_t
 				  MEMBLOCK_RSRV_NOINIT);
 }
 
+/**
+ * memblock_mark_kho_scratch - Mark a memory region as MEMBLOCK_KHO_SCRATCH.
+ * @base: the base phys addr of the region
+ * @size: the size of the region
+ *
+ * Only memory regions marked with %MEMBLOCK_KHO_SCRATCH will be considered
+ * for allocations during early boot with kexec handover.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+int __init_memblock memblock_mark_kho_scratch(phys_addr_t base, phys_addr_t size)
+{
+	return memblock_setclr_flag(&memblock.memory, base, size, 1,
+				    MEMBLOCK_KHO_SCRATCH);
+}
+
+/**
+ * memblock_clear_kho_scratch - Clear MEMBLOCK_KHO_SCRATCH flag for a
+ * specified region.
+ * @base: the base phys addr of the region
+ * @size: the size of the region
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+int __init_memblock memblock_clear_kho_scratch(phys_addr_t base, phys_addr_t size)
+{
+	return memblock_setclr_flag(&memblock.memory, base, size, 0,
+				    MEMBLOCK_KHO_SCRATCH);
+}
+
 static bool should_skip_region(struct memblock_type *type,
 			       struct memblock_region *m,
 			       int nid, int flags)
@@ -1080,6 +1133,13 @@ static bool should_skip_region(struct memblock_type *type,
 	if (!(flags & MEMBLOCK_DRIVER_MANAGED) && memblock_is_driver_managed(m))
 		return true;
 
+	/*
+	 * In early alloc during kexec handover, we can only consider
+	 * MEMBLOCK_KHO_SCRATCH regions for the allocations
+	 */
+	if ((flags & MEMBLOCK_KHO_SCRATCH) && !memblock_is_kho_scratch(m))
+		return true;
+
 	return false;
 }
 
@@ -2421,6 +2481,7 @@ static const char * const flagname[] = {
 	[ilog2(MEMBLOCK_DRIVER_MANAGED)] = "DRV_MNG",
 	[ilog2(MEMBLOCK_RSRV_NOINIT)] = "RSV_NIT",
 	[ilog2(MEMBLOCK_RSRV_KERN)] = "RSV_KERN",
+	[ilog2(MEMBLOCK_KHO_SCRATCH)] = "KHO_SCRATCH",
 };
 
 static int memblock_debug_show(struct seq_file *m, void *private)
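
[Editor's illustration, not part of the patch] As a rough sketch of how a
kexec handover implementation might be expected to drive this interface
during early boot: the function kho_init_scratch(), the struct
kho_scratch_area descriptor, and the assumption that the scratch ranges
are already present in memblock.memory are illustrative only and are not
defined by this series.

/* Illustrative sketch only -- not part of this patch. */
#include <linux/init.h>
#include <linux/memblock.h>

/* Hypothetical descriptor for a scratch range handed over by the old kernel. */
struct kho_scratch_area {
	phys_addr_t addr;
	phys_addr_t size;
};

static void __init kho_init_scratch(struct kho_scratch_area *areas, int nr)
{
	int i;

	/*
	 * Flag the handed-over scratch ranges. They are assumed to be
	 * part of memblock.memory already, added by the architecture
	 * code that parsed the memory map.
	 */
	for (i = 0; i < nr; i++)
		memblock_mark_kho_scratch(areas[i].addr, areas[i].size);

	/*
	 * From now on choose_memblock_flags() returns
	 * MEMBLOCK_KHO_SCRATCH, so early allocations are served only
	 * from scratch memory and cannot land on memory the previous
	 * kernel asked to preserve.
	 */
	memblock_set_kho_scratch_only();
}

Once all reservations inherited from the previous kernel are known,
memblock_clear_kho_scratch_only() turns the scratch-only mode off again,
as described in the changelog above.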