From patchwork Fri Dec 22 19:35:51 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexander Graf
X-Patchwork-Id: 13503721
From: Alexander Graf
CC: Eric Biederman, "H. Peter Anvin", Andy Lutomirski, Peter Zijlstra,
 "Rob Herring", Steven Rostedt, "Andrew Morton", Mark Rutland,
 "Tom Lendacky", Ashish Kalra, James Gowans, Stanislav Kinsburskii,
 Anthony Yznaga, Usama Arif, David Woodhouse, Benjamin Herrenschmidt
Subject: [PATCH v2 01/17] mm,memblock: Add support for scratch memory
Date: Fri, 22 Dec 2023 19:35:51 +0000
Message-ID: <20231222193607.15474-2-graf@amazon.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20231222193607.15474-1-graf@amazon.com>
References: <20231222193607.15474-1-graf@amazon.com>

With KHO (Kexec HandOver), we need a way to ensure that the new kernel
does not allocate memory on top of any memory regions that the previous
kernel was handing over. But to know where those are, we need to include
them in the reserved memblocks array, which may not be big enough to hold
all allocations. To resize the array, we need to allocate memory. That
brings us into a catch-22 situation.

The solution to that is the scratch region: a safe region to operate in.
KHO provides a "scratch region" as part of its metadata. This scratch
region is a single, contiguous memory block that we know does not
contain any KHO allocations. We can exclusively allocate from there until
kernel initialization has reached a point where it knows about all the
KHO memory reservations. We introduce a new memblock_set_scratch_only()
function that allows KHO to indicate that any memblock allocation must
happen from the scratch region.

Later, we may want to perform another KHO kexec. For that, we reuse the
same scratch region.
To ensure that no data that may later be handed over gets
allocated inside that scratch region, we flip the semantics of the scratch
region with memblock_clear_scratch_only(): After that call, no allocations
may happen from scratch memblock regions. We will lift that restriction
in the next patch.

Signed-off-by: Alexander Graf
---
 include/linux/memblock.h | 19 +++++++++++++
 mm/Kconfig               |  4 +++
 mm/memblock.c            | 61 +++++++++++++++++++++++++++++++++++++++-
 3 files changed, 83 insertions(+), 1 deletion(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index ae3bde302f70..14043f5b696f 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -42,6 +42,10 @@ extern unsigned long long max_possible_pfn;
  * kernel resource tree.
  * @MEMBLOCK_RSRV_NOINIT: memory region for which struct pages are
  * not initialized (only for reserved regions).
+ * @MEMBLOCK_SCRATCH: memory region that kexec can pass to the next kernel in
+ * handover mode. During early boot, we do not know about all memory reservations
+ * yet, so we get scratch memory from the previous kernel that we know is good
+ * to use. It is the only memory that allocations may happen from in this phase.
  */
 enum memblock_flags {
 	MEMBLOCK_NONE		= 0x0,	/* No special request */
@@ -50,6 +54,7 @@ enum memblock_flags {
 	MEMBLOCK_NOMAP		= 0x4,	/* don't add to kernel direct mapping */
 	MEMBLOCK_DRIVER_MANAGED = 0x8,	/* always detected via a driver */
 	MEMBLOCK_RSRV_NOINIT	= 0x10,	/* don't initialize struct pages */
+	MEMBLOCK_SCRATCH	= 0x20,	/* scratch memory for kexec handover */
 };
 
 /**
@@ -129,6 +134,8 @@ int memblock_mark_mirror(phys_addr_t base, phys_addr_t size);
 int memblock_mark_nomap(phys_addr_t base, phys_addr_t size);
 int memblock_clear_nomap(phys_addr_t base, phys_addr_t size);
 int memblock_reserved_mark_noinit(phys_addr_t base, phys_addr_t size);
+int memblock_mark_scratch(phys_addr_t base, phys_addr_t size);
+int memblock_clear_scratch(phys_addr_t base, phys_addr_t size);
 
 void memblock_free_all(void);
 void memblock_free(void *ptr, size_t size);
@@ -273,6 +280,11 @@ static inline bool memblock_is_driver_managed(struct memblock_region *m)
 	return m->flags & MEMBLOCK_DRIVER_MANAGED;
 }
 
+static inline bool memblock_is_scratch(struct memblock_region *m)
+{
+	return m->flags & MEMBLOCK_SCRATCH;
+}
+
 int memblock_search_pfn_nid(unsigned long pfn, unsigned long *start_pfn,
 			    unsigned long *end_pfn);
 void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
@@ -610,5 +622,12 @@ static inline void early_memtest(phys_addr_t start, phys_addr_t end) { }
 static inline void memtest_report_meminfo(struct seq_file *m) { }
 #endif
 
+#ifdef CONFIG_MEMBLOCK_SCRATCH
+void memblock_set_scratch_only(void);
+void memblock_clear_scratch_only(void);
+#else
+static inline void memblock_set_scratch_only(void) { }
+static inline void memblock_clear_scratch_only(void) { }
+#endif
 
 #endif /* _LINUX_MEMBLOCK_H */
diff --git a/mm/Kconfig b/mm/Kconfig
index 57cd378c73d6..384369e40f10 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -513,6 +513,10 @@ config ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP
 config HAVE_MEMBLOCK_PHYS_MAP
 	bool
 
+# Enable memblock support for scratch memory which is needed for KHO
+config MEMBLOCK_SCRATCH
+	bool
+
 config HAVE_FAST_GUP
 	depends on MMU
 	bool
diff --git a/mm/memblock.c b/mm/memblock.c
index 5a88d6d24d79..e89e6c8f9d75 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -106,6 +106,13 @@ unsigned long min_low_pfn;
 unsigned long max_pfn;
 unsigned long long max_possible_pfn;
 
+#ifdef CONFIG_MEMBLOCK_SCRATCH
+/* When set to true, only allocate from MEMBLOCK_SCRATCH ranges */
+static bool scratch_only;
+#else
+#define scratch_only false
+#endif
+
 static struct memblock_region memblock_memory_init_regions[INIT_MEMBLOCK_MEMORY_REGIONS] __initdata_memblock;
 static struct memblock_region memblock_reserved_init_regions[INIT_MEMBLOCK_RESERVED_REGIONS] __initdata_memblock;
 #ifdef CONFIG_HAVE_MEMBLOCK_PHYS_MAP
@@ -168,6 +175,10 @@ bool __init_memblock memblock_has_mirror(void)
 
 static enum memblock_flags __init_memblock choose_memblock_flags(void)
 {
+	/* skip non-scratch memory for kho early boot allocations */
+	if (scratch_only)
+		return MEMBLOCK_SCRATCH;
+
 	return system_has_some_mirror ? MEMBLOCK_MIRROR : MEMBLOCK_NONE;
 }
 
@@ -643,7 +654,7 @@ static int __init_memblock memblock_add_range(struct memblock_type *type,
 #ifdef CONFIG_NUMA
 			WARN_ON(nid != memblock_get_region_node(rgn));
 #endif
-			WARN_ON(flags != rgn->flags);
+			WARN_ON(flags != (rgn->flags & ~MEMBLOCK_SCRATCH));
 			nr_new++;
 			if (insert) {
 				if (start_rgn == -1)
@@ -890,6 +901,18 @@ int __init_memblock memblock_physmem_add(phys_addr_t base, phys_addr_t size)
 }
 #endif
 
+#ifdef CONFIG_MEMBLOCK_SCRATCH
+__init_memblock void memblock_set_scratch_only(void)
+{
+	scratch_only = true;
+}
+
+__init_memblock void memblock_clear_scratch_only(void)
+{
+	scratch_only = false;
+}
+#endif
+
 /**
  * memblock_setclr_flag - set or clear flag for a memory region
  * @type: memblock type to set/clear flag for
@@ -1015,6 +1038,33 @@ int __init_memblock memblock_reserved_mark_noinit(phys_addr_t base, phys_addr_t
 				    MEMBLOCK_RSRV_NOINIT);
 }
 
+/**
+ * memblock_mark_scratch - Mark a memory region with flag MEMBLOCK_SCRATCH.
+ * @base: the base phys addr of the region
+ * @size: the size of the region
+ *
+ * Only memory regions marked with %MEMBLOCK_SCRATCH will be considered for
+ * allocations during early boot with kexec handover.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+int __init_memblock memblock_mark_scratch(phys_addr_t base, phys_addr_t size)
+{
+	return memblock_setclr_flag(&memblock.memory, base, size, 1, MEMBLOCK_SCRATCH);
+}
+
+/**
+ * memblock_clear_scratch - Clear flag MEMBLOCK_SCRATCH for a specified region.
+ * @base: the base phys addr of the region
+ * @size: the size of the region
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+int __init_memblock memblock_clear_scratch(phys_addr_t base, phys_addr_t size)
+{
+	return memblock_setclr_flag(&memblock.memory, base, size, 0, MEMBLOCK_SCRATCH);
+}
+
 static bool should_skip_region(struct memblock_type *type,
 			       struct memblock_region *m,
 			       int nid, int flags)
@@ -1046,6 +1096,14 @@ static bool should_skip_region(struct memblock_type *type,
 	if (!(flags & MEMBLOCK_DRIVER_MANAGED) && memblock_is_driver_managed(m))
 		return true;
 
+	/* In early alloc during kho, we can only consider scratch allocations */
+	if ((flags & MEMBLOCK_SCRATCH) && !memblock_is_scratch(m))
+		return true;
+
+	/* Leave scratch memory alone after scratch-only phase */
+	if (!(flags & MEMBLOCK_SCRATCH) && memblock_is_scratch(m))
+		return true;
+
 	return false;
 }
 
@@ -2211,6 +2269,7 @@ static const char * const flagname[] = {
 	[ilog2(MEMBLOCK_MIRROR)] = "MIRROR",
 	[ilog2(MEMBLOCK_NOMAP)] = "NOMAP",
 	[ilog2(MEMBLOCK_DRIVER_MANAGED)] = "DRV_MNG",
+	[ilog2(MEMBLOCK_SCRATCH)] = "SCRATCH",
 };
 
 static int memblock_debug_show(struct seq_file *m, void *private)

From patchwork Fri Dec 22 19:35:52 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexander Graf
X-Patchwork-Id: 13503720
From: Alexander Graf
CC: Eric Biederman, "H. Peter Anvin", Andy Lutomirski, Peter Zijlstra,
 "Rob Herring", Steven Rostedt, "Andrew Morton", Mark Rutland,
 "Tom Lendacky", Ashish Kalra, James Gowans, Stanislav Kinsburskii,
 Anthony Yznaga, Usama Arif, David Woodhouse, Benjamin Herrenschmidt
Subject: [PATCH v2 02/17] memblock: Declare scratch memory as CMA
Date: Fri, 22 Dec 2023 19:35:52 +0000
Message-ID: <20231222193607.15474-3-graf@amazon.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20231222193607.15474-1-graf@amazon.com>
References: <20231222193607.15474-1-graf@amazon.com>

When we finish populating our memory, we don't want to lose the scratch
region as memory we can use for useful data. To do that, we mark it as
CMA memory. That means that any allocation within it only happens with
movable memory, which we can then happily discard for the next kexec.

That way we don't lose the scratch region's memory anymore for
allocations after boot.
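
As a rough userspace illustration of the marking step (not the kernel code
itself), the range rounding can be modelled as follows. The sketch assumes
4 KiB pages and 512-page pageblocks; MODEL_PAGE_SHIFT,
MODEL_PAGEBLOCK_NR_PAGES and model_scratch_pageblocks() are hypothetical
names that only mimic what PFN_DOWN(), PFN_UP(), pageblock_start_pfn() and
pageblock_align() are used for in this patch:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; the kernel derives these per config. */
#define MODEL_PAGE_SHIFT		12		/* 4 KiB pages */
#define MODEL_PAGEBLOCK_NR_PAGES	UINT64_C(512)	/* pageblock order 9 */

/* Round a physical address down to a page frame number. */
static uint64_t model_pfn_down(uint64_t addr)
{
	return addr >> MODEL_PAGE_SHIFT;
}

/* Round a physical address up to a page frame number. */
static uint64_t model_pfn_up(uint64_t addr)
{
	return (addr + (UINT64_C(1) << MODEL_PAGE_SHIFT) - 1) >> MODEL_PAGE_SHIFT;
}

/*
 * Count the pageblocks that would be marked for the physical range
 * [start, end). The start is rounded down and the end is rounded up to
 * a pageblock boundary, so partially covered pageblocks are included.
 */
static uint64_t model_scratch_pageblocks(uint64_t start, uint64_t end)
{
	uint64_t mask = MODEL_PAGEBLOCK_NR_PAGES - 1;
	uint64_t start_pfn, end_pfn, pfn, marked = 0;

	assert(start <= end);

	start_pfn = model_pfn_down(start) & ~mask;
	end_pfn = (model_pfn_up(end) + mask) & ~mask;

	for (pfn = start_pfn; pfn < end_pfn; pfn += MODEL_PAGEBLOCK_NR_PAGES)
		marked++;	/* the kernel would set MIGRATE_CMA here */

	return marked;
}
```

In this model, a range that is not aligned to a pageblock boundary still
counts every pageblock it touches, so the marked area can be larger than
the scratch range itself.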
Signed-off-by: Alexander Graf --- v1 -> v2: - test bot warning fix --- mm/memblock.c | 30 ++++++++++++++++++++++++++---- 1 file changed, 26 insertions(+), 4 deletions(-) diff --git a/mm/memblock.c b/mm/memblock.c index e89e6c8f9d75..3700c2c1a96d 100644 --- a/mm/memblock.c +++ b/mm/memblock.c @@ -16,6 +16,7 @@ #include #include #include +#include #include #include @@ -1100,10 +1101,6 @@ static bool should_skip_region(struct memblock_type *type, if ((flags & MEMBLOCK_SCRATCH) && !memblock_is_scratch(m)) return true; - /* Leave scratch memory alone after scratch-only phase */ - if (!(flags & MEMBLOCK_SCRATCH) && memblock_is_scratch(m)) - return true; - return false; } @@ -2153,6 +2150,20 @@ static void __init __free_pages_memory(unsigned long start, unsigned long end) } } +#ifdef CONFIG_MEMBLOCK_SCRATCH +static void reserve_scratch_mem(phys_addr_t start, phys_addr_t end) +{ + ulong start_pfn = pageblock_start_pfn(PFN_DOWN(start)); + ulong end_pfn = pageblock_align(PFN_UP(end)); + ulong pfn; + + for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) { + /* Mark as CMA to prevent kernel allocations in it */ + set_pageblock_migratetype(pfn_to_page(pfn), MIGRATE_CMA); + } +} +#endif + static unsigned long __init __free_memory_core(phys_addr_t start, phys_addr_t end) { @@ -2214,6 +2225,17 @@ static unsigned long __init free_low_memory_core_early(void) memmap_init_reserved_pages(); +#ifdef CONFIG_MEMBLOCK_SCRATCH + /* + * Mark scratch mem as CMA before we return it. That way we ensure that + * no kernel allocations happen on it. That means we can reuse it as + * scratch memory again later. 
+ */ + __for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE, + MEMBLOCK_SCRATCH, &start, &end, NULL) + reserve_scratch_mem(start, end); +#endif + /* * We need to use NUMA_NO_NODE instead of NODE_DATA(0)->node_id * because in some case like Node0 doesn't have RAM installed From patchwork Fri Dec 22 19:35:53 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Graf X-Patchwork-Id: 13503722 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1CA37C41535 for ; Fri, 22 Dec 2023 19:37:02 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id A2EE98D0002; Fri, 22 Dec 2023 14:37:01 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 9B7438D0001; Fri, 22 Dec 2023 14:37:01 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 858768D0002; Fri, 22 Dec 2023 14:37:01 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 711598D0001 for ; Fri, 22 Dec 2023 14:37:01 -0500 (EST) Received: from smtpin28.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 3B37A1C19AA for ; Fri, 22 Dec 2023 19:37:01 +0000 (UTC) X-FDA: 81595462242.28.3DE71A6 Received: from smtp-fw-9102.amazon.com (smtp-fw-9102.amazon.com [207.171.184.29]) by imf27.hostedemail.com (Postfix) with ESMTP id 030AD40006 for ; Fri, 22 Dec 2023 19:36:58 +0000 (UTC) Authentication-Results: imf27.hostedemail.com; dkim=pass header.d=amazon.com header.s=amazon201209 header.b=LpvHkTVa; dmarc=pass (policy=quarantine) header.from=amazon.com; spf=pass (imf27.hostedemail.com: domain of "prvs=71347c2e1=graf@amazon.de" designates 
207.171.184.29 as permitted sender) smtp.mailfrom="prvs=71347c2e1=graf@amazon.de" ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1703273819; a=rsa-sha256; cv=none; b=P7Zd2GNQbY1Ag4YMlnWGYBqrHAjBXuyxPksUleOx6cQbW8t+blbnA/7mvU357b33OvQTfK eSC0h9Z6UOMORY09E1EjilWqJ3pkKQCvf4mG99v0/OIRvW3dOEBZlvJ6CmoCzd7wpvgvZc Bxocv9BZ+l0Ubh6hrienaF8pkkT9I3U= ARC-Authentication-Results: i=1; imf27.hostedemail.com; dkim=pass header.d=amazon.com header.s=amazon201209 header.b=LpvHkTVa; dmarc=pass (policy=quarantine) header.from=amazon.com; spf=pass (imf27.hostedemail.com: domain of "prvs=71347c2e1=graf@amazon.de" designates 207.171.184.29 as permitted sender) smtp.mailfrom="prvs=71347c2e1=graf@amazon.de" ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1703273819; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=Kr+/87v/8dQdBUqPMINnv/tSrB3ldJ3jEMlEzDytGNk=; b=IghMO4m1nUaZq8jkM5cjr3jPMkWsIA+6afgqj3SgI3f8kCt5GJImoanKjoI+eruCyD620q LCTGVLFXWGCGXcAt7LzJvXe8VSqHDOMMNFGCjZOzlcVnhpf725JyHPDnqmr+xrZkBVH+lq GyKtuXE/cYQtG7ZW2UneFydeJjGJC3M= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1703273819; x=1734809819; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Kr+/87v/8dQdBUqPMINnv/tSrB3ldJ3jEMlEzDytGNk=; b=LpvHkTVa9/vYTjgfqUFehCo8pJaJxuZ7dhEmOcK43jbRLSJTJXl09RIi EwlbTSWYc+y1x2OLndHcchZsqDv60Pajt4ufD+0mEqFcaL4Mf4e6+8tRr /o+p1gF+fCegkAOno0W4bTTlHgLEkhD83N9l5kFqVV0DEIOZ6rfhbxxnA 0=; X-IronPort-AV: E=Sophos;i="6.04,297,1695686400"; d="scan'208";a="385305976" Received: from pdx4-co-svc-p1-lb2-vlan3.amazon.com (HELO email-inbound-relay-pdx-2c-m6i4x-fa5fe5fb.us-west-2.amazon.com) ([10.25.36.214]) by 
smtp-border-fw-9102.sea19.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Dec 2023 19:36:52 +0000
From: Alexander Graf
CC: Eric Biederman, "H. Peter Anvin", Andy Lutomirski, Peter Zijlstra,
 "Rob Herring", Steven Rostedt, "Andrew Morton", Mark Rutland,
 "Tom Lendacky", Ashish Kalra, James Gowans, Stanislav Kinsburskii,
 Anthony Yznaga, Usama Arif, David Woodhouse, Benjamin Herrenschmidt
Subject: [PATCH v2 03/17] kexec: Add Kexec HandOver (KHO) generation helpers
Date: Fri, 22 Dec 2023 19:35:53 +0000
Message-ID: <20231222193607.15474-4-graf@amazon.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20231222193607.15474-1-graf@amazon.com>
References: <20231222193607.15474-1-graf@amazon.com>
MIME-Version: 1.0

This patch adds the core infrastructure to generate Kexec HandOver
metadata. Kexec HandOver is a mechanism that allows Linux to preserve
state - arbitrary properties as well as memory locations - across kexec.

It does so using 3 concepts:

  1) Device Tree - Every KHO kexec carries a KHO specific flattened
     device tree blob that describes the state of the system. Device
     drivers can register with KHO to serialize their state before kexec.

  2) Mem cache - A memblock-like structure that contains full page
     ranges of reservations. These cannot be part of the architectural
     reservations, because they differ on every kexec.

  3) Scratch Region - A CMA region that we allocate in the first kernel.
     CMA gives us the guarantee that no handover pages land in that
     region, because handover pages must be at a static physical memory
     location. We use this region as the place to load future kexec
     images into, so that they cannot collide with any handover data.
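
A minimal user-space sketch of the invariant behind the mem cache and the
scratch region, assuming the struct kho_mem layout this patch adds to
include/uapi/linux/kexec.h. The helpers kho_ranges_overlap() and
kho_scratch_is_clean() are hypothetical names used only for illustration
and are not part of this series; the sketch also assumes that addr + len
does not overflow.

```c
/* User-space model only; this is not kernel code from the series. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as the uapi struct kho_mem added by this patch. */
struct kho_mem {
	uint64_t addr;
	uint64_t len;
};

/* Hypothetical helper: do [a, a + a_len) and [b, b + b_len) intersect? */
static bool kho_ranges_overlap(uint64_t a, uint64_t a_len,
			       uint64_t b, uint64_t b_len)
{
	return a < b + b_len && b < a + a_len;
}

/*
 * Hypothetical helper: returns true if no mem cache entry touches the
 * scratch region, i.e. a kexec image loaded into scratch cannot
 * overwrite handed over memory.
 */
static bool kho_scratch_is_clean(const struct kho_mem *cache, size_t nr,
				 uint64_t scratch_phys, uint64_t scratch_len)
{
	size_t i;

	/* A NULL cache is only acceptable when it is declared empty. */
	assert(cache || !nr);

	for (i = 0; i < nr; i++) {
		if (kho_ranges_overlap(cache[i].addr, cache[i].len,
				       scratch_phys, scratch_len))
			return false;
	}
	return true;
}
```

For example, with a mem cache entry covering 0x1000-0x2fff, a scratch
region starting at 0x2000 is rejected, while one starting at 0x3000 is
accepted.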
Signed-off-by: Alexander Graf

---

v1 -> v2:

  - s/kho_reserve/kho_reserve_scratch/g
  - Move kho enums out of ifdef
---
 Documentation/ABI/testing/sysfs-kernel-kho    |  53 +++
 .../admin-guide/kernel-parameters.txt         |  10 +
 MAINTAINERS                                   |   1 +
 include/linux/kexec.h                         |  24 ++
 include/uapi/linux/kexec.h                    |   6 +
 kernel/Makefile                               |   1 +
 kernel/kexec_kho_out.c                        | 316 ++++++++++++++++++
 7 files changed, 411 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-kernel-kho
 create mode 100644 kernel/kexec_kho_out.c

diff --git a/Documentation/ABI/testing/sysfs-kernel-kho b/Documentation/ABI/testing/sysfs-kernel-kho
new file mode 100644
index 000000000000..f69e7b81a337
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-kernel-kho
@@ -0,0 +1,53 @@
+What:		/sys/kernel/kho/active
+Date:		December 2023
+Contact:	Alexander Graf
+Description:
+		Kexec HandOver (KHO) allows Linux to transition the state of
+		compatible drivers into the next kexec'ed kernel. To do so,
+		device drivers will serialize their current state into a DT.
+		While the state is serialized, they are unable to perform
+		any modifications to state that was serialized, such as
+		handed over memory allocations.
+
+		When this file contains "1", the system is in the transition
+		state. When it contains "0", it is not. To switch between the
+		two states, echo the respective number into this file.
+
+What:		/sys/kernel/kho/dt_max
+Date:		December 2023
+Contact:	Alexander Graf
+Description:
+		KHO needs to allocate a buffer for the DT that gets
+		generated before it knows the final size. By default, it
+		will allocate 10 MiB for it. You can write to this file
+		to modify the size of that allocation.
+
+What:		/sys/kernel/kho/scratch_len
+Date:		December 2023
+Contact:	Alexander Graf
+Description:
+		To support continuous KHO kexecs, we need to reserve a
+		physically contiguous memory region that will always stay
+		available for future kexec allocations. This file describes
+		the length of that memory region. Kexec user space tooling
+		can use this to determine where it should place its payload
+		images.
+
+What:		/sys/kernel/kho/scratch_phys
+Date:		December 2023
+Contact:	Alexander Graf
+Description:
+		To support continuous KHO kexecs, we need to reserve a
+		physically contiguous memory region that will always stay
+		available for future kexec allocations. This file describes
+		the physical location of that memory region. Kexec user space
+		tooling can use this to determine where it should place its
+		payload images.
+
+What:		/sys/kernel/kho/dt
+Date:		December 2023
+Contact:	Alexander Graf
+Description:
+		When KHO is active, the kernel exposes the generated DT that
+		carries its current KHO state in this file. Kexec user space
+		tooling can use this as input file for the KHO payload image.
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 51575cd31741..efeef075617e 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2504,6 +2504,16 @@
 	kgdbwait	[KGDB] Stop kernel execution and enter the
 			kernel debugger at the earliest opportunity.
 
+	kho_scratch=n[KMG]	[KEXEC] Sets the size of the KHO scratch
+			region. The KHO scratch region is a physically
+			contiguous memory range that can only be used for
+			non-kernel allocations. That way, even when memory is
+			heavily fragmented with handed over memory, kexec will
+			always be able to find contiguous memory to place the
+			next kernel for kexec into.
+
+			The default is 0.
+
 	kmac=		[MIPS] Korina ethernet MAC address.
 			Configure the RouterBoard 532 series on-chip
 			Ethernet adapter MAC address.
diff --git a/MAINTAINERS b/MAINTAINERS
index 9104430e148e..2a19bd282dd0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11713,6 +11713,7 @@ M:	Eric Biederman
 L:	kexec@lists.infradead.org
 S:	Maintained
 W:	http://kernel.org/pub/linux/utils/kernel/kexec/
+F:	Documentation/ABI/testing/sysfs-kernel-kho
 F:	include/linux/kexec.h
 F:	include/uapi/linux/kexec.h
 F:	kernel/kexec*
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 8227455192b7..5d3b6b015838 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -21,6 +21,8 @@
 
 #include
 #include
+#include
+#include
 
 extern note_buf_t __percpu *crash_notes;
 
@@ -516,6 +518,28 @@ void set_kexec_sig_enforced(void);
 static inline void set_kexec_sig_enforced(void) {}
 #endif
 
+/* Notifier index */
+enum kho_event {
+	KEXEC_KHO_DUMP = 0,
+	KEXEC_KHO_ABORT = 1,
+};
+
+#ifdef CONFIG_KEXEC_KHO
+extern phys_addr_t kho_scratch_phys;
+extern phys_addr_t kho_scratch_len;
+
+/* egest handover metadata */
+void kho_reserve_scratch(void);
+int register_kho_notifier(struct notifier_block *nb);
+int unregister_kho_notifier(struct notifier_block *nb);
+bool kho_is_active(void);
+#else
+static inline void kho_reserve_scratch(void) {}
+static inline int register_kho_notifier(struct notifier_block *nb) { return -EINVAL; }
+static inline int unregister_kho_notifier(struct notifier_block *nb) { return -EINVAL; }
+static inline bool kho_is_active(void) { return false; }
+#endif
+
 #endif /* !defined(__ASSEBMLY__) */
 
 #endif /* LINUX_KEXEC_H */
diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h
index 01766dd839b0..d02ffd5960d6 100644
--- a/include/uapi/linux/kexec.h
+++ b/include/uapi/linux/kexec.h
@@ -49,6 +49,12 @@
 /* The artificial cap on the number of segments passed to kexec_load. */
 #define KEXEC_SEGMENT_MAX 16
 
+/* KHO passes an array of kho_mem as "mem cache" to the new kernel */
+struct kho_mem {
+	__u64 addr;
+	__u64 len;
+};
+
 #ifndef __KERNEL__
 /*
  * This structure is used to hold the arguments that are used when
diff --git a/kernel/Makefile b/kernel/Makefile
index 3947122d618b..a6bd31e22c09 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -73,6 +73,7 @@ obj-$(CONFIG_KEXEC_CORE) += kexec_core.o
 obj-$(CONFIG_KEXEC) += kexec.o
 obj-$(CONFIG_KEXEC_FILE) += kexec_file.o
 obj-$(CONFIG_KEXEC_ELF) += kexec_elf.o
+obj-$(CONFIG_KEXEC_KHO) += kexec_kho_out.o
 obj-$(CONFIG_BACKTRACE_SELF_TEST) += backtracetest.o
 obj-$(CONFIG_COMPAT) += compat.o
 obj-$(CONFIG_CGROUPS) += cgroup/
diff --git a/kernel/kexec_kho_out.c b/kernel/kexec_kho_out.c
new file mode 100644
index 000000000000..765cf6ba7a46
--- /dev/null
+++ b/kernel/kexec_kho_out.c
@@ -0,0 +1,316 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * kexec_kho_out.c - kexec handover code to egest metadata.
+ * Copyright (C) 2023 Alexander Graf
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include
+#include
+#include
+#include
+#include
+
+struct kho_out {
+	struct kobject *kobj;
+	bool active;
+	struct cma *cma;
+	struct blocking_notifier_head chain_head;
+	void *dt;
+	u64 dt_len;
+	u64 dt_max;
+	struct mutex lock;
+};
+
+static struct kho_out kho = {
+	.dt_max = (1024 * 1024 * 10),
+	.chain_head = BLOCKING_NOTIFIER_INIT(kho.chain_head),
+	.lock = __MUTEX_INITIALIZER(kho.lock),
+};
+
+/*
+ * Size for scratch (non-KHO) memory. With KHO enabled, memory can become
+ * fragmented because KHO regions may be anywhere in physical address
+ * space. The scratch region gives us a safe zone that we will never see
+ * KHO allocations from. This is where we can later safely load our new kexec
+ * images into.
+ */
+static phys_addr_t kho_scratch_size __initdata;
+
+int register_kho_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_register(&kho.chain_head, nb);
+}
+EXPORT_SYMBOL_GPL(register_kho_notifier);
+
+int unregister_kho_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_unregister(&kho.chain_head, nb);
+}
+EXPORT_SYMBOL_GPL(unregister_kho_notifier);
+
+bool kho_is_active(void)
+{
+	return kho.active;
+}
+EXPORT_SYMBOL_GPL(kho_is_active);
+
+static ssize_t raw_read(struct file *file, struct kobject *kobj,
+			struct bin_attribute *attr, char *buf,
+			loff_t pos, size_t count)
+{
+	mutex_lock(&kho.lock);
+	memcpy(buf, attr->private + pos, count);
+	mutex_unlock(&kho.lock);
+
+	return count;
+}
+
+static BIN_ATTR(dt, 0400, raw_read, NULL, 0);
+
+static int kho_expose_dt(void *fdt)
+{
+	long fdt_len = fdt_totalsize(fdt);
+	int err;
+
+	kho.dt = fdt;
+	kho.dt_len = fdt_len;
+
+	bin_attr_dt.size = fdt_totalsize(fdt);
+	bin_attr_dt.private = fdt;
+	err = sysfs_create_bin_file(kho.kobj, &bin_attr_dt);
+
+	return err;
+}
+
+static void kho_abort(void)
+{
+	if (!kho.active)
+		return;
+
+	sysfs_remove_bin_file(kho.kobj, &bin_attr_dt);
+
+	kvfree(kho.dt);
+	kho.dt = NULL;
+	kho.dt_len = 0;
+
+	blocking_notifier_call_chain(&kho.chain_head, KEXEC_KHO_ABORT, NULL);
+
+	kho.active = false;
+}
+
+static int kho_serialize(void)
+{
+	void *fdt = NULL;
+	int err;
+
+	kho.active = true;
+	err = -ENOMEM;
+
+	fdt = kvmalloc(kho.dt_max, GFP_KERNEL);
+	if (!fdt)
+		goto out;
+
+	if (fdt_create(fdt, kho.dt_max)) {
+		err = -EINVAL;
+		goto out;
+	}
+
+	err = fdt_finish_reservemap(fdt);
+	if (err)
+		goto out;
+
+	err = fdt_begin_node(fdt, "");
+	if (err)
+		goto out;
+
+	err = fdt_property_string(fdt, "compatible", "kho-v1");
+	if (err)
+		goto out;
+
+	/* Loop through all kho dump functions */
+	err = blocking_notifier_call_chain(&kho.chain_head, KEXEC_KHO_DUMP, fdt);
+	err = notifier_to_errno(err);
+	if (err)
+		goto out;
+
+	/* Close / */
+	err = fdt_end_node(fdt);
+	if (err)
+		goto out;
+
+	err = fdt_finish(fdt);
+	if (err)
+		goto out;
+
+	if (WARN_ON(fdt_check_header(fdt))) {
+		err = -EINVAL;
+		goto out;
+	}
+
+	err = kho_expose_dt(fdt);
+
+out:
+	if (err) {
+		pr_err("kho failed to serialize state: %d", err);
+		kho_abort();
+	}
+	return err;
+}
+
+/* Handling for /sys/kernel/kho */
+
+#define KHO_ATTR_RO(_name) static struct kobj_attribute _name##_attr = __ATTR_RO_MODE(_name, 0400)
+#define KHO_ATTR_RW(_name) static struct kobj_attribute _name##_attr = __ATTR_RW_MODE(_name, 0600)
+
+static ssize_t active_store(struct kobject *dev, struct kobj_attribute *attr,
+			    const char *buf, size_t size)
+{
+	ssize_t retsize = size;
+	bool val = false;
+	int ret;
+
+	if (kstrtobool(buf, &val) < 0)
+		return -EINVAL;
+
+	if (!kho_scratch_len)
+		return -ENOMEM;
+
+	mutex_lock(&kho.lock);
+	if (val != kho.active) {
+		if (val) {
+			ret = kho_serialize();
+			if (ret) {
+				retsize = -EINVAL;
+				goto out;
+			}
+		} else {
+			kho_abort();
+		}
+	}
+
+out:
+	mutex_unlock(&kho.lock);
+	return retsize;
+}
+
+static ssize_t active_show(struct kobject *dev, struct kobj_attribute *attr,
+			   char *buf)
+{
+	ssize_t ret;
+
+	mutex_lock(&kho.lock);
+	ret = sysfs_emit(buf, "%d\n", kho.active);
+	mutex_unlock(&kho.lock);
+
+	return ret;
+}
+KHO_ATTR_RW(active);
+
+static ssize_t dt_max_store(struct kobject *dev, struct kobj_attribute *attr,
+			    const char *buf, size_t size)
+{
+	u64 val;
+
+	if (kstrtoull(buf, 0, &val))
+		return -EINVAL;
+
+	kho.dt_max = val;
+
+	return size;
+}
+
+static ssize_t dt_max_show(struct kobject *dev, struct kobj_attribute *attr,
+			   char *buf)
+{
+	return sysfs_emit(buf, "0x%llx\n", kho.dt_max);
+}
+KHO_ATTR_RW(dt_max);
+
+static ssize_t scratch_len_show(struct kobject *dev, struct kobj_attribute *attr,
+				char *buf)
+{
+	return sysfs_emit(buf, "0x%llx\n", kho_scratch_len);
+}
+KHO_ATTR_RO(scratch_len);
+
+static ssize_t scratch_phys_show(struct kobject *dev, struct kobj_attribute *attr,
+				 char *buf)
+{
+	return sysfs_emit(buf, "0x%llx\n", kho_scratch_phys);
+}
+KHO_ATTR_RO(scratch_phys);
+
+static __init int kho_out_init(void)
+{
+	int ret = 0;
+
+	kho.kobj = kobject_create_and_add("kho", kernel_kobj);
+	if (!kho.kobj) {
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	ret = sysfs_create_file(kho.kobj, &active_attr.attr);
+	if (ret)
+		goto err;
+
+	ret = sysfs_create_file(kho.kobj, &dt_max_attr.attr);
+	if (ret)
+		goto err;
+
+	ret = sysfs_create_file(kho.kobj, &scratch_phys_attr.attr);
+	if (ret)
+		goto err;
+
+	ret = sysfs_create_file(kho.kobj, &scratch_len_attr.attr);
+	if (ret)
+		goto err;
+
+err:
+	return ret;
+}
+late_initcall(kho_out_init);
+
+static int __init early_kho_scratch(char *p)
+{
+	kho_scratch_size = memparse(p, &p);
+	return 0;
+}
+early_param("kho_scratch", early_kho_scratch);
+
+/**
+ * kho_reserve_scratch - Reserve a contiguous chunk of memory for kexec
+ *
+ * With KHO we can preserve arbitrary pages in the system. To ensure we still
+ * have a large contiguous region of memory when we search the physical address
+ * space for target memory, let's make sure we always have a large CMA region
+ * active. This CMA region will only be used for movable pages which are not a
+ * problem for us during KHO because we can just move them somewhere else.
+ */
+__init void kho_reserve_scratch(void)
+{
+	int r;
+
+	if (kho_get_fdt()) {
+		/*
+		 * We came from a previous KHO handover, so we already have
+		 * a known good scratch region that we preserve. No need to
+		 * allocate another.
+		 */
+		return;
+	}
+
+	/* Only allocate KHO scratch memory when we're asked to */
+	if (!kho_scratch_size)
+		return;
+
+	r = cma_declare_contiguous_nid(0, kho_scratch_size, 0, PAGE_SIZE, 0,
+				       false, "kho", &kho.cma, NUMA_NO_NODE);
+	if (WARN_ON(r))
+		return;
+
+	kho_scratch_phys = cma_get_base(kho.cma);
+	kho_scratch_len = cma_get_size(kho.cma);
+}

From patchwork Fri Dec 22 19:35:54 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexander Graf
X-Patchwork-Id: 13503723
From: Alexander Graf
CC: Eric Biederman, "H. Peter Anvin", Andy Lutomirski, Peter Zijlstra,
 "Rob Herring", Steven Rostedt, "Andrew Morton", Mark Rutland,
 "Tom Lendacky", Ashish Kalra, James Gowans, Stanislav Kinsburskii,
 Anthony Yznaga, Usama Arif, David Woodhouse, Benjamin Herrenschmidt
Subject: [PATCH v2 04/17] kexec: Add KHO parsing support
Date: Fri, 22 Dec 2023 19:35:54 +0000
Message-ID: <20231222193607.15474-5-graf@amazon.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20231222193607.15474-1-graf@amazon.com>
References: <20231222193607.15474-1-graf@amazon.com>

When we have a KHO kexec, we get a device tree, mem cache and scratch
region to populate the state of the system. Provide helper functions that
allow architecture code to easily handle memory reservations based on them
and give device drivers visibility into the KHO DT and memory reservations
so they can recover their own state.
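
The page arithmetic that kho_claim_mem() and kho_return_mem() apply to
each struct kho_mem, and the length check that kho_populate_refcount()
applies to a "mem" property, can be modeled in user space. This is only a
sketch under stated assumptions: the model_* names are hypothetical and
not part of this series, 4 KiB pages are assumed, and the modulo test is
used as an equivalent of the kernel's power-of-two mask check.

```c
/* User-space model only; this is not kernel code from the series. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12 /* assumption: 4 KiB pages */
#define MODEL_PAGE_SIZE (UINT64_C(1) << MODEL_PAGE_SHIFT)

/* Same layout as the uapi struct kho_mem from the previous patch. */
struct kho_mem {
	uint64_t addr;
	uint64_t len;
};

/* Models PFN_DOWN(): frame that contains the physical address. */
static uint64_t model_pfn_down(uint64_t phys)
{
	return phys >> MODEL_PAGE_SHIFT;
}

/* Models PFN_UP(): first frame at or above the physical address. */
static uint64_t model_pfn_up(uint64_t phys)
{
	return (phys + MODEL_PAGE_SIZE - 1) >> MODEL_PAGE_SHIFT;
}

/* Number of page frames a claim or return of this entry would visit. */
static uint64_t model_mem_pages(const struct kho_mem *mem)
{
	assert(mem);
	return model_pfn_up(mem->addr + mem->len) - model_pfn_down(mem->addr);
}

/*
 * Number of struct kho_mem entries in a "mem" property of len bytes,
 * or -1 if the length is negative or not a whole number of entries.
 */
static long model_mem_prop_entries(int len)
{
	if (len < 0 || (size_t)len % sizeof(struct kho_mem))
		return -1;
	return (long)((size_t)len / sizeof(struct kho_mem));
}
```

An entry with addr 0x1800 and len 0x1000 spans two page frames (PFNs 1
and 2), because both partially covered pages are included in the range.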
Signed-off-by: Alexander Graf --- v1 -> v2: - s/kho_reserve_mem/kho_reserve_previous_mem/g - make kho_get_fdt() const - Add stubs for return_mem and claim_mem --- Documentation/ABI/testing/sysfs-firmware-kho | 9 + MAINTAINERS | 1 + include/linux/kexec.h | 27 +- kernel/Makefile | 1 + kernel/kexec_kho_in.c | 298 +++++++++++++++++++ 5 files changed, 335 insertions(+), 1 deletion(-) create mode 100644 Documentation/ABI/testing/sysfs-firmware-kho create mode 100644 kernel/kexec_kho_in.c diff --git a/Documentation/ABI/testing/sysfs-firmware-kho b/Documentation/ABI/testing/sysfs-firmware-kho new file mode 100644 index 000000000000..e4ed2cb7c810 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-firmware-kho @@ -0,0 +1,9 @@ +What: /sys/firmware/kho/dt +Date: December 2023 +Contact: Alexander Graf +Description: + When the kernel was booted with Kexec HandOver (KHO), + the device tree that carries metadata about the previous + kernel's state is in this file. This file may disappear + when all consumers of it finished to interpret their + metadata. 
diff --git a/MAINTAINERS b/MAINTAINERS index 2a19bd282dd0..61bdfd47bb23 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -11713,6 +11713,7 @@ M: Eric Biederman L: kexec@lists.infradead.org S: Maintained W: http://kernel.org/pub/linux/utils/kernel/kexec/ +F: Documentation/ABI/testing/sysfs-firmware-kho F: Documentation/ABI/testing/sysfs-kernel-kho F: include/linux/kexec.h F: include/uapi/linux/kexec.h diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 5d3b6b015838..765f71976230 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -528,13 +528,38 @@ enum kho_event { extern phys_addr_t kho_scratch_phys; extern phys_addr_t kho_scratch_len; +/* ingest handover metadata */ +void kho_reserve_previous_mem(void); +void kho_populate(phys_addr_t dt_phys, phys_addr_t scratch_phys, u64 scratch_len, + phys_addr_t mem_phys, u64 mem_len); +void kho_populate_refcount(void); +const void *kho_get_fdt(void); +void kho_return_mem(const struct kho_mem *mem); +void *kho_claim_mem(const struct kho_mem *mem); +static inline bool is_kho_boot(void) +{ + return !!kho_scratch_phys; +} + /* egest handover metadata */ void kho_reserve_scratch(void); int register_kho_notifier(struct notifier_block *nb); int unregister_kho_notifier(struct notifier_block *nb); bool kho_is_active(void); #else -static inline void kho_reserve_scratch(void) {} +/* ingest handover metadata */ +static inline void kho_reserve_previous_mem(void) { } +static inline void kho_populate(phys_addr_t dt_phys, phys_addr_t scratch_phys, + u64 scratch_len, phys_addr_t mem_phys, + u64 mem_len) { } +static inline void kho_populate_refcount(void) { } +static inline void *kho_get_fdt(void) { return NULL; } +static inline void kho_return_mem(const struct kho_mem *mem) { } +static inline void *kho_claim_mem(const struct kho_mem *mem) { return NULL; } +static inline bool is_kho_boot(void) { return false; } + +/* egest handover metadata */ +static inline void kho_reserve_scratch(void) { } static inline int 
register_kho_notifier(struct notifier_block *nb) { return -EINVAL; } static inline int unregister_kho_notifier(struct notifier_block *nb) { return -EINVAL; } static inline bool kho_is_active(void) { return false; } diff --git a/kernel/Makefile b/kernel/Makefile index a6bd31e22c09..7c3065e40c75 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -73,6 +73,7 @@ obj-$(CONFIG_KEXEC_CORE) += kexec_core.o obj-$(CONFIG_KEXEC) += kexec.o obj-$(CONFIG_KEXEC_FILE) += kexec_file.o obj-$(CONFIG_KEXEC_ELF) += kexec_elf.o +obj-$(CONFIG_KEXEC_KHO) += kexec_kho_in.o obj-$(CONFIG_KEXEC_KHO) += kexec_kho_out.o obj-$(CONFIG_BACKTRACE_SELF_TEST) += backtracetest.o obj-$(CONFIG_COMPAT) += compat.o diff --git a/kernel/kexec_kho_in.c b/kernel/kexec_kho_in.c new file mode 100644 index 000000000000..5f8e0d9f9e12 --- /dev/null +++ b/kernel/kexec_kho_in.c @@ -0,0 +1,298 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * kexec_kho_in.c - kexec handover code to ingest metadata. + * Copyright (C) 2023 Alexander Graf + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include +#include +#include +#include + +/* The kho dt during runtime */ +static void *fdt; + +/* Globals to hand over phys/len from early to runtime */ +static phys_addr_t handover_phys __initdata; +static u32 handover_len __initdata; + +static phys_addr_t mem_phys __initdata; +static u32 mem_len __initdata; + +phys_addr_t kho_scratch_phys; +phys_addr_t kho_scratch_len; + +const void *kho_get_fdt(void) +{ + return fdt; +} +EXPORT_SYMBOL_GPL(kho_get_fdt); + +/** + * kho_populate_refcount - Scan the DT for any memory ranges. Increase the + * affected pages' refcount by 1 for each. 
+ */ +__init void kho_populate_refcount(void) +{ + const void *fdt = kho_get_fdt(); + void *mem_virt = __va(mem_phys); + int offset = 0, depth = 0, initial_depth = 0, len; + + if (!fdt) + return; + + /* Go through the mem list and add 1 for each reference */ + for (offset = 0; + offset >= 0 && depth >= initial_depth; + offset = fdt_next_node(fdt, offset, &depth)) { + const struct kho_mem *mems; + u32 i; + + mems = fdt_getprop(fdt, offset, "mem", &len); + if (!mems || len & (sizeof(*mems) - 1)) + continue; + + for (i = 0; i < len; i += sizeof(*mems)) { + const struct kho_mem *mem = ((void *)mems) + i; + u64 start_pfn = PFN_DOWN(mem->addr); + u64 end_pfn = PFN_UP(mem->addr + mem->len); + u64 pfn; + + for (pfn = start_pfn; pfn < end_pfn; pfn++) + get_page(pfn_to_page(pfn)); + } + } + + /* + * Then reduce the reference count by 1 to offset the initial ref count + * of 1. In addition, unreserve the page. That way, we can free_page() + * it for every consumer and automatically free it to the global memory + * pool when everyone is done. + */ + for (offset = 0; offset < mem_len; offset += sizeof(struct kho_mem)) { + struct kho_mem *mem = mem_virt + offset; + u64 start_pfn = PFN_DOWN(mem->addr); + u64 end_pfn = PFN_UP(mem->addr + mem->len); + u64 pfn; + + for (pfn = start_pfn; pfn < end_pfn; pfn++) { + struct page *page = pfn_to_page(pfn); + + /* + * This is similar to free_reserved_page(), but + * preserves the reference count + */ + ClearPageReserved(page); + __free_page(page); + adjust_managed_page_count(page, 1); + } + } +} + +static void kho_return_pfn(ulong pfn) +{ + struct page *page = pfn_to_page(pfn); + + if (WARN_ON(!page)) + return; + __free_page(page); +} + +/** + * kho_return_mem - Notify the kernel that initially reserved memory is no + * longer needed. When the last consumer of a page returns their mem, kho + * returns the page to the buddy allocator as free page. 
+ */ +void kho_return_mem(const struct kho_mem *mem) +{ + uint64_t start_pfn, end_pfn, pfn; + + start_pfn = PFN_DOWN(mem->addr); + end_pfn = PFN_UP(mem->addr + mem->len); + + for (pfn = start_pfn; pfn < end_pfn; pfn++) + kho_return_pfn(pfn); +} +EXPORT_SYMBOL_GPL(kho_return_mem); + +static void kho_claim_pfn(ulong pfn) +{ + struct page *page = pfn_to_page(pfn); + + WARN_ON(!page); + if (WARN_ON(page_count(page) != 1)) + pr_err("Claimed non kho pfn %lx", pfn); +} + +/** + * kho_claim_mem - Notify the kernel that a handed over memory range is now in + * use by a kernel subsystem and considered an allocated page. This function + * removes the reserved state for all pages that the mem spans. + */ +void *kho_claim_mem(const struct kho_mem *mem) +{ + u64 start_pfn, end_pfn, pfn; + void *va = __va(mem->addr); + + start_pfn = PFN_DOWN(mem->addr); + end_pfn = PFN_UP(mem->addr + mem->len); + + for (pfn = start_pfn; pfn < end_pfn; pfn++) + kho_claim_pfn(pfn); + + return va; +} +EXPORT_SYMBOL_GPL(kho_claim_mem); + +/** + * kho_reserve_previous_mem - Adds all memory reservations into memblocks + * and moves us out of the scratch only phase. Must be called after page tables + * are initialized and memblock_allow_resize(). + */ +void __init kho_reserve_previous_mem(void) +{ + void *mem_virt = __va(mem_phys); + int off, err; + + if (!handover_phys || !mem_phys) + return; + + /* + * We reached here because we are running inside a working linear map + * that allows us to resize memblocks dynamically. 
Use the chance and + * populate the global fdt pointer + */ + fdt = __va(handover_phys); + + off = fdt_path_offset(fdt, "/"); + if (off < 0) { + fdt = NULL; + return; + } + + err = fdt_node_check_compatible(fdt, off, "kho-v1"); + if (err) { + pr_warn("KHO has invalid compatible, disabling."); + return; + } + + /* Then populate all preserved memory areas as reserved */ + for (off = 0; off < mem_len; off += sizeof(struct kho_mem)) { + struct kho_mem *mem = mem_virt + off; + + memblock_reserve(mem->addr, mem->len); + } + + /* Unreserve the mem cache - we don't need it from here on */ + memblock_phys_free(mem_phys, mem_len); + + /* + * Now we know about all memory reservations, release the scratch only + * constraint and allow normal allocations from the scratch region. + */ + memblock_clear_scratch_only(); +} + +/* Handling for /sys/firmware/kho */ +static struct kobject *kho_kobj; + +static ssize_t raw_read(struct file *file, struct kobject *kobj, + struct bin_attribute *attr, char *buf, + loff_t pos, size_t count) +{ + memcpy(buf, attr->private + pos, count); + return count; +} + +static BIN_ATTR(dt, 0400, raw_read, NULL, 0); + +static __init int kho_in_init(void) +{ + int ret = 0; + + if (!fdt) + return 0; + + kho_kobj = kobject_create_and_add("kho", firmware_kobj); + if (!kho_kobj) { + ret = -ENOMEM; + goto err; + } + + bin_attr_dt.size = fdt_totalsize(fdt); + bin_attr_dt.private = fdt; + ret = sysfs_create_bin_file(kho_kobj, &bin_attr_dt); + if (ret) + goto err; + +err: + return ret; +} +subsys_initcall(kho_in_init); + +void __init kho_populate(phys_addr_t handover_dt_phys, phys_addr_t scratch_phys, + u64 scratch_len, phys_addr_t mem_cache_phys, + u64 mem_cache_len) +{ + void *handover_dt; + + /* Determine the real size of the DT */ + handover_dt = early_memremap(handover_dt_phys, sizeof(struct fdt_header)); + if (!handover_dt) { + pr_warn("setup: failed to memremap kexec FDT (0x%llx)\n", handover_dt_phys); + return; + } + + if (fdt_check_header(handover_dt)) { + 
+		pr_warn("setup: kexec handover FDT is invalid (0x%llx)\n", handover_dt_phys);
+		early_memunmap(handover_dt, sizeof(struct fdt_header));
+		return;
+	}
+
+	handover_len = fdt_totalsize(handover_dt);
+	handover_phys = handover_dt_phys;
+
+	/* Reserve the DT so we can still access it in late boot */
+	memblock_reserve(handover_phys, handover_len);
+
+	/* Reserve the mem cache so we can still access it later */
+	memblock_reserve(mem_cache_phys, mem_cache_len);
+
+	/*
+	 * The previous kernel passes us a safe contiguous block of memory to use for
+	 * early boot purposes, so that we can resize the memblock array as needed.
+	 */
+	memblock_add(scratch_phys, scratch_len);
+
+	if (WARN_ON(memblock_mark_scratch(scratch_phys, scratch_len))) {
+		pr_err("Kexec failed to mark the scratch region. Disabling KHO.\n");
+		handover_len = 0;
+		handover_phys = 0;
+		return;
+	}
+	pr_debug("Marked 0x%lx+0x%lx as scratch\n", (long)scratch_phys, (long)scratch_len);
+
+	/*
+	 * Now that we have a viable region of scratch memory, let's tell the memblock
+	 * allocator to only use that for any allocations. That way we ensure that nothing
+	 * scribbles over in-use data while we initialize the page tables which we will need
+	 * to ingest all memory reservations from the previous kernel.
+	 */
+	memblock_set_scratch_only();
+
+	early_memunmap(handover_dt, sizeof(struct fdt_header));
+
+	/* Remember the mem cache location for kho_reserve_previous_mem() */
+	mem_len = mem_cache_len;
+	mem_phys = mem_cache_phys;
+
+	/* Remember the scratch block - we will reuse it again for the next kexec */
+	kho_scratch_phys = scratch_phys;
+	kho_scratch_len = scratch_len;
+
+	pr_info("setup: Found kexec handover data. Will skip init for some devices\n");
+}

From patchwork Fri Dec 22 19:35:55 2023
X-Patchwork-Submitter: Alexander Graf
X-Patchwork-Id: 13503724
From: Alexander Graf
Cc: Eric Biederman, "H. Peter Anvin", Andy Lutomirski, Peter Zijlstra,
 "Rob Herring", Steven Rostedt, "Andrew Morton", Mark Rutland,
 "Tom Lendacky", Ashish Kalra, James Gowans, Stanislav Kinsburskii,
 Anthony Yznaga, Usama Arif, David Woodhouse, Benjamin Herrenschmidt
Subject: [PATCH v2 05/17] kexec: Add KHO support to kexec file loads
Date: Fri, 22 Dec 2023 19:35:55 +0000
Message-ID: <20231222193607.15474-6-graf@amazon.com>
In-Reply-To: <20231222193607.15474-1-graf@amazon.com>
References: <20231222193607.15474-1-graf@amazon.com>

Kexec has two modes: a user space driven mode and a kernel driven mode.
For the kernel driven mode, kernel code determines the physical
addresses of all target buffers that the payload gets copied into.

With KHO, we can only safely copy payloads into the "scratch area".
Teach the kexec file loader about it, so it only allocates from that
area. In addition, teach it to ask the KHO subsystem for the payloads
that need to be copied into target memory. Also teach the KHO subsystem
how to fill the images for file loads.
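
As a rough illustration of the scratch-only placement described above,
the following user-space sketch picks the highest aligned address at
which a buffer fits entirely inside a scratch region. It is a simplified
model under stated assumptions, not the kernel code: struct
scratch_region, scratch_place_top_down() and NO_HOLE are hypothetical
names made up for this example, while the real lookup goes through
kexec_locate_mem_hole() and locate_mem_hole_callback().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the KHO scratch region (not a kernel type). */
struct scratch_region {
	uint64_t phys;	/* first byte of the region */
	uint64_t len;	/* size of the region in bytes */
};

/* Hypothetical "no fit" marker for this sketch. */
#define NO_HOLE UINT64_MAX

/*
 * Return the highest 'align'-aligned start address at which 'memsz'
 * bytes fit entirely inside the scratch region, or NO_HOLE if the
 * buffer does not fit. 'align' must be a power of two.
 */
static uint64_t scratch_place_top_down(const struct scratch_region *s,
				       uint64_t memsz, uint64_t align)
{
	uint64_t start, end, addr;

	assert(align && (align & (align - 1)) == 0);

	if (!s->len || !memsz || memsz > s->len)
		return NO_HOLE;

	start = s->phys;
	end = s->phys + s->len - 1;	/* inclusive last byte */

	/* Highest possible start address, rounded down to the alignment. */
	addr = (end - memsz + 1) & ~(align - 1);

	/* Rounding down may have pushed us below the region. */
	return addr >= start ? addr : NO_HOLE;
}
```

In this model, a 0x3000 byte buffer with 64 KiB alignment placed into a
1 MiB scratch region at 0x100000 lands at 0x1f0000. A buffer that is
larger than the region yields NO_HOLE, which corresponds to the
-EADDRNOTAVAIL case in the patch below.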

Signed-off-by: Alexander Graf
---
 include/linux/kexec.h  |   9 ++
 kernel/kexec_file.c    |  41 ++++++++
 kernel/kexec_kho_out.c | 210 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 260 insertions(+)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 765f71976230..39a2990007a8 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -362,6 +362,13 @@ struct kimage {
 	size_t ima_buffer_size;
 #endif
 
+#ifdef CONFIG_KEXEC_KHO
+	struct {
+		struct kexec_buf dt;
+		struct kexec_buf mem_cache;
+	} kho;
+#endif
+
 	/* Core ELF header buffer */
 	void *elf_headers;
 	unsigned long elf_headers_sz;
@@ -543,6 +550,7 @@ static inline bool is_kho_boot(void)
 
 /* egest handover metadata */
 void kho_reserve_scratch(void);
+int kho_fill_kimage(struct kimage *image);
 int register_kho_notifier(struct notifier_block *nb);
 int unregister_kho_notifier(struct notifier_block *nb);
 bool kho_is_active(void);
@@ -560,6 +568,7 @@ static inline bool is_kho_boot(void) { return false; }
 
 /* egest handover metadata */
 static inline void kho_reserve_scratch(void) { }
+static inline int kho_fill_kimage(struct kimage *image) { return 0; }
 static inline int register_kho_notifier(struct notifier_block *nb) { return -EINVAL; }
 static inline int unregister_kho_notifier(struct notifier_block *nb) { return -EINVAL; }
 static inline bool kho_is_active(void) { return false; }
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index f9a419cd22d4..d895d0a49bd9 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -113,6 +113,13 @@ void kimage_file_post_load_cleanup(struct kimage *image)
 	image->ima_buffer = NULL;
 #endif /* CONFIG_IMA_KEXEC */
 
+#ifdef CONFIG_KEXEC_KHO
+	kvfree(image->kho.mem_cache.buffer);
+	image->kho.mem_cache = (struct kexec_buf) {};
+	kvfree(image->kho.dt.buffer);
+	image->kho.dt = (struct kexec_buf) {};
+#endif
+
 	/* See if architecture has anything to cleanup post load */
 	arch_kimage_file_post_load_cleanup(image);
 
@@ -249,6 +256,11 @@ kimage_file_prepare_segments(struct kimage *image, int kernel_fd, int initrd_fd,
 	/* IMA needs to pass the measurement list to the next kernel. */
 	ima_add_kexec_buffer(image);
 
+	/* If KHO is active, add its images to the list */
+	ret = kho_fill_kimage(image);
+	if (ret)
+		goto out;
+
 	/* Call image load handler */
 	ldata = kexec_image_load_default(image);
 
@@ -518,6 +530,24 @@ static int locate_mem_hole_callback(struct resource *res, void *arg)
 	return locate_mem_hole_bottom_up(start, end, kbuf);
 }
 
+#ifdef CONFIG_KEXEC_KHO
+static int kexec_walk_kho_scratch(struct kexec_buf *kbuf,
+				  int (*func)(struct resource *, void *))
+{
+	int ret = 0;
+
+	struct resource res = {
+		.start = kho_scratch_phys,
+		.end = kho_scratch_phys + kho_scratch_len - 1,
+	};
+
+	/* Try to fit the kimage into our KHO scratch region */
+	ret = func(&res, kbuf);
+
+	return ret;
+}
+#endif
+
 #ifdef CONFIG_ARCH_KEEP_MEMBLOCK
 static int kexec_walk_memblock(struct kexec_buf *kbuf,
 			       int (*func)(struct resource *, void *))
@@ -612,6 +642,17 @@ int kexec_locate_mem_hole(struct kexec_buf *kbuf)
 	if (kbuf->mem != KEXEC_BUF_MEM_UNKNOWN)
 		return 0;
 
+#ifdef CONFIG_KEXEC_KHO
+	/*
+	 * If KHO is active, only use KHO scratch memory. All other memory
+	 * could potentially be handed over.
+	 */
+	if (kho_is_active() && kbuf->image->type != KEXEC_TYPE_CRASH) {
+		ret = kexec_walk_kho_scratch(kbuf, locate_mem_hole_callback);
+		return ret == 1 ? 0 : -EADDRNOTAVAIL;
+	}
+#endif
+
 	if (!IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
 		ret = kexec_walk_resources(kbuf, locate_mem_hole_callback);
 	else
diff --git a/kernel/kexec_kho_out.c b/kernel/kexec_kho_out.c
index 765cf6ba7a46..2cf5755f5e4a 100644
--- a/kernel/kexec_kho_out.c
+++ b/kernel/kexec_kho_out.c
@@ -50,6 +50,216 @@ int unregister_kho_notifier(struct notifier_block *nb)
 }
 EXPORT_SYMBOL_GPL(unregister_kho_notifier);
 
+static int kho_mem_cache_add(void *fdt, struct kho_mem *mem_cache, int size,
+			     struct kho_mem *new_mem)
+{
+	int entries = size / sizeof(*mem_cache);
+	u64 new_start = new_mem->addr;
+	u64 new_end = new_mem->addr + new_mem->len;
+	u64 prev_start = 0;
+	u64 prev_end = 0;
+	int i;
+
+	if (WARN_ON((new_start < (kho_scratch_phys + kho_scratch_len)) &&
+		    (new_end > kho_scratch_phys))) {
+		pr_err("KHO memory runs over scratch memory\n");
+		return -EINVAL;
+	}
+
+	/*
+	 * We walk the existing sorted mem cache and find the spot where this
+	 * new entry would start, so we can insert it right there.
+	 */
+	for (i = 0; i < entries; i++) {
+		struct kho_mem *mem = &mem_cache[i];
+		u64 mem_end = (mem->addr + mem->len);
+
+		if (mem_end < new_start) {
+			/* No overlap */
+			prev_start = mem->addr;
+			prev_end = mem->addr + mem->len;
+			continue;
+		} else if ((new_start >= mem->addr) && (new_end <= mem_end)) {
+			/* new_mem fits into mem, skip */
+			return size;
+		} else if ((new_end >= mem->addr) && (new_start <= mem_end)) {
+			/* new_mem and mem overlap, fold them */
+			bool remove = false;
+
+			mem->addr = min(new_start, mem->addr);
+			mem->len = max(mem_end, new_end) - mem->addr;
+			mem_end = (mem->addr + mem->len);
+
+			if (i > 0 && prev_end >= mem->addr) {
+				/* We now overlap with the previous mem, fold */
+				struct kho_mem *prev = &mem_cache[i - 1];
+
+				prev->addr = min(prev->addr, mem->addr);
+				prev->len = max(mem_end, prev_end) - prev->addr;
+				remove = true;
+			} else if (i < (entries - 1) && mem_end >= mem_cache[i + 1].addr) {
+				/* We now overlap with the next mem, fold */
+				struct kho_mem *next = &mem_cache[i + 1];
+				u64 next_end = (next->addr + next->len);
+
+				next->addr = min(next->addr, mem->addr);
+				next->len = max(mem_end, next_end) - next->addr;
+				remove = true;
+			}
+
+			if (remove) {
+				/* We folded this mem into another, remove it */
+				memmove(mem, mem + 1, (entries - i - 1) * sizeof(*mem));
+				size -= sizeof(*new_mem);
+			}
+
+			return size;
+		} else if (mem->addr > new_end) {
+			/*
+			 * The mem cache is sorted. If the current entry
+			 * starts after our new_mem's end, we overshot,
+			 * which means we need to insert new_mem by opening
+			 * a hole at the current entry's position.
+			 */
+			memmove(mem + 1, mem, (entries - i) * sizeof(*mem));
+			break;
+		}
+	}
+
+	mem_cache[i] = *new_mem;
+	size += sizeof(*new_mem);
+
+	return size;
+}
+
+/**
+ * kho_alloc_mem_cache - Allocate and initialize the mem cache kexec_buf
+ */
+static int kho_alloc_mem_cache(struct kimage *image, void *fdt)
+{
+	int offset, depth, initial_depth, len;
+	void *mem_cache;
+	int size;
+
+	/* Count the elements inside all "mem" properties in the DT */
+	size = offset = depth = initial_depth = 0;
+	for (offset = 0;
+	     offset >= 0 && depth >= initial_depth;
+	     offset = fdt_next_node(fdt, offset, &depth)) {
+		const struct kho_mem *mems;
+
+		mems = fdt_getprop(fdt, offset, "mem", &len);
+		if (!mems || len & (sizeof(*mems) - 1))
+			continue;
+		size += len;
+	}
+
+	/* Allocate based on the max size we determined */
+	mem_cache = kvmalloc(size, GFP_KERNEL);
+	if (!mem_cache)
+		return -ENOMEM;
+
+	/* And populate the array */
+	size = offset = depth = initial_depth = 0;
+	for (offset = 0;
+	     offset >= 0 && depth >= initial_depth;
+	     offset = fdt_next_node(fdt, offset, &depth)) {
+		const struct kho_mem *mems;
+		int nr_mems, i;
+
+		mems = fdt_getprop(fdt, offset, "mem", &len);
+		if (!mems || len & (sizeof(*mems) - 1))
+			continue;
+
+		for (i = 0, nr_mems = len / sizeof(*mems); i < nr_mems; i++) {
+			const struct kho_mem *mem = &mems[i];
+			ulong mstart = PAGE_ALIGN_DOWN(mem->addr);
+			ulong mend = PAGE_ALIGN(mem->addr + mem->len);
+			struct kho_mem cmem = {
+				.addr = mstart,
+				.len = (mend - mstart),
+			};
+
+			size = kho_mem_cache_add(fdt, mem_cache, size, &cmem);
+			if (size < 0)
+				return size;
+		}
+	}
+
+	image->kho.mem_cache.buffer = mem_cache;
+	image->kho.mem_cache.bufsz = size;
+	image->kho.mem_cache.memsz = size;
+
+	return 0;
+}
+
+int kho_fill_kimage(struct kimage *image)
+{
+	int err = 0;
+	void *dt;
+
+	mutex_lock(&kho.lock);
+
+	if (!kho.active)
+		goto out;
+
+	/* Initialize kexec_buf for mem_cache */
+	image->kho.mem_cache = (struct kexec_buf) {
+		.image = image,
+		.buffer = NULL,
+		.bufsz = 0,
+		.mem = KEXEC_BUF_MEM_UNKNOWN,
+		.memsz = 0,
+		.buf_align = SZ_64K, /* Makes it easier to map */
+		.buf_max = ULONG_MAX,
+		.top_down = true,
+	};
+
+	/*
+	 * We need to make all allocations visible here via the mem_cache so that
+	 * kho_is_destination_range() can identify overlapping regions and ensure
+	 * that no kimage (including the DT one) lands on handed over memory.
+	 *
+	 * Since we conveniently already built an array of all allocations, let's
+	 * pass that on to the target kernel so that it can reuse it to initialize
+	 * its memory blocks.
+	 */
+	err = kho_alloc_mem_cache(image, kho.dt);
+	if (err)
+		goto out;
+
+	err = kexec_add_buffer(&image->kho.mem_cache);
+	if (err)
+		goto out;
+
+	/*
+	 * Create a kexec copy of the DT here. We need this because the
+	 * lifetimes of kho.dt and the kimage may differ.
+	 */
+	dt = kvmemdup(kho.dt, kho.dt_len, GFP_KERNEL);
+	if (!dt) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	/* Allocate target memory for kho dt */
+	image->kho.dt = (struct kexec_buf) {
+		.image = image,
+		.buffer = dt,
+		.bufsz = kho.dt_len,
+		.mem = KEXEC_BUF_MEM_UNKNOWN,
+		.memsz = kho.dt_len,
+		.buf_align = SZ_64K, /* Makes it easier to map */
+		.buf_max = ULONG_MAX,
+		.top_down = true,
+	};
+	err = kexec_add_buffer(&image->kho.dt);
+
+out:
+	mutex_unlock(&kho.lock);
+	return err;
+}
+
 bool kho_is_active(void)
 {
 	return kho.active;

From patchwork Fri Dec 22 19:35:56 2023
X-Patchwork-Submitter: Alexander Graf
X-Patchwork-Id: 13503725
From: Alexander Graf
Cc: Eric Biederman, "H. Peter Anvin", Andy Lutomirski, Peter Zijlstra,
 "Rob Herring", Steven Rostedt, "Andrew Morton", Mark Rutland,
 "Tom Lendacky", Ashish Kalra, James Gowans, Stanislav Kinsburskii,
 Anthony Yznaga, Usama Arif, David Woodhouse, Benjamin Herrenschmidt
Subject: [PATCH v2 06/17] kexec: Add config option for KHO
Date: Fri, 22 Dec 2023 19:35:56 +0000
Message-ID: <20231222193607.15474-7-graf@amazon.com>
In-Reply-To: <20231222193607.15474-1-graf@amazon.com>
References: <20231222193607.15474-1-graf@amazon.com>

All generic code needed to support kexec with KHO is now in place. Add
a config option, dependent on architecture support, that enables KHO.

Signed-off-by: Alexander Graf
---
 kernel/Kconfig.kexec | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/kernel/Kconfig.kexec b/kernel/Kconfig.kexec
index 2fd510256604..909ab28f1341 100644
--- a/kernel/Kconfig.kexec
+++ b/kernel/Kconfig.kexec
@@ -91,6 +91,19 @@ config KEXEC_JUMP
 	  Jump between original kernel and kexeced kernel and invoke
 	  code in physical address mode via KEXEC
 
+config KEXEC_KHO
+	bool "kexec handover"
+	depends on ARCH_SUPPORTS_KEXEC_KHO
+	depends on KEXEC
+	select MEMBLOCK_SCRATCH
+	select LIBFDT
+	select CMA
+	help
+	  Allow kexec to hand over state across kernels by generating and
+	  passing additional metadata to the target kernel. This is useful
+	  to keep data or state alive across the kexec. For this to work,
+	  both source and target kernels need to have this option enabled.
+
 config CRASH_DUMP
 	bool "kernel crash dumps"
 	depends on ARCH_SUPPORTS_CRASH_DUMP

From patchwork Fri Dec 22 19:35:57 2023
X-Patchwork-Submitter: Alexander Graf
X-Patchwork-Id: 13503726
From: Alexander Graf
Cc: Eric Biederman, "H. Peter Anvin", Andy Lutomirski, Peter Zijlstra,
 "Rob Herring", Steven Rostedt, "Andrew Morton", Mark Rutland,
 "Tom Lendacky", Ashish Kalra, James Gowans, Stanislav Kinsburskii,
 Anthony Yznaga, Usama Arif, David Woodhouse, Benjamin Herrenschmidt
Subject: [PATCH v2 07/17] kexec: Add documentation for KHO
Date: Fri, 22 Dec 2023 19:35:57 +0000
Message-ID: <20231222193607.15474-8-graf@amazon.com>
In-Reply-To: <20231222193607.15474-1-graf@amazon.com>
References: <20231222193607.15474-1-graf@amazon.com>

With KHO in place, let's add documentation that describes what it is and
how to use it.

Signed-off-by: Alexander Graf
---
 Documentation/kho/concepts.rst   | 88 ++++++++++++++++++++++++++++++++
 Documentation/kho/index.rst      | 19 +++++++
 Documentation/kho/usage.rst      | 57 +++++++++++++++++++++
 Documentation/subsystem-apis.rst |  1 +
 4 files changed, 165 insertions(+)
 create mode 100644 Documentation/kho/concepts.rst
 create mode 100644 Documentation/kho/index.rst
 create mode 100644 Documentation/kho/usage.rst

diff --git a/Documentation/kho/concepts.rst b/Documentation/kho/concepts.rst
new file mode 100644
index 000000000000..8e4fe8c57865
--- /dev/null
+++ b/Documentation/kho/concepts.rst
@@ -0,0 +1,88 @@
+.. SPDX-License-Identifier: GPL-2.0-or-later
+
+=======================
+Kexec Handover Concepts
+=======================
+
+Kexec HandOver (KHO) is a mechanism that allows Linux to preserve state -
+arbitrary properties as well as memory locations - across kexec.
+
+It introduces multiple concepts:
+
+KHO Device Tree
+---------------
+
+Every KHO kexec carries a KHO-specific flattened device tree blob that
+describes the state of the system. Device drivers can register with KHO to
+serialize their state before kexec. After kexec, device drivers can read
+the device tree and extract the previous state.
+ +KHO only uses the fdt container format and libfdt library, but does not +adhere to the same property semantics that normal device trees do: Properties +are passed in native endianness and standardized properties like ``reg`` and +``ranges`` do not exist, hence there are no ``#...-cells`` properties. + +KHO introduces a new concept to its device tree: ``mem`` properties. A +``mem`` property can exist inside any subnode in the device tree. When present, +it contains an array of physical memory ranges that the new kernel must mark +as reserved on boot. It is recommended, but not required, to make these ranges +as physically contiguous as possible to reduce the number of array elements :: + + struct kho_mem { + __u64 addr; + __u64 len; + }; + +After boot, drivers can call the KHO subsystem to transfer ownership of memory +that was reserved via a ``mem`` property to themselves to continue using memory +from the previous execution. + +The KHO device tree follows the in-Linux schema requirements. Any element in +the device tree is documented via device tree schema yamls that explain what +data gets transferred. + +Mem cache +--------- + +The new kernel needs to know about all memory reservations, but is unable to +parse the device tree yet in early bootup code because of memory limitations. +To simplify the initial memory reservation flow, the old kernel passes a +preprocessed array of physically contiguous reserved ranges to the new kernel. + +These reservations have to be separate from architectural memory maps and +reservations because they differ on every kexec, while the architectural ones +get passed directly between invocations. + +The fewer entries this cache contains, the faster the new kernel will boot. + +Scratch Region +-------------- + +To boot into kexec, we need to have a physically contiguous memory range that +contains no handed over memory. Kexec then places the target kernel and initrd +into that region. 
The new kernel exclusively uses this region for memory +allocations before it ingests the mem cache. + +We guarantee that we always have such a region through the scratch region: On +first boot, you can pass the ``kho_scratch`` kernel command line option. When +it is set, Linux allocates a CMA region of the given size. CMA gives us the +guarantee that no handover pages land in that region, because handover +pages must be at a static physical memory location and CMA enforces that +only movable pages can be located inside. + +After KHO kexec, we ignore the ``kho_scratch`` kernel command line option and +instead reuse the exact same region that was originally allocated. This allows +us to recursively execute any number of KHO kexecs. Because we used this region +for boot memory allocations and as target memory for kexec blobs, some parts +of that memory region may be reserved. These reservations are irrelevant for +the next KHO, because kexec can overwrite even the original kernel. + +KHO active phase +---------------- + +To enable user space based kexec file loaders, the kernel needs to be able to +provide the device tree that describes the previous kernel's state before +performing the actual kexec. The process of generating that device tree is +called serialization. When the device tree is generated, some properties +of the system may become immutable because they are already written down +in the device tree. That state is called the KHO active phase. diff --git a/Documentation/kho/index.rst b/Documentation/kho/index.rst new file mode 100644 index 000000000000..5e7eeeca8520 --- /dev/null +++ b/Documentation/kho/index.rst @@ -0,0 +1,19 @@ +.. SPDX-License-Identifier: GPL-2.0-or-later + +======================== +Kexec Handover Subsystem +======================== + +.. toctree:: + :maxdepth: 1 + + concepts + usage + +.. 
only:: subproject and html + + + Indices + ======= + + * :ref:`genindex` diff --git a/Documentation/kho/usage.rst b/Documentation/kho/usage.rst new file mode 100644 index 000000000000..5efa2a58f9c3 --- /dev/null +++ b/Documentation/kho/usage.rst @@ -0,0 +1,57 @@ +.. SPDX-License-Identifier: GPL-2.0-or-later + +==================== +Kexec Handover Usage +==================== + +Kexec HandOver (KHO) is a mechanism that allows Linux to preserve state - +arbitrary properties as well as memory locations - across kexec. + +This document expects that you are familiar with the base KHO +:ref:`Documentation/kho/concepts.rst `. If you have not read +them yet, please do so now. + +Prerequisites +------------- + +KHO is available when the ``CONFIG_KEXEC_KHO`` config option is set to y +at compile time. Every KHO producer has its own config option that you +need to enable if you would like to preserve their respective state across +kexec. + +To use KHO, please boot the kernel with the ``kho_scratch`` command +line parameter set to allocate a scratch region. For example +``kho_scratch=512M`` will reserve a 512 MiB scratch region on boot. + +Perform a KHO kexec +------------------- + +Before you can perform a KHO kexec, you need to move the system into the +:ref:`Documentation/kho/concepts.rst ` :: + + $ echo 1 > /sys/kernel/kho/active + +After this command, the KHO device tree is available in ``/sys/kernel/kho/dt``. + +Next, load the target payload and kexec into it. It is important that you +use the ``-s`` parameter to use the in-kernel kexec file loader, as user +space kexec tooling currently has no support for KHO with the user space +based file loader :: + + # kexec -l Image --initrd=initrd -s + # kexec -e + +The new kernel will boot up and contain some of the previous kernel's state. + +For example, if you enabled ``CONFIG_FTRACE_KHO``, the new kernel will contain +the old kernel's trace buffers in ``/sys/kernel/debug/tracing/trace``. 
+ +Abort a KHO kexec +----------------- + +You can move the system out of the KHO active phase again by calling :: + + $ echo 0 > /sys/kernel/kho/active + +After this command, the KHO device tree is no longer available in +``/sys/kernel/kho/dt``. diff --git a/Documentation/subsystem-apis.rst b/Documentation/subsystem-apis.rst index 930dc23998a0..8207b6514d87 100644 --- a/Documentation/subsystem-apis.rst +++ b/Documentation/subsystem-apis.rst @@ -86,3 +86,4 @@ Storage interfaces misc-devices/index peci/index wmi/index + kho/index From patchwork Fri Dec 22 19:35:58 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Graf X-Patchwork-Id: 13503727 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 431D7C41535 for ; Fri, 22 Dec 2023 19:37:39 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D15A28D0008; Fri, 22 Dec 2023 14:37:38 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id CC55D8D0001; Fri, 22 Dec 2023 14:37:38 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AF14A8D0008; Fri, 22 Dec 2023 14:37:38 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 9FA558D0001 for ; Fri, 22 Dec 2023 14:37:38 -0500 (EST) Received: from smtpin25.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 770C140761 for ; Fri, 22 Dec 2023 19:37:38 +0000 (UTC) X-FDA: 81595463796.25.1CDD169 Received: from smtp-fw-33001.amazon.com (smtp-fw-33001.amazon.com [207.171.190.10]) by imf01.hostedemail.com (Postfix) with ESMTP id 5518140004 for ; Fri, 22 Dec 2023 19:37:36 +0000 (UTC) 
Authentication-Results: imf01.hostedemail.com; dkim=pass header.d=amazon.com header.s=amazon201209 header.b=i+9PSn1b; spf=pass (imf01.hostedemail.com: domain of "prvs=71347c2e1=graf@amazon.de" designates 207.171.190.10 as permitted sender) smtp.mailfrom="prvs=71347c2e1=graf@amazon.de"; dmarc=pass (policy=quarantine) header.from=amazon.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1703273856; a=rsa-sha256; cv=none; b=nHGkVBRZ0LzZWh6VBiC3t3NwrmVZCxVeqwqg0YF2wb/4XafR5ZFqI2YSiUZ0CVd3D0quff zKL8qtG7tGTjdF/jtpAxF7OM53dFon/1vtWJGgdcfYT6ezaWEYyY2wYb3AGMxorhiiO6FV H9YlKzE2pGNj8OtInjzA2Pw8330RpQY= ARC-Authentication-Results: i=1; imf01.hostedemail.com; dkim=pass header.d=amazon.com header.s=amazon201209 header.b=i+9PSn1b; spf=pass (imf01.hostedemail.com: domain of "prvs=71347c2e1=graf@amazon.de" designates 207.171.190.10 as permitted sender) smtp.mailfrom="prvs=71347c2e1=graf@amazon.de"; dmarc=pass (policy=quarantine) header.from=amazon.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1703273856; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=4hs41nrvxXplJ/QyH6HSlHLbtIDrbQgqX7R8fKuobQ4=; b=c/WFvJ0uhlNE9+/KGGpWz3uQ5DuLnYbgfSl2NbUKNL1NNjT0ZFJQ8um+UmUcPL/uiF1Hfu HSGMXcajSptuR31wvxFj3dXydDI8USeyjWrDCi5TT8MIKDkpvjhE065aZIiKIUDR5L1QTv HTnZIhgsCE1KnK9cbcexJ40kVnPCY3g= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1703273857; x=1734809857; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=4hs41nrvxXplJ/QyH6HSlHLbtIDrbQgqX7R8fKuobQ4=; b=i+9PSn1bh/vie2DiuC6mIormaIRdEWUoOJr9fLPo6AIy/o+35XOb6ieS kQbhZcPI+qdLlpAPZAfavtSeepm/gBNMnmdaxJzChrZE0iWHvmsPv0lgH 
MytZ+LuyEODa8kuXoScgZLnnSv1gzZzLEi/j46F/0CvdADDTOtboPZS7Q Y=; X-IronPort-AV: E=Sophos;i="6.04,297,1695686400"; d="scan'208";a="318657015" Received: from iad12-co-svc-p1-lb1-vlan3.amazon.com (HELO email-inbound-relay-pdx-2c-m6i4x-8c5b1df3.us-west-2.amazon.com) ([10.43.8.6]) by smtp-border-fw-33001.sea14.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Dec 2023 19:37:32 +0000 Received: from smtpout.prod.us-west-2.prod.farcaster.email.amazon.dev (pdx2-ws-svc-p26-lb5-vlan3.pdx.amazon.com [10.39.38.70]) by email-inbound-relay-pdx-2c-m6i4x-8c5b1df3.us-west-2.amazon.com (Postfix) with ESMTPS id DCC8F40D5B; Fri, 22 Dec 2023 19:37:28 +0000 (UTC) Received: from EX19MTAUWA001.ant.amazon.com [10.0.38.20:1484] by smtpin.naws.us-west-2.prod.farcaster.email.amazon.dev [10.0.38.150:2525] with esmtp (Farcaster) id 8eb458e3-6f3d-4684-a6b2-2776439c40d0; Fri, 22 Dec 2023 19:37:28 +0000 (UTC) X-Farcaster-Flow-ID: 8eb458e3-6f3d-4684-a6b2-2776439c40d0 Received: from EX19D020UWC004.ant.amazon.com (10.13.138.149) by EX19MTAUWA001.ant.amazon.com (10.250.64.204) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.40; Fri, 22 Dec 2023 19:37:28 +0000 Received: from dev-dsk-graf-1a-5ce218e4.eu-west-1.amazon.com (10.253.83.51) by EX19D020UWC004.ant.amazon.com (10.13.138.149) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.40; Fri, 22 Dec 2023 19:37:24 +0000 From: Alexander Graf To: CC: , , , , , , , Eric Biederman , "H. 
Peter Anvin" , Andy Lutomirski , Peter Zijlstra , "Rob Herring" , Steven Rostedt , "Andrew Morton" , Mark Rutland , "Tom Lendacky" , Ashish Kalra , James Gowans , Stanislav Kinsburskii , , , , Anthony Yznaga , Usama Arif , David Woodhouse , Benjamin Herrenschmidt Subject: [PATCH v2 08/17] arm64: Add KHO support Date: Fri, 22 Dec 2023 19:35:58 +0000 Message-ID: <20231222193607.15474-9-graf@amazon.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20231222193607.15474-1-graf@amazon.com> References: <20231222193607.15474-1-graf@amazon.com> MIME-Version: 1.0 X-Originating-IP: [10.253.83.51] X-ClientProxiedBy: EX19D037UWB001.ant.amazon.com (10.13.138.123) To EX19D020UWC004.ant.amazon.com (10.13.138.149) X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: 5518140004 X-Stat-Signature: 4zo8akax357fz4qrpr8obkfdcakfhxet X-Rspam-User: X-HE-Tag: 1703273856-214748 X-HE-Meta: U2FsdGVkX19mjD4rnFXHztW1fPdW8QWni0SXlGNjWgg2msnpl5dPbalN3lk0Q/xU3JyoVbxpMqH4Yk4DuHqataCSeNACxCqD5ZVmLCEBJX5zCFziPeYawEbvR3HgkYTt54/SVf0Wo6FLjJr0Bh2qbUAf91dgkdnPfaPa17Rzqu/idYWRS9XGHtV050x02aI29mq+O/bXpMeByXtSf6jxVDalb7vaDSzdF3KhnrwrDM1A+j+kB+kjh50358vBXlCrn1ELzMCcx6I4JxGf4fN4ixKQStxz3cFmIF93ds24UgYKoz1xgLdeHEZB6snwaw6NiJG7+mCIl1X92aruSEqD0OGrRFDBvtHJFCOYpytCCD2amcgGuSXFBkOOre3CIP5xHxeYigG0FRvc8K7dd45gmiVBM+bQikFXZhcwPb1ao+HeT6PlAcN06R/pQmbGe/MwuAi6Q3NqFj931Ck6sXXb96hCy8E0YlR1f8UJhLP3mBLYrZYPVX8qVlYP3aWLBCCD5vQc2SZLKf0bZtf+/qZvVY1vrvsICmcbskmcQnCvsGkwnv/i37YMVzaJsMUMeA3zZkSiCFUtf5MrXKPwNMPvSF61XVAgtUwOc4QbVHYpDFMOQ6PWkRbZ/KEN3FEeKy7ZIVZf3JqXmoo20d2tdnabtR/kzhJuEHuNUOGDtSzaXRIzptJt8xfUr7+C9zqz5BxhzIdGutDfcfWj8gptgOh64Otpd5Nw5Zk7H18N8X0zTOX9aurbLMTYuaFQyVArB7aSCVhrMj1oQ3fISBIACYL4mwFeXjZOPu9cqQ4PRf41uWX7xkyhWsZ+pjlZ9PD3QTysgNcW9chxw2ZaOSXhDzLh684iBj4IIwWOYOOtvUdLRk5/esqfA6EDkJ/BLb89XpkS03IhlRHqyMtHkTvhBGaa0zsUJ5mVn/hWcN65HV2UpNGh/USOXzlu/zVTAUQD/2YRajVchZ3QzJAk/1YYI7/ ijEo6FCK 
0FpLWvXpIeUDH5vM3yWw5ru8VjXAcqcHz0NAbK0cZjMPep3hqPeHbMGxJqEaZ7m+OWwGKUT1vYMpdnAHE+c3Kk9cfpi5LC4xtOlrjO+PE/CDmnx6IW7nxA08zxV40E+FZMXan6aCodD7t9/atg2DpcwSJs7FAbpTxrHkCEgrcDg4dWr/I4BQ5xr0E9May8eoGwg3hAfDSyvIRKbPc9T6likRa47banY3y8yNmcy/W6ViF/UM1x7H0obgSGW4beUZnaKaYd/1Qz/WLKeKsP1lhPKVHE8JIka6T5C2NJhbzY1QZwcc1d97GN/vLuyY4qom15d+E/L/BoRwpwNh+Eg0heMamNC+eR5x2tgRcsZlrSw9O1vSuOXuSLNgV+FVbKvLRixkldXWpxqzvQ/wZ9e8vmE9sPZrmH/BPLVKVLrgN5vRp7ABlHAiFy36Gm7o/XPSx8oaxuKObgMT4RaSkkGOjC9uoTEpu7KPPQO5q21m9+JUv22bahVormW+5Xs5SwaNDewS1+xhu1tAPQdgYp/jx9ND2ONmH1CDtp8+OdEn1lY4gKHPbdVc5A/YTN7+Y2cz5seNGUqAKJ7F0ugaGXCJ3FeyfqvktuzkRjSzeBZHODPTtKFLo92KMlE+eKg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: We now have all bits in place to support KHO kexecs. This patch adds awareness of KHO in the kexec file as well as boot path for arm64 and adds the respective kconfig option to the architecture so that it can use KHO successfully. 
Signed-off-by: Alexander Graf --- v1 -> v2: - test bot warning fix - Change kconfig option to ARCH_SUPPORTS_KEXEC_KHO - s/kho_reserve_mem/kho_reserve_previous_mem/g - s/kho_reserve/kho_reserve_scratch/g - Remove / reduce ifdefs for kho fdt code --- arch/arm64/Kconfig | 3 +++ arch/arm64/kernel/setup.c | 2 ++ arch/arm64/mm/init.c | 8 ++++++ drivers/of/fdt.c | 39 ++++++++++++++++++++++++++++ drivers/of/kexec.c | 54 +++++++++++++++++++++++++++++++++++++++ 5 files changed, 106 insertions(+) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 7b071a00425d..4a2fd3deaa16 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -1495,6 +1495,9 @@ config ARCH_SUPPORTS_KEXEC_IMAGE_VERIFY_SIG config ARCH_DEFAULT_KEXEC_IMAGE_VERIFY_SIG def_bool y +config ARCH_SUPPORTS_KEXEC_KHO + def_bool y + config ARCH_SUPPORTS_CRASH_DUMP def_bool y diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c index 417a8a86b2db..9aa05b84d202 100644 --- a/arch/arm64/kernel/setup.c +++ b/arch/arm64/kernel/setup.c @@ -346,6 +346,8 @@ void __init __no_sanitize_address setup_arch(char **cmdline_p) paging_init(); + kho_reserve_previous_mem(); + acpi_table_upgrade(); /* Parse the ACPI tables for possible boot-time configuration */ diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index 74c1db8ce271..1a8fc91509af 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -358,6 +358,8 @@ void __init bootmem_init(void) */ arch_reserve_crashkernel(); + kho_reserve_scratch(); + memblock_dump_all(); } @@ -386,6 +388,12 @@ void __init mem_init(void) /* this will put all unused low memory onto the freelists */ memblock_free_all(); + /* + * Now that all KHO pages are marked as reserved, let's flip them back + * to normal pages with accurate refcount. + */ + kho_populate_refcount(); + /* * Check boundaries twice: Some fundamental inconsistencies can be * detected at build time already. 
diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c index bf502ba8da95..f9b9a36fb722 100644 --- a/drivers/of/fdt.c +++ b/drivers/of/fdt.c @@ -1006,6 +1006,42 @@ void __init early_init_dt_check_for_usable_mem_range(void) memblock_add(rgn[i].base, rgn[i].size); } +/** + * early_init_dt_check_kho - Decode info required for kexec handover from DT + */ +static void __init early_init_dt_check_kho(void) +{ + unsigned long node = chosen_node_offset; + u64 kho_start, scratch_start, scratch_size, mem_start, mem_size; + const __be32 *p; + int l; + + if (!IS_ENABLED(CONFIG_KEXEC_KHO) || (long)node < 0) + return; + + p = of_get_flat_dt_prop(node, "linux,kho-dt", &l); + if (l != (dt_root_addr_cells + dt_root_size_cells) * sizeof(__be32)) + return; + + kho_start = dt_mem_next_cell(dt_root_addr_cells, &p); + + p = of_get_flat_dt_prop(node, "linux,kho-scratch", &l); + if (l != (dt_root_addr_cells + dt_root_size_cells) * sizeof(__be32)) + return; + + scratch_start = dt_mem_next_cell(dt_root_addr_cells, &p); + scratch_size = dt_mem_next_cell(dt_root_size_cells, &p); + + p = of_get_flat_dt_prop(node, "linux,kho-mem", &l); + if (l != (dt_root_addr_cells + dt_root_size_cells) * sizeof(__be32)) + return; + + mem_start = dt_mem_next_cell(dt_root_addr_cells, &p); + mem_size = dt_mem_next_cell(dt_root_size_cells, &p); + + kho_populate(kho_start, scratch_start, scratch_size, mem_start, mem_size); +} + #ifdef CONFIG_SERIAL_EARLYCON int __init early_init_dt_scan_chosen_stdout(void) @@ -1304,6 +1340,9 @@ void __init early_init_dt_scan_nodes(void) /* Handle linux,usable-memory-range property */ early_init_dt_check_for_usable_mem_range(); + + /* Handle kexec handover */ + early_init_dt_check_kho(); } bool __init early_init_dt_scan(void *params) diff --git a/drivers/of/kexec.c b/drivers/of/kexec.c index 68278340cecf..59070b09ad45 100644 --- a/drivers/of/kexec.c +++ b/drivers/of/kexec.c @@ -264,6 +264,55 @@ static inline int setup_ima_buffer(const struct kimage *image, void *fdt, } #endif /* 
CONFIG_IMA_KEXEC */ +static int kho_add_chosen(const struct kimage *image, void *fdt, int chosen_node) +{ + void *dt = NULL; + phys_addr_t dt_mem = 0; + phys_addr_t dt_len = 0; + phys_addr_t scratch_mem = 0; + phys_addr_t scratch_len = 0; + void *mem_cache = NULL; + phys_addr_t mem_cache_mem = 0; + phys_addr_t mem_cache_len = 0; + int ret = 0; + +#ifdef CONFIG_KEXEC_KHO + dt = image->kho.dt.buffer; + dt_mem = image->kho.dt.mem; + dt_len = image->kho.dt.bufsz; + + scratch_mem = kho_scratch_phys; + scratch_len = kho_scratch_len; + + mem_cache = image->kho.mem_cache.buffer; + mem_cache_mem = image->kho.mem_cache.mem; + mem_cache_len = image->kho.mem_cache.bufsz; +#endif + + if (!dt || !mem_cache) + goto out; + + pr_debug("Adding kho metadata to DT\n"); + + ret = fdt_appendprop_addrrange(fdt, 0, chosen_node, "linux,kho-dt", + dt_mem, dt_len); + if (ret) + goto out; + + ret = fdt_appendprop_addrrange(fdt, 0, chosen_node, "linux,kho-scratch", + scratch_mem, scratch_len); + if (ret) + goto out; + + ret = fdt_appendprop_addrrange(fdt, 0, chosen_node, "linux,kho-mem", + mem_cache_mem, mem_cache_len); + if (ret) + goto out; + +out: + return ret; +} + /* * of_kexec_alloc_and_setup_fdt - Alloc and setup a new Flattened Device Tree * @@ -412,6 +461,11 @@ void *of_kexec_alloc_and_setup_fdt(const struct kimage *image, } } + /* Add kho metadata if this is a KHO image */ + ret = kho_add_chosen(image, fdt, chosen_node); + if (ret) + goto out; + /* add bootargs */ if (cmdline) { ret = fdt_setprop_string(fdt, chosen_node, "bootargs", cmdline); From patchwork Fri Dec 22 19:35:59 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Graf X-Patchwork-Id: 13503746 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7406CC41535 for ; Fri, 22 
Dec 2023 19:38:07 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0B9FC8D000A; Fri, 22 Dec 2023 14:38:07 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 0426A8D0001; Fri, 22 Dec 2023 14:38:06 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DB1258D000A; Fri, 22 Dec 2023 14:38:06 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id B301E8D0001 for ; Fri, 22 Dec 2023 14:38:06 -0500 (EST) Received: from smtpin29.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 8AF0AA06D0 for ; Fri, 22 Dec 2023 19:38:06 +0000 (UTC) X-FDA: 81595464972.29.F6CBB4C Received: from smtp-fw-80008.amazon.com (smtp-fw-80008.amazon.com [99.78.197.219]) by imf05.hostedemail.com (Postfix) with ESMTP id 88B1D10001D for ; Fri, 22 Dec 2023 19:37:59 +0000 (UTC) Authentication-Results: imf05.hostedemail.com; dkim=pass header.d=amazon.com header.s=amazon201209 header.b=ETnOgEAs; spf=pass (imf05.hostedemail.com: domain of "prvs=71347c2e1=graf@amazon.de" designates 99.78.197.219 as permitted sender) smtp.mailfrom="prvs=71347c2e1=graf@amazon.de"; dmarc=pass (policy=quarantine) header.from=amazon.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1703273884; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=GaYoJtOeVSzgmYS5gukWExN0RDseJvcoWD78p448UjE=; b=a15ogxAtHPPZ+Bhgk26Wj8uKrxkYmdpva+pwqmXVjL4ePM7eO9REfhhyVNHexsYE4MiXbM 1MzqzmsIAMTgOe030SFywbyUxuWg7cPXd3fuaS+9bOfX9U7+7di/E7vWvNL6rANCYhvNSv /kXcZbWgSfIU0PBvfoZ2cBdOrG1Q7hE= ARC-Authentication-Results: i=1; imf05.hostedemail.com; 
dkim=pass header.d=amazon.com header.s=amazon201209 header.b=ETnOgEAs; spf=pass (imf05.hostedemail.com: domain of "prvs=71347c2e1=graf@amazon.de" designates 99.78.197.219 as permitted sender) smtp.mailfrom="prvs=71347c2e1=graf@amazon.de"; dmarc=pass (policy=quarantine) header.from=amazon.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1703273884; a=rsa-sha256; cv=none; b=xjfHxHe5qxOtsjaRhFhlYmhycGRmv3ChUGew+RHgd7AOhw8caixS3wm+It4uBV8oS8N41r KwrsppXWEiluwljTo2tHDITb5VRba2dBj7tMcVZ/vtfogRKRCfj9JKr3EJ0gTWhQM0Xv+z KARFuTAbuod0Q5/gULkfMM9wNumYyrk= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1703273884; x=1734809884; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=GaYoJtOeVSzgmYS5gukWExN0RDseJvcoWD78p448UjE=; b=ETnOgEAsYXGrkeSpedEkyZcaPmidFsacwn9nHugtKoORR49D7595eMN7 dKnLvffivWZ+znL8+gawlQLWFpAotkxqe2zeXlFuZGFd7D4WP4897rU+W 5qM6bPfgeHmdmpX+Fx2WpVn0HKC4INF10icQ6tiURCnM8swm0x8PMqhtC U=; X-IronPort-AV: E=Sophos;i="6.04,297,1695686400"; d="scan'208";a="53450946" Received: from pdx4-co-svc-p1-lb2-vlan3.amazon.com (HELO email-inbound-relay-pdx-2b-m6i4x-cadc3fbd.us-west-2.amazon.com) ([10.25.36.214]) by smtp-border-fw-80008.pdx80.corp.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Dec 2023 19:37:56 +0000 Received: from smtpout.prod.us-west-2.prod.farcaster.email.amazon.dev (pdx2-ws-svc-p26-lb5-vlan2.pdx.amazon.com [10.39.38.66]) by email-inbound-relay-pdx-2b-m6i4x-cadc3fbd.us-west-2.amazon.com (Postfix) with ESMTPS id 78634A0ABC; Fri, 22 Dec 2023 19:37:54 +0000 (UTC) Received: from EX19MTAUWC001.ant.amazon.com [10.0.7.35:14564] by smtpin.naws.us-west-2.prod.farcaster.email.amazon.dev [10.0.56.23:2525] with esmtp (Farcaster) id 78760084-28fe-498e-84c2-ddc497dcf85c; Fri, 22 Dec 2023 19:37:54 +0000 (UTC) X-Farcaster-Flow-ID: 78760084-28fe-498e-84c2-ddc497dcf85c Received: from EX19D020UWC004.ant.amazon.com (10.13.138.149) by 
EX19MTAUWC001.ant.amazon.com (10.250.64.174) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.40; Fri, 22 Dec 2023 19:37:54 +0000 Received: from dev-dsk-graf-1a-5ce218e4.eu-west-1.amazon.com (10.253.83.51) by EX19D020UWC004.ant.amazon.com (10.13.138.149) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.40; Fri, 22 Dec 2023 19:37:50 +0000 From: Alexander Graf To: CC: , , , , , , , Eric Biederman , "H. Peter Anvin" , Andy Lutomirski , Peter Zijlstra , "Rob Herring" , Steven Rostedt , "Andrew Morton" , Mark Rutland , "Tom Lendacky" , Ashish Kalra , James Gowans , Stanislav Kinsburskii , , , , Anthony Yznaga , Usama Arif , David Woodhouse , Benjamin Herrenschmidt Subject: [PATCH v2 09/17] x86: Add KHO support Date: Fri, 22 Dec 2023 19:35:59 +0000 Message-ID: <20231222193607.15474-10-graf@amazon.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20231222193607.15474-1-graf@amazon.com> References: <20231222193607.15474-1-graf@amazon.com> MIME-Version: 1.0 X-Originating-IP: [10.253.83.51] X-ClientProxiedBy: EX19D041UWA004.ant.amazon.com (10.13.139.9) To EX19D020UWC004.ant.amazon.com (10.13.138.149) X-Rspamd-Queue-Id: 88B1D10001D X-Rspam-User: X-Stat-Signature: jyp6qi8pucfc1n4brw3dps6wuftp3gjp X-Rspamd-Server: rspam01 X-HE-Tag: 1703273879-849617 X-HE-Meta: 
U2FsdGVkX1+r9qQumZOMRhVnzs7fl0KlatBv4UrPKj+APLEogs7HyUCPE6shsH9j6QAClLnui1+IHiVlcwjtr8kiRFjMn4ST03lyx6gD29x+S4irL04tac3e/kcwthtFHzfAn+WXhNPDDzlQCQO4vHI83I5GUKyEuG3htcETexNHwhxPIGZfXvCrAQeAQmDJksqi+Z5E5hLmaMPbRQ5jDOPYaxtlaUornD17BiFxIs9afEIqCy6QCDpmz8HNbmMgPgFR+TDe2CgdINXimI17X3+XFAzpWzWb4M6zmHo3eUmlr4oTnhYgfHo09YyJzqxZSVAWCP+ckmWFLuQn5X/5qFpixEePX5S4kwqY8DK4CcOXKaZKUAkCHUWhHkf5xLX0BNc9Hu9Btr6wIBpRp1ttlIl6bukargSWTskGBCU9AT5hnKrU0kH2ULnvmvAljFv1SScr7Agr1IS8KXtDWDjXj3sUGHCLIeslPWkS0oy1/MCOJ/Xrfz10E+sHSvxgO6AFLgnCHJHZzp/X9UV1zSdBDLi1hKP1cPV9eGCzZrBuGiE7J5HjT2h6SXjIa0Xv2CBvIv41sKfSiA/M+u4qZ1N9oiWBx3Cg8gQwJqKZcsfrR5S0vwuTK08B0epsnBjCKupjzWV9jXNozxa/blT1aFr2sN+mynPCH/3V4lN5Hr2I2cn2BnruUN/Fjv2SFvCMLvTZAZXwltQlPOY98TbJIlJosOWVX8LPawnW3fOTYRszzMhTgpDoAHoekLvi7D5dE7TKt6LGte4zxPlS/d8KknsUB6KUBymAEPcN14//NN9mERJokMN3BuPsQpdwnVU1vc9QSNjE4xCImahdcdyurka0I6r+xMwjg/xz+EMUkiHmhP3HPozbmXUvDAkmsMM4VRdLSU6VVlcqT3GRslDNHuqx99MVkuGM2lpURNH7G7Q6Gi9ok2nXQ3/uW5fkVYhoFunAIzKUTIYdsOjbPhoeg4G A4wnKbXm WhT6FZIDnQR5gdRR/unPq122mEfkH3g4tZ09ICIUJwmq/YW8dv5d8AUG3g6kOS4YlDBkZcnqp0HS1OPNv2nTdVI2h6e5JxioAD0f30wpOCcPvBAP0zI87mNnHIZlCmWFCafLVAqO6bOiU0WBjEbMqLOZ8L+7lJVtn75cDKc/MgzeAwwReAnorMtLikhCIPo5YGG6U8Cgjtqov96RKhR3agLRx1Fduv7LVJnVO5Pmd28Rmgp+zR7EBgJ/Wyeqb6dOotJlGoql3R8CBm25LtNUi/XwMvnKAtOHUZe0lLXCU7rN8a9UqubiJvhAauVSUJrhFyp+BVCsqcplwrkmgJYEY6JWYsb9/c4aEanc+D+Z9JSBZcS2dgULrbYxKZ7aHfJyHJtRq2ttEd9vL96Big3OUo2fdIy8UguTFUmWdichrB3boSACxY0LhtaUpVdzEiCPAL5mrJnDy13ZvI3GuzUyUznIznZE1et+lcEp8mLvTRIEn9KzgR6lp668Q8q9mneKYUV2iBAqz89KTz/U16U7XdCeHt/sfIbB0tZqxkznIebvHfvT6HVQ2BpPI8x7BkJHBPSV0nmaR+TVrJn5p66LdurLKfRGHlAeWXEsjXZJTkXeq8fWwGUzTdUdaPA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: We now have all bits in place to support KHO kexecs. 
This patch adds awareness of KHO in the kexec file as well as the boot path for x86 and adds the respective kconfig option to the architecture so that it can use KHO successfully. In addition, it teaches the decompression code about KHO so that its KASLR location finder only considers memory regions that are not already occupied by KHO memory. Signed-off-by: Alexander Graf --- v1 -> v2: - Change kconfig option to ARCH_SUPPORTS_KEXEC_KHO - s/kho_reserve_mem/kho_reserve_previous_mem/g - s/kho_reserve/kho_reserve_scratch/g --- arch/x86/Kconfig | 3 ++ arch/x86/boot/compressed/kaslr.c | 55 +++++++++++++++++++++++++++ arch/x86/include/uapi/asm/bootparam.h | 15 +++++++- arch/x86/kernel/e820.c | 9 +++++ arch/x86/kernel/kexec-bzimage64.c | 39 +++++++++++++++++++ arch/x86/kernel/setup.c | 46 ++++++++++++++++++++++ arch/x86/mm/init_32.c | 7 ++++ arch/x86/mm/init_64.c | 7 ++++ 8 files changed, 180 insertions(+), 1 deletion(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 3762f41bb092..9aa31b3dcebc 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -2094,6 +2094,9 @@ config ARCH_SUPPORTS_KEXEC_BZIMAGE_VERIFY_SIG config ARCH_SUPPORTS_KEXEC_JUMP def_bool y +config ARCH_SUPPORTS_KEXEC_KHO + def_bool y + config ARCH_SUPPORTS_CRASH_DUMP def_bool X86_64 || (X86_32 && HIGHMEM) diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c index dec961c6d16a..93ea292e4c18 100644 --- a/arch/x86/boot/compressed/kaslr.c +++ b/arch/x86/boot/compressed/kaslr.c @@ -29,6 +29,7 @@ #include #include #include +#include #include #include @@ -472,6 +473,60 @@ static bool mem_avoid_overlap(struct mem_vector *img, } } +#ifdef CONFIG_KEXEC_KHO + if (ptr->type == SETUP_KEXEC_KHO) { + struct kho_data *kho = (struct kho_data *)ptr->data; + struct kho_mem *mems = (void *)kho->mem_cache_addr; + int nr_mems = kho->mem_cache_size / sizeof(*mems); + int i; + + /* Avoid the mem cache */ + avoid = (struct mem_vector) { + .start = kho->mem_cache_addr, + .size = kho->mem_cache_size, 
+ }; + + if (mem_overlaps(img, &avoid) && (avoid.start < earliest)) { + *overlap = avoid; + earliest = overlap->start; + is_overlapping = true; + } + + /* And the KHO DT */ + avoid = (struct mem_vector) { + .start = kho->dt_addr, + .size = kho->dt_size, + }; + + if (mem_overlaps(img, &avoid) && (avoid.start < earliest)) { + *overlap = avoid; + earliest = overlap->start; + is_overlapping = true; + } + + /* As well as any other KHO memory reservations */ + for (i = 0; i < nr_mems; i++) { + avoid = (struct mem_vector) { + .start = mems[i].addr, + .size = mems[i].len, + }; + + /* + * This mem starts after our current break. + * The array is sorted, so we're done. + */ + if (avoid.start >= earliest) + break; + + if (mem_overlaps(img, &avoid)) { + *overlap = avoid; + earliest = overlap->start; + is_overlapping = true; + } + } + } +#endif + ptr = (struct setup_data *)(unsigned long)ptr->next; } diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h index 01d19fc22346..013af38a9673 100644 --- a/arch/x86/include/uapi/asm/bootparam.h +++ b/arch/x86/include/uapi/asm/bootparam.h @@ -13,7 +13,8 @@ #define SETUP_CC_BLOB 7 #define SETUP_IMA 8 #define SETUP_RNG_SEED 9 -#define SETUP_ENUM_MAX SETUP_RNG_SEED +#define SETUP_KEXEC_KHO 10 +#define SETUP_ENUM_MAX SETUP_KEXEC_KHO #define SETUP_INDIRECT (1<<31) #define SETUP_TYPE_MAX (SETUP_ENUM_MAX | SETUP_INDIRECT) @@ -181,6 +182,18 @@ struct ima_setup_data { __u64 size; } __attribute__((packed)); +/* + * Locations of kexec handover metadata + */ +struct kho_data { + __u64 dt_addr; + __u64 dt_size; + __u64 scratch_addr; + __u64 scratch_size; + __u64 mem_cache_addr; + __u64 mem_cache_size; +} __attribute__((packed)); + /* The so-called "zeropage" */ struct boot_params { struct screen_info screen_info; /* 0x000 */ diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c index fb8cf953380d..c891b83f5b1c 100644 --- a/arch/x86/kernel/e820.c +++ b/arch/x86/kernel/e820.c @@ -1341,6 +1341,15 @@ void 
__init e820__memblock_setup(void) continue; memblock_add(entry->addr, entry->size); + + /* + * At this point with KHO we only allocate from scratch memory + * and only from memory below ISA_END_ADDRESS. Make sure that + * when we add memory for the eligible range, we add it as + * scratch memory so that we can resize the memblocks array. + */ + if (is_kho_boot() && (end <= ISA_END_ADDRESS)) + memblock_mark_scratch(entry->addr, end); } /* Throw away partial pages: */ diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c index a61c12c01270..0cb8d0650a02 100644 --- a/arch/x86/kernel/kexec-bzimage64.c +++ b/arch/x86/kernel/kexec-bzimage64.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include #include @@ -233,6 +234,33 @@ setup_ima_state(const struct kimage *image, struct boot_params *params, #endif /* CONFIG_IMA_KEXEC */ } +static void setup_kho(const struct kimage *image, struct boot_params *params, + unsigned long params_load_addr, + unsigned int setup_data_offset) +{ +#ifdef CONFIG_KEXEC_KHO + struct setup_data *sd = (void *)params + setup_data_offset; + struct kho_data *kho = (void *)sd + sizeof(*sd); + + sd->type = SETUP_KEXEC_KHO; + sd->len = sizeof(struct kho_data); + + /* Only add if we have all KHO images in place */ + if (!image->kho.dt.buffer || !image->kho.mem_cache.buffer) + return; + + /* Add setup data */ + kho->dt_addr = image->kho.dt.mem; + kho->dt_size = image->kho.dt.bufsz; + kho->scratch_addr = kho_scratch_phys; + kho->scratch_size = kho_scratch_len; + kho->mem_cache_addr = image->kho.mem_cache.mem; + kho->mem_cache_size = image->kho.mem_cache.bufsz; + sd->next = params->hdr.setup_data; + params->hdr.setup_data = params_load_addr + setup_data_offset; +#endif /* CONFIG_KEXEC_KHO */ +} + static int setup_boot_parameters(struct kimage *image, struct boot_params *params, unsigned long params_load_addr, @@ -305,6 +333,13 @@ setup_boot_parameters(struct kimage *image, struct boot_params *params, 
sizeof(struct ima_setup_data); } + if (IS_ENABLED(CONFIG_KEXEC_KHO)) { + /* Setup space to store preservation metadata */ + setup_kho(image, params, params_load_addr, setup_data_offset); + setup_data_offset += sizeof(struct setup_data) + + sizeof(struct kho_data); + } + /* Setup RNG seed */ setup_rng_seed(params, params_load_addr, setup_data_offset); @@ -470,6 +505,10 @@ static void *bzImage64_load(struct kimage *image, char *kernel, kbuf.bufsz += sizeof(struct setup_data) + sizeof(struct ima_setup_data); + if (IS_ENABLED(CONFIG_KEXEC_KHO)) + kbuf.bufsz += sizeof(struct setup_data) + + sizeof(struct kho_data); + params = kzalloc(kbuf.bufsz, GFP_KERNEL); if (!params) return ERR_PTR(-ENOMEM); diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index 1526747bedf2..bd21f9a601a2 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -382,6 +382,29 @@ int __init ima_get_kexec_buffer(void **addr, size_t *size) } #endif +static void __init add_kho(u64 phys_addr, u32 data_len) +{ +#ifdef CONFIG_KEXEC_KHO + struct kho_data *kho; + u64 addr = phys_addr + sizeof(struct setup_data); + u64 size = data_len - sizeof(struct setup_data); + + kho = early_memremap(addr, size); + if (!kho) { + pr_warn("setup: failed to memremap kho data (0x%llx, 0x%llx)\n", + addr, size); + return; + } + + kho_populate(kho->dt_addr, kho->scratch_addr, kho->scratch_size, + kho->mem_cache_addr, kho->mem_cache_size); + + early_memunmap(kho, size); +#else + pr_warn("Passed KHO data, but CONFIG_KEXEC_KHO not set. 
Ignoring.\n"); +#endif +} + static void __init parse_setup_data(void) { struct setup_data *data; @@ -410,6 +433,9 @@ static void __init parse_setup_data(void) case SETUP_IMA: add_early_ima_buffer(pa_data); break; + case SETUP_KEXEC_KHO: + add_kho(pa_data, data_len); + break; case SETUP_RNG_SEED: data = early_memremap(pa_data, data_len); add_bootloader_randomness(data->data, data->len); @@ -989,8 +1015,26 @@ void __init setup_arch(char **cmdline_p) cleanup_highmap(); memblock_set_current_limit(ISA_END_ADDRESS); + e820__memblock_setup(); + /* + * We can resize memblocks at this point, let's dump all KHO + * reservations in and switch from scratch-only to normal allocations + */ + kho_reserve_previous_mem(); + + /* Allocations now skip scratch mem, return low 1M to the pool */ + if (is_kho_boot()) { + u64 i; + phys_addr_t base, end; + + __for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE, + MEMBLOCK_SCRATCH, &base, &end, NULL) + if (end <= ISA_END_ADDRESS) + memblock_clear_scratch(base, end - base); + } + /* * Needs to run after memblock setup because it needs the physical * memory size. @@ -1106,6 +1150,8 @@ void __init setup_arch(char **cmdline_p) */ arch_reserve_crashkernel(); + kho_reserve_scratch(); + memblock_find_dma_reserve(); if (!early_xdbc_setup_hardware()) diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c index b63403d7179d..6c3810afed04 100644 --- a/arch/x86/mm/init_32.c +++ b/arch/x86/mm/init_32.c @@ -20,6 +20,7 @@ #include #include #include +#include #include #include #include @@ -738,6 +739,12 @@ void __init mem_init(void) after_bootmem = 1; x86_init.hyper.init_after_bootmem(); + /* + * Now that all KHO pages are marked as reserved, let's flip them back + * to normal pages with accurate refcount. + */ + kho_populate_refcount(); + /* * Check boundaries twice: Some fundamental inconsistencies can * be detected at build time already. 
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index a190aae8ceaf..3ce1a4767610 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -20,6 +20,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -1339,6 +1340,12 @@ void __init mem_init(void)
 	after_bootmem = 1;
 	x86_init.hyper.init_after_bootmem();
 
+	/*
+	 * Now that all KHO pages are marked as reserved, let's flip them back
+	 * to normal pages with accurate refcount.
+	 */
+	kho_populate_refcount();
+
 	/*
 	 * Must be done after boot memory is put on freelist, because here we
 	 * might set fields in deferred struct pages that have not yet been

From patchwork Fri Dec 22 19:36:00 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexander Graf
X-Patchwork-Id: 13503745
From: Alexander Graf
CC: Eric Biederman, "H. Peter Anvin", Andy Lutomirski, Peter Zijlstra,
 "Rob Herring", Steven Rostedt, "Andrew Morton", Mark Rutland,
 "Tom Lendacky", Ashish Kalra, James Gowans, Stanislav Kinsburskii,
 Anthony Yznaga, Usama Arif, David Woodhouse, Benjamin Herrenschmidt
Subject: [PATCH v2 10/17] tracing: Initialize fields before registering
Date: Fri, 22 Dec 2023 19:36:00 +0000
Message-ID: <20231222193607.15474-11-graf@amazon.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20231222193607.15474-1-graf@amazon.com>
References: <20231222193607.15474-1-graf@amazon.com>

With KHO, we need to know all event fields before we allocate an event
type for a trace event so that we can recover it based on a previous
execution context.

Before this patch, fields were only initialized after we allocated a type
id. After this patch, we also try to initialize them before allocating
the type id.

This patch leaves the old late initialization logic in place. The field
init code already checks whether any fields are present, which means
it's legal to call it multiple times. This way we're sure we don't miss
any call sites.

Signed-off-by: Alexander Graf
---
 include/linux/trace_events.h      |  1 +
 kernel/trace/trace_events.c       | 14 +++++++++-----
 kernel/trace/trace_events_synth.c | 14 +++++++++-----
 kernel/trace/trace_events_user.c  |  4 ++++
 kernel/trace/trace_probe.c        |  4 ++++
 5 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index d68ff9b1247f..8fe8970b48e3 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -842,6 +842,7 @@ extern int trace_define_field(struct trace_event_call *call, const char *type,
 extern int trace_add_event_call(struct trace_event_call *call);
 extern int trace_remove_event_call(struct trace_event_call *call);
 extern int trace_event_get_offsets(struct trace_event_call *call);
+extern int trace_event_define_fields(struct trace_event_call *call);
 
 int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set);
 int trace_set_clr_event(const char *system, const char *event, int set);
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index f29e815ca5b2..fbf8be1d2806 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -462,6 +462,11 @@ static void test_event_printk(struct trace_event_call *call)
 int trace_event_raw_init(struct trace_event_call *call)
 {
 	int id;
+	int ret;
+
+	ret = trace_event_define_fields(call);
+	if (ret)
+		return ret;
 
 	id = register_trace_event(&call->event);
 	if (!id)
@@ -2402,8 +2407,7 @@ event_subsystem_dir(struct trace_array *tr, const char *name,
 	return NULL;
 }
 
-static int
-event_define_fields(struct trace_event_call *call)
+int trace_event_define_fields(struct trace_event_call *call)
 {
 	struct list_head *head;
 	int ret = 0;
@@ -2592,7 +2596,7 @@ event_create_dir(struct eventfs_inode *parent, struct trace_event_file *file)
 
 	file->ei = ei;
 
-	ret = event_define_fields(call);
+	ret = trace_event_define_fields(call);
 	if (ret < 0) {
 		pr_warn("Could not initialize trace point events/%s\n", name);
 		return ret;
@@ -2978,7 +2982,7 @@ __trace_add_new_event(struct trace_event_call *call, struct trace_array *tr)
 	if (eventdir_initialized)
 		return event_create_dir(tr->event_dir, file);
 	else
-		return event_define_fields(call);
+		return trace_event_define_fields(call);
 }
 
 static void trace_early_triggers(struct trace_event_file *file, const char *name)
@@ -3015,7 +3019,7 @@ __trace_early_add_new_event(struct trace_event_call *call,
 	if (!file)
 		return -ENOMEM;
 
-	ret = event_define_fields(call);
+	ret = trace_event_define_fields(call);
 	if (ret)
 		return ret;
 
diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index 846e02c0fb59..4db41218ccf7 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -880,17 +880,21 @@ static int register_synth_event(struct synth_event *event)
 	INIT_LIST_HEAD(&call->class->fields);
 	call->event.funcs = &synth_event_funcs;
 	call->class->fields_array = synth_event_fields_array;
+	call->flags = TRACE_EVENT_FL_TRACEPOINT;
+	call->class->reg = trace_event_reg;
+	call->class->probe = trace_event_raw_event_synth;
+	call->data = event;
+	call->tp = event->tp;
+
+	ret = trace_event_define_fields(call);
+	if (ret)
+		goto out;
 
 	ret = register_trace_event(&call->event);
 	if (!ret) {
 		ret = -ENODEV;
 		goto out;
 	}
-	call->flags = TRACE_EVENT_FL_TRACEPOINT;
-	call->class->reg = trace_event_reg;
-	call->class->probe = trace_event_raw_event_synth;
-	call->data = event;
-	call->tp = event->tp;
 
 	ret = trace_add_event_call(call);
 	if (ret) {
diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c
index 9365ce407426..b9837e987525 100644
--- a/kernel/trace/trace_events_user.c
+++ b/kernel/trace/trace_events_user.c
@@ -1900,6 +1900,10 @@ static int user_event_trace_register(struct user_event *user)
 {
 	int ret;
 
+	ret = trace_event_define_fields(&user->call);
+	if (ret)
+		return ret;
+
 	ret = register_trace_event(&user->call.event);
 
 	if (!ret)
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 4dc74d73fc1d..da73a02246d8 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -1835,6 +1835,10 @@ int trace_probe_register_event_call(struct trace_probe *tp)
 				  trace_probe_name(tp)))
 		return -EEXIST;
 
+	ret = trace_event_define_fields(call);
+	if (ret)
+		return ret;
+
 	ret = register_trace_event(&call->event);
 	if (!ret)
 		return -ENODEV;

From patchwork Fri Dec 22 19:36:01 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexander Graf
X-Patchwork-Id: 13503747
From: Alexander Graf
CC: Eric Biederman, "H. Peter Anvin", Andy Lutomirski, Peter Zijlstra,
 "Rob Herring", Steven Rostedt, "Andrew Morton", Mark Rutland,
 "Tom Lendacky", Ashish Kalra, James Gowans, Stanislav Kinsburskii,
 Anthony Yznaga, Usama Arif, David Woodhouse, Benjamin Herrenschmidt
Subject: [PATCH v2 11/17] tracing: Introduce kho serialization
Date: Fri, 22 Dec 2023 19:36:01 +0000
Message-ID: <20231222193607.15474-12-graf@amazon.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20231222193607.15474-1-graf@amazon.com>
References: <20231222193607.15474-1-graf@amazon.com>

We want to be able to transfer ftrace state from one kernel to the next.
To start off with, let's establish all the boilerplate to get a write
hook when KHO wants to serialize and fill out basic data.

Follow-up patches will fill in serialization of ring buffers and events.

Signed-off-by: Alexander Graf
---

v1 -> v2:

  - Remove ifdefs
---
 kernel/trace/trace.c | 47 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 199df497db07..6ec31879b4eb 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -32,6 +32,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -866,6 +867,8 @@ static struct tracer *trace_types __read_mostly;
  */
 DEFINE_MUTEX(trace_types_lock);
 
+static bool trace_in_kho;
+
 /*
  * serialize the access of the ring buffer
  *
@@ -10560,12 +10563,56 @@ void __init early_trace_init(void)
 	init_events();
 }
 
+static int trace_kho_notifier(struct notifier_block *self,
+			      unsigned long cmd,
+			      void *v)
+{
+	const char compatible[] = "ftrace-v1";
+	void *fdt = v;
+	int err = 0;
+
+	switch (cmd) {
+	case KEXEC_KHO_ABORT:
+		if (trace_in_kho)
+			mutex_unlock(&trace_types_lock);
+		trace_in_kho = false;
+		return NOTIFY_DONE;
+	case KEXEC_KHO_DUMP:
+		/* Handled below */
+		break;
+	default:
+		return NOTIFY_BAD;
+	}
+
+	if (unlikely(tracing_disabled))
+		return NOTIFY_DONE;
+
+	err |= fdt_begin_node(fdt, "ftrace");
+	err |= fdt_property(fdt, "compatible", compatible, sizeof(compatible));
+	err |= fdt_end_node(fdt);
+
+	if (!err) {
+		/* Hold all future allocations */
+		mutex_lock(&trace_types_lock);
+		trace_in_kho = true;
+	}
+
+	return err ? NOTIFY_BAD : NOTIFY_DONE;
+}
+
+static struct notifier_block trace_kho_nb = {
+	.notifier_call = trace_kho_notifier,
+};
+
 void __init trace_init(void)
 {
 	trace_event_init();
 
 	if (boot_instance_index)
 		enable_instances();
+
+	if (IS_ENABLED(CONFIG_FTRACE_KHO))
+		register_kho_notifier(&trace_kho_nb);
 }
 
 __init static void clear_boot_tracer(void)
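
Editorial note, not part of the patch: the dump/abort handshake that trace_kho_notifier() implements can be modelled in userspace roughly as below. This is only a sketch under stated assumptions: `struct trace_state`, `write_ftrace_node()`, `trace_kho_notify()`, `enum kho_cmd` and the `SKETCH_*` return codes are hypothetical stand-ins invented for illustration, a boolean replaces trace_types_lock, and no kernel or libfdt function is called. It shows the three properties visible in the diff: errors from the three write steps are OR-ed together, the lock is taken only when all of them succeeded, and an abort releases the lock only if a previous dump actually took it.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for KEXEC_KHO_DUMP / KEXEC_KHO_ABORT. */
enum kho_cmd { KHO_DUMP, KHO_ABORT };

/* Hypothetical stand-ins for NOTIFY_DONE / NOTIFY_BAD. */
enum { SKETCH_NOTIFY_DONE = 0, SKETCH_NOTIFY_BAD = 1 };

/* Models trace_types_lock (as a flag) plus the trace_in_kho marker. */
struct trace_state {
	bool lock_held;
	bool in_kho;
};

/*
 * Stand-in for the fdt_begin_node()/fdt_property()/fdt_end_node()
 * sequence: every step runs and the results are OR-ed together, so a
 * single failing step makes the combined result nonzero.
 */
static int write_ftrace_node(bool fail_property)
{
	int err = 0;

	err |= 0;                      /* begin node: assumed to succeed */
	err |= fail_property ? -1 : 0; /* property: may fail */
	err |= 0;                      /* end node: assumed to succeed */

	return err;
}

static int trace_kho_notify(struct trace_state *st, enum kho_cmd cmd,
			    bool fail_property)
{
	int err;

	switch (cmd) {
	case KHO_ABORT:
		/* Only release the lock if an earlier dump took it. */
		if (st->in_kho)
			st->lock_held = false;
		st->in_kho = false;
		return SKETCH_NOTIFY_DONE;
	case KHO_DUMP:
		break;
	default:
		return SKETCH_NOTIFY_BAD;
	}

	err = write_ftrace_node(fail_property);
	if (!err) {
		/* Serialization succeeded: hold the lock until abort. */
		st->lock_held = true;
		st->in_kho = true;
	}

	return err ? SKETCH_NOTIFY_BAD : SKETCH_NOTIFY_DONE;
}
```

In this model a failed dump leaves the lock free, a successful dump takes it, and a subsequent abort releases it again, so repeated dump/abort cycles stay balanced.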