Message ID | 20250206132754.2596694-1-rppt@kernel.org (mailing list archive) |
---|---|
Series | kexec: introduce Kexec HandOver (KHO) |
On Thu, 6 Feb 2025 15:27:40 +0200 Mike Rapoport <rppt@kernel.org> wrote:

> This is the next version of Alex's "kexec: Allow preservation of ftrace buffers"
> series (https://lore.kernel.org/all/20240117144704.602-1-graf@amazon.com),
> just to make things simpler instead of ftrace we decided to preserve
> "reserve_mem" regions.
>
> The patches are also available in git:
> https://git.kernel.org/rppt/h/kho/v4
>
>
> Kexec today considers itself purely a boot loader: When we enter the new
> kernel, any state the previous kernel left behind is irrelevant and the
> new kernel reinitializes the system.

I tossed this into mm.git for some testing and exposure.

What merge path are you anticipating?

Review activity seems pretty thin thus far?

On Thu, Feb 6, 2025 at 7:29 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Thu, 6 Feb 2025 15:27:40 +0200 Mike Rapoport <rppt@kernel.org> wrote:
>
> > This is the next version of Alex's "kexec: Allow preservation of ftrace buffers"
> > series (https://lore.kernel.org/all/20240117144704.602-1-graf@amazon.com),
> > just to make things simpler instead of ftrace we decided to preserve
> > "reserve_mem" regions.
> >
> > The patches are also available in git:
> > https://git.kernel.org/rppt/h/kho/v4
> >
> >
> > Kexec today considers itself purely a boot loader: When we enter the new
> > kernel, any state the previous kernel left behind is irrelevant and the
> > new kernel reinitializes the system.
>
> I tossed this into mm.git for some testing and exposure.
>
> What merge path are you anticipating?
>
> Review activity seems pretty thin thus far?

KHO is going to be discussed at the upcoming LSF/MM. We are also
planning to send v5 of this patch series (discussed with Mike
Rapoport) in a couple of weeks. It will include enhancements needed
for the hypervisor live update scenario:

1. Allow nodes to be added to the KHO tree at any time.
2. Remove "activate" (I will also send a live update framework that
   provides the activate functionality).
3. Allow serialization during shutdown.
4. Decouple KHO from kexec_file_load(), as kexec_file_load() should
   not be used during live update blackout time.
5. Enable multithreaded serialization by using a hash table as an
   intermediate step before conversion to FDT.

Pasha

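Item 5 in the list above can be pictured with a small userspace sketch. This is not the planned v5 code: every name here (`struct table`, `table_put`, `table_flatten`, `struct flat_node`) is hypothetical, the sorted flat array is only a stand-in for the FDT, and the locking that real multithreaded serialization would need is left out. It only illustrates the idea of collecting nodes in a hash table first and producing the flat representation in a single pass afterwards.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NR_BUCKETS 16

/* One node collected during serialization (hypothetical layout). */
struct entry {
	const char *name;	/* caller-owned, must outlive the table */
	uint64_t value;
	struct entry *next;	/* chaining within a bucket */
};

struct table {
	struct entry *buckets[NR_BUCKETS];
	size_t nr_entries;
};

/* Flat record, standing in for a node of the final FDT. */
struct flat_node {
	const char *name;
	uint64_t value;
};

/* 32-bit FNV-1a string hash, reduced to a bucket index. */
static size_t bucket_of(const char *s)
{
	uint32_t h = 2166136261u;

	while (*s) {
		h ^= (unsigned char)*s++;
		h *= 16777619u;
	}
	return h % NR_BUCKETS;
}

/* Insert a node or update an existing one; -1 on allocation failure. */
static int table_put(struct table *t, const char *name, uint64_t value)
{
	size_t b = bucket_of(name);
	struct entry *e;

	for (e = t->buckets[b]; e; e = e->next) {
		if (!strcmp(e->name, name)) {
			e->value = value;
			return 0;
		}
	}
	e = malloc(sizeof(*e));
	if (!e)
		return -1;
	e->name = name;
	e->value = value;
	e->next = t->buckets[b];
	t->buckets[b] = e;
	t->nr_entries++;
	return 0;
}

static int cmp_flat(const void *a, const void *b)
{
	const struct flat_node *x = a, *y = b;

	return strcmp(x->name, y->name);
}

/*
 * Conversion step: copy every entry into 'out' (which must have room for
 * t->nr_entries records) and sort by name so the result does not depend
 * on insertion order. Returns the number of records written.
 */
static size_t table_flatten(const struct table *t, struct flat_node *out)
{
	size_t n = 0;

	for (size_t b = 0; b < NR_BUCKETS; b++) {
		for (const struct entry *e = t->buckets[b]; e; e = e->next) {
			out[n].name = e->name;
			out[n].value = e->value;
			n++;
		}
	}
	qsort(out, n, sizeof(*out), cmp_flat);
	return n;
}
```

Because lookups and updates touch only one bucket, several producers could in principle fill such a table concurrently with per-bucket locking, while the conversion to the flat form happens once at the end.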
My x86_64 allmodconfig sayeth:

WARNING: modpost: vmlinux: section mismatch in reference: kho_reserve_scratch+0xca (section: .text) -> memblock_alloc_try_nid (section: .init.text)
WARNING: modpost: vmlinux: section mismatch in reference: kho_reserve_scratch+0xf5 (section: .text) -> scratch_scale (section: .init.data)
WARNING: modpost: vmlinux: section mismatch in reference: kho_reserve_scratch+0x100 (section: .text) -> scratch_size_global (section: .init.data)
WARNING: modpost: vmlinux: section mismatch in reference: kho_reserve_scratch+0x11d (section: .text) -> scratch_size_global (section: .init.data)
WARNING: modpost: vmlinux: section mismatch in reference: kho_reserve_scratch+0x129 (section: .text) -> scratch_size_pernode (section: .init.data)
WARNING: modpost: vmlinux: section mismatch in reference: kho_reserve_scratch+0x14e (section: .text) -> memblock_phys_alloc_range (section: .init.text)
WARNING: modpost: vmlinux: section mismatch in reference: kho_reserve_scratch+0x261 (section: .text) -> scratch_size_pernode (section: .init.data)
WARNING: modpost: vmlinux: section mismatch in reference: kho_reserve_scratch+0x26d (section: .text) -> scratch_size_pernode (section: .init.data)
WARNING: modpost: vmlinux: section mismatch in reference: kho_reserve_scratch+0x29b (section: .text) -> memblock_alloc_range_nid (section: .init.text)
WARNING: modpost: vmlinux: section mismatch in reference: kho_reserve_scratch+0x334 (section: .text) -> scratch_scale (section: .init.data)
WARNING: modpost: vmlinux: section mismatch in reference: kho_reserve_scratch+0x33f (section: .text) -> scratch_size_global (section: .init.data)
WARNING: modpost: vmlinux: section mismatch in reference: kho_reserve_scratch+0x363 (section: .text) -> scratch_scale (section: .init.data)
WARNING: modpost: vmlinux: section mismatch in reference: kho_reserve_scratch+0x371 (section: .text) -> scratch_size_global (section: .init.data)
WARNING: modpost: vmlinux: section mismatch in reference: kho_reserve_scratch+0x3a1 (section: .text) -> scratch_scale (section: .init.data)
WARNING: modpost: vmlinux: section mismatch in reference: kho_reserve_scratch+0x3af (section: .text) -> scratch_size_global (section: .init.data)

On Thu, Feb 06, 2025 at 08:50:30PM -0800, Andrew Morton wrote:
> My x86_64 allmodconfig sayeth:
>
> WARNING: modpost: vmlinux: section mismatch in reference: kho_reserve_scratch+0xca (section: .text) -> memblock_alloc_try_nid (section: .init.text)
> WARNING: modpost: vmlinux: section mismatch in reference: kho_reserve_scratch+0xf5 (section: .text) -> scratch_scale (section: .init.data)

This should fix it:

From 176767698d4ac5b7cddffe16677b60cb18dce786 Mon Sep 17 00:00:00 2001
From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
Date: Fri, 7 Feb 2025 09:57:09 +0200
Subject: [PATCH] kho: make kho_reserve_scratch and kho_init_reserved_pages __init

Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
 kernel/kexec_handover.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/kexec_handover.c b/kernel/kexec_handover.c
index c21ea2a09d47..e0b92011afe2 100644
--- a/kernel/kexec_handover.c
+++ b/kernel/kexec_handover.c
@@ -620,7 +620,7 @@ static phys_addr_t __init scratch_size(int nid)
  * active. This CMA region will only be used for movable pages which are not a
  * problem for us during KHO because we can just move them somewhere else.
  */
-static void kho_reserve_scratch(void)
+static void __init kho_reserve_scratch(void)
 {
 	phys_addr_t addr, size;
 	int nid, i = 1;
@@ -672,7 +672,7 @@ static void kho_reserve_scratch(void)
  * Scan the DT for any memory ranges and make sure they are reserved in
  * memblock, otherwise they will end up in a weird state on free lists.
  */
-static void kho_init_reserved_pages(void)
+static void __init kho_init_reserved_pages(void)
 {
 	const void *fdt = kho_get_fdt();
 	int offset = 0, depth = 0, initial_depth = 0, len;

On Thu, Feb 06, 2025 at 04:29:39PM -0800, Andrew Morton wrote:
> On Thu, 6 Feb 2025 15:27:40 +0200 Mike Rapoport <rppt@kernel.org> wrote:
>
> > This is the next version of Alex's "kexec: Allow preservation of ftrace buffers"
> > series (https://lore.kernel.org/all/20240117144704.602-1-graf@amazon.com),
> > just to make things simpler instead of ftrace we decided to preserve
> > "reserve_mem" regions.
> >
> > The patches are also available in git:
> > https://git.kernel.org/rppt/h/kho/v4
> >
> >
> > Kexec today considers itself purely a boot loader: When we enter the new
> > kernel, any state the previous kernel left behind is irrelevant and the
> > new kernel reinitializes the system.
>
> I tossed this into mm.git for some testing and exposure.
>
> What merge path are you anticipating?

I think your tree is the most appropriate, but let's wait for Acks from x86
and arm64 people ;-)

> Review activity seems pretty thin thus far?

Yeah :(
Maybe with Pasha's version on top of that we'll have more people reviewing.

And here is another fixup for a sparse error kbuild reported:

From e1e34b96b96b89a01ee31be223c8dfc2ce1c4cbe Mon Sep 17 00:00:00 2001
From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
Date: Fri, 7 Feb 2025 09:55:03 +0200
Subject: [PATCH] kho: make bin_attr_dt_kern static

Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
 kernel/kexec_handover.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/kexec_handover.c b/kernel/kexec_handover.c
index c26753d613cb..c21ea2a09d47 100644
--- a/kernel/kexec_handover.c
+++ b/kernel/kexec_handover.c
@@ -258,7 +258,7 @@ static ssize_t dt_read(struct file *file, struct kobject *kobj,
 	return count;
 }
 
-struct bin_attribute bin_attr_dt_kern = __BIN_ATTR(dt, 0400, dt_read, NULL, 0);
+static struct bin_attribute bin_attr_dt_kern = __BIN_ATTR(dt, 0400, dt_read, NULL, 0);
 
 static int kho_expose_dt(void *fdt)
 {

On 02/06/25 at 08:28pm, Pasha Tatashin wrote:
> On Thu, Feb 6, 2025 at 7:29 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > On Thu, 6 Feb 2025 15:27:40 +0200 Mike Rapoport <rppt@kernel.org> wrote:
> >
> > > This is the next version of Alex's "kexec: Allow preservation of ftrace buffers"
> > > series (https://lore.kernel.org/all/20240117144704.602-1-graf@amazon.com),
> > > just to make things simpler instead of ftrace we decided to preserve
> > > "reserve_mem" regions.
> > >
> > > The patches are also available in git:
> > > https://git.kernel.org/rppt/h/kho/v4
> > >
> > >
> > > Kexec today considers itself purely a boot loader: When we enter the new
> > > kernel, any state the previous kernel left behind is irrelevant and the
> > > new kernel reinitializes the system.
> >
> > I tossed this into mm.git for some testing and exposure.
> >
> > What merge path are you anticipating?
> >
> > Review activity seems pretty thin thus far?
>
> KHO is going to be discussed at the upcoming lsfmm, we are also
> planning to send v5 of this patch series (discussed with Mike
> Rapoport) in a couple of weeks. It will include enhancements needed
> for the hypervisor live update scenario:

So is this v4 still an RFC, given that v5 is already planned? Should we
hold off reviewing until v5? Or is this series the infrastructure, with v5
adding the details you list below? I am a little confused.

>
> 1. Allow nodes to be added to the KHO tree at any time
> 2. Remove "activate" (I will also send a live update framework that
> provides the activate functionality).
> 3. Allow serialization during shutdown.
> 4. Decouple KHO from kexec_file_load(), as kexec_file_load() should
> not be used during live update blackout time.
> 5. Enable multithreaded serialization by using hash-table as an
> intermediate step before conversion to FDT.

Hi Baoquan,

On Sat, Feb 08, 2025 at 09:38:27AM +0800, Baoquan He wrote:
> On 02/06/25 at 08:28pm, Pasha Tatashin wrote:
> > On Thu, Feb 6, 2025 at 7:29 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > >
> > > On Thu, 6 Feb 2025 15:27:40 +0200 Mike Rapoport <rppt@kernel.org> wrote:
> > >
> > > > This is the next version of Alex's "kexec: Allow preservation of ftrace buffers"
> > > > series (https://lore.kernel.org/all/20240117144704.602-1-graf@amazon.com),
> > > > just to make things simpler instead of ftrace we decided to preserve
> > > > "reserve_mem" regions.
> > > >
> > > > The patches are also available in git:
> > > > https://git.kernel.org/rppt/h/kho/v4
> > > >
> > > >
> > > > Kexec today considers itself purely a boot loader: When we enter the new
> > > > kernel, any state the previous kernel left behind is irrelevant and the
> > > > new kernel reinitializes the system.
> > >
> > > I tossed this into mm.git for some testing and exposure.
> > >
> > > What merge path are you anticipating?
> > >
> > > Review activity seems pretty thin thus far?
> >
> > KHO is going to be discussed at the upcoming lsfmm, we are also
> > planning to send v5 of this patch series (discussed with Mike
> > Rapoport) in a couple of weeks. It will include enhancements needed
> > for the hypervisor live update scenario:
>
> So is this V4 still an RFC if v5 will be sent by plan? Should we hold the
> reviewing until v5? Or this series is an infrastructure building, v5 will
> add more details as you listed as below. I am a little confused.

v4 adds the very basic support for kexec handover in the simplest form we
could think of. There were discussions at the Linux MM Alignment and
Hypervisor live update meetings, and people there agreed on an MVP for KHO
that v4 essentially implements.

v5 will add more details on top of v4, and I'm not sure there's a consensus
about some of them among the people involved in KHO.

> > 1. Allow nodes to be added to the KHO tree at any time
> > 2. Remove "activate" (I will also send a live update framework that
> > provides the activate functionality).
> > 3. Allow serialization during shutdown.
> > 4. Decouple KHO from kexec_file_load(), as kexec_file_load() should
> > not be used during live update blackout time.
> > 5. Enable multithreaded serialization by using hash-table as an
> > intermediate step before conversion to FDT.
>

On 02/08/25 at 10:41am, Mike Rapoport wrote:
> Hi Baoquan,
>
> On Sat, Feb 08, 2025 at 09:38:27AM +0800, Baoquan He wrote:
> > On 02/06/25 at 08:28pm, Pasha Tatashin wrote:
> > > On Thu, Feb 6, 2025 at 7:29 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > > >
> > > > On Thu, 6 Feb 2025 15:27:40 +0200 Mike Rapoport <rppt@kernel.org> wrote:
> > > >
> > > > > This is the next version of Alex's "kexec: Allow preservation of ftrace buffers"
> > > > > series (https://lore.kernel.org/all/20240117144704.602-1-graf@amazon.com),
> > > > > just to make things simpler instead of ftrace we decided to preserve
> > > > > "reserve_mem" regions.
> > > > >
> > > > > The patches are also available in git:
> > > > > https://git.kernel.org/rppt/h/kho/v4
> > > > >
> > > > >
> > > > > Kexec today considers itself purely a boot loader: When we enter the new
> > > > > kernel, any state the previous kernel left behind is irrelevant and the
> > > > > new kernel reinitializes the system.
> > > >
> > > > I tossed this into mm.git for some testing and exposure.
> > > >
> > > > What merge path are you anticipating?
> > > >
> > > > Review activity seems pretty thin thus far?
> > >
> > > KHO is going to be discussed at the upcoming lsfmm, we are also
> > > planning to send v5 of this patch series (discussed with Mike
> > > Rapoport) in a couple of weeks. It will include enhancements needed
> > > for the hypervisor live update scenario:
> >
> > So is this V4 still an RFC if v5 will be sent by plan? Should we hold the
> > reviewing until v5? Or this series is an infrastructure building, v5 will
> > add more details as you listed as below. I am a little confused.
>
> v4 adds the very basic support for kexec handover in the simplest form we
> could think of. There were discussions on Linux MM Alignment and Hypervisor
> live update meetings and there people agreed about MVP for KHO that v4
> essentially implements.
>
> v5 will add more details on top of v4 and I'm not sure there's a consensus
> about some of them among the people involved in KHO.

Thanks for the information. Then I will apply v4 and learn the
infrastructure and mechanism first.

What makes more sense to me, though, is for v4 to be reviewed, updated and
merged, and for another patchset to add the details afterwards, if you have
reached consensus on the infrastructure part. That would make posting and
reviewing much easier, unless you are still discussing the infrastructure
part.

>
> > > 1. Allow nodes to be added to the KHO tree at any time
> > > 2. Remove "activate" (I will also send a live update framework that
> > > provides the activate functionality).
> > > 3. Allow serialization during shutdown.
> > > 4. Decouple KHO from kexec_file_load(), as kexec_file_load() should
> > > not be used during live update blackout time.
> > > 5. Enable multithreaded serialization by using hash-table as an
> > > intermediate step before conversion to FDT.
> >
>
> --
> Sincerely yours,
> Mike.
>

Hi Mike,

On Thu, Feb 6, 2025 at 5:28 AM Mike Rapoport <rppt@kernel.org> wrote:
>
> From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
>
> Hi,
>
> This is the next version of Alex's "kexec: Allow preservation of ftrace buffers"
> series (https://lore.kernel.org/all/20240117144704.602-1-graf@amazon.com),
> just to make things simpler instead of ftrace we decided to preserve
> "reserve_mem" regions.
>
> The patches are also available in git:
> https://git.kernel.org/rppt/h/kho/v4
>
>
> Kexec today considers itself purely a boot loader: When we enter the new
> kernel, any state the previous kernel left behind is irrelevant and the
> new kernel reinitializes the system.
>
> However, there are use cases where this mode of operation is not what we
> actually want. In virtualization hosts for example, we want to use kexec
> to update the host kernel while virtual machine memory stays untouched.
> When we add device assignment to the mix, we also need to ensure that
> IOMMU and VFIO states are untouched. If we add PCIe peer to peer DMA, we
> need to do the same for the PCI subsystem. If we want to kexec while an
> SEV-SNP enabled virtual machine is running, we need to preserve the VM
> context pages and physical memory. See "pkernfs: Persisting guest memory
> and kernel/device state safely across kexec" Linux Plumbers
> Conference 2023 presentation for details:
>
> https://lpc.events/event/17/contributions/1485/
>
> To start us on the journey to support all the use cases above, this patch
> implements basic infrastructure to allow hand over of kernel state across
> kexec (Kexec HandOver, aka KHO). As a really simple example target, we use
> memblock's reserve_mem.
> With this patch set applied, memory that was reserved using "reserve_mem"
> command line options remains intact after kexec and it is guaranteed to
> reside at the same physical address.

Nice work!

One concern is that using memblock to reserve memory, as crashkernel= does,
is not flexible. I worked on kdump years ago, and one of its biggest pains
is deciding how much memory to reserve with crashkernel=. It is still a
pain today.

If we reserve more, more memory is wasted for the 1st kernel. If we reserve
less, the 2nd kernel is more likely to hit OOM.

I'd suggest considering CMA, where the "reserved" memory can still be
reused for other purposes, and pages are migrated out of the reserved
region on demand, that is, when loading a kexec kernel. Of course, we need
to make sure the region is not reused by what you want to preserve here,
e.g., IOMMU. So you might need additional work to make it work, but I still
believe this is the right direction.

Just my two cents.

Thanks!

On Sat, Feb 8, 2025 at 6:39 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> Hi Mike,
>
> On Thu, Feb 6, 2025 at 5:28 AM Mike Rapoport <rppt@kernel.org> wrote:
> >
> > From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
> >
> > Hi,
> >
> > This is the next version of Alex's "kexec: Allow preservation of ftrace buffers"
> > series (https://lore.kernel.org/all/20240117144704.602-1-graf@amazon.com),
> > just to make things simpler instead of ftrace we decided to preserve
> > "reserve_mem" regions.
> >
> > The patches are also available in git:
> > https://git.kernel.org/rppt/h/kho/v4
> >
> >
> > Kexec today considers itself purely a boot loader: When we enter the new
> > kernel, any state the previous kernel left behind is irrelevant and the
> > new kernel reinitializes the system.
> >
> > However, there are use cases where this mode of operation is not what we
> > actually want. In virtualization hosts for example, we want to use kexec
> > to update the host kernel while virtual machine memory stays untouched.
> > When we add device assignment to the mix, we also need to ensure that
> > IOMMU and VFIO states are untouched. If we add PCIe peer to peer DMA, we
> > need to do the same for the PCI subsystem. If we want to kexec while an
> > SEV-SNP enabled virtual machine is running, we need to preserve the VM
> > context pages and physical memory. See "pkernfs: Persisting guest memory
> > and kernel/device state safely across kexec" Linux Plumbers
> > Conference 2023 presentation for details:
> >
> > https://lpc.events/event/17/contributions/1485/
> >
> > To start us on the journey to support all the use cases above, this patch
> > implements basic infrastructure to allow hand over of kernel state across
> > kexec (Kexec HandOver, aka KHO). As a really simple example target, we use
> > memblock's reserve_mem.
> > With this patch set applied, memory that was reserved using "reserve_mem"
> > command line options remains intact after kexec and it is guaranteed to
> > reside at the same physical address.
>
> Nice work!
>
> One concern there is that using memblock to reserve memory as crashkernel=
> is not flexible. I worked on kdump years ago and one of the biggest pains
> of kdump is how much memory should be reserved with crashkernel=. And
> it is still a pain today.
>
> If we reserve more, that would mean more waste for the 1st kernel. If we
> reserve less, that would induce more OOM for the 2nd kernel.
>
> I'd suggest considering using CMA, where the "reserved" memory can be
> still reusable for other purposes, just that pages can be migrated out of this
> reserved region on demand, that is, when loading a kexec kernel. Of course,
> we need to make sure they are not reused by what you want to preserve here,
> e.g., IOMMU. So you might need additional work to make it work, but still I
> believe this is the right direction.

This is exactly what scratch memory is used for. Unlike crashkernel=,
the entire scratch area is available to user applications as CMA, as
we know that no kernel-reserved memory will come from that area. This
doesn't work for crashkernel=, because in some cases the user pages
might also need to be preserved in the crash dump. However, if user
pages are going to be discarded from the crash dump (as is done 99% of
the time), then it is better to make crashkernel= use CMA or
ZONE_MOVABLE as well, so that only the memory actually occupied by the
crash kernel is used and no memory is wasted at all. We have an
internal patch at Google that does this, and I think it would be a good
improvement for the upstream kernel to carry as well.

Pasha

>
> Just my two cents.
>
> Thanks!

On Fri, Feb 7, 2025 at 8:38 PM Baoquan He <bhe@redhat.com> wrote:
>
> On 02/06/25 at 08:28pm, Pasha Tatashin wrote:
> > On Thu, Feb 6, 2025 at 7:29 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > >
> > > On Thu, 6 Feb 2025 15:27:40 +0200 Mike Rapoport <rppt@kernel.org> wrote:
> > >
> > > > This is the next version of Alex's "kexec: Allow preservation of ftrace buffers"
> > > > series (https://lore.kernel.org/all/20240117144704.602-1-graf@amazon.com),
> > > > just to make things simpler instead of ftrace we decided to preserve
> > > > "reserve_mem" regions.
> > > >
> > > > The patches are also available in git:
> > > > https://git.kernel.org/rppt/h/kho/v4
> > > >
> > > >
> > > > Kexec today considers itself purely a boot loader: When we enter the new
> > > > kernel, any state the previous kernel left behind is irrelevant and the
> > > > new kernel reinitializes the system.
> > >
> > > I tossed this into mm.git for some testing and exposure.
> > >
> > > What merge path are you anticipating?
> > >
> > > Review activity seems pretty thin thus far?
> >
> > KHO is going to be discussed at the upcoming lsfmm, we are also
> > planning to send v5 of this patch series (discussed with Mike
> > Rapoport) in a couple of weeks. It will include enhancements needed
> > for the hypervisor live update scenario:
>
> So is this V4 still an RFC if v5 will be sent by plan? Should we hold the
> reviewing until v5? Or this series is an infrastructure building, v5 will
> add more details as you listed as below. I am a little confused.

We will modify the existing patches and send them as v5 because some
interfaces are going to be changed*.

Otherwise, v5 will make KHO a lot more flexible, as it will allow the
tree to be used at any time while the system is running instead of only
once during the activation phase.

* Changing interfaces is optional, but the decision whether to change
them will be discussed at the Hypervisor Live Update meeting on Feb 10th:
https://lore.kernel.org/all/26a4b7ca-93a6-30e2-923b-f551ced03d62@google.com/

>
> >
> > 1. Allow nodes to be added to the KHO tree at any time
> > 2. Remove "activate" (I will also send a live update framework that
> > provides the activate functionality).
> > 3. Allow serialization during shutdown.
> > 4. Decouple KHO from kexec_file_load(), as kexec_file_load() should
> > not be used during live update blackout time.
> > 5. Enable multithreaded serialization by using hash-table as an
> > intermediate step before conversion to FDT.
>

Hi Mike,

On Thu, Feb 6, 2025 at 5:28 AM Mike Rapoport <rppt@kernel.org> wrote:
> We introduce a metadata file that the kernels pass between each other. How
> they pass it is architecture specific. The file's format is a Flattened
> Device Tree (fdt) which has a generator and parser already included in
> Linux. When the root user enables KHO through /sys/kernel/kho/active, the
> kernel invokes callbacks to every driver that supports KHO to serialize
> its state. When the actual kexec happens, the fdt is part of the image
> set that we boot into. In addition, we keep "scratch regions" available
> for kexec: physically contiguous memory regions that are guaranteed not
> to contain any memory that KHO would preserve. The new kernel bootstraps
> itself using the scratch regions and sets all handed over memory as in use.
> When drivers that support KHO initialize, they introspect the fdt and
> recover their state from it. This includes memory reservations, where the
> driver can either discard or claim reservations.

I have gone through your entire patchset. If you could provide an example
of a specific driver that supports KHO, it would help people understand it
and, more importantly, help driver developers adopt it. Even a simulated
driver, e.g. netdevsim, would be greatly helpful.

Thanks.

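To make the flow quoted above easier to follow, here is a toy userspace model of it. It is an assumption-laden sketch, not the KHO API from the series: `struct handover_blob` stands in for the FDT, and `handover_add`, `handover_find`, `handover_activate`, `demo_serialize` and `demo_restore` are hypothetical names invented for this illustration. It only shows the shape of the protocol: on activation every participating driver serializes its state into the shared blob, and after the "kexec" the driver looks up its node and recovers the state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HANDOVER_MAX_NODES 8

/* One record in the toy handover blob (stand-in for an FDT node). */
struct handover_node {
	char name[32];
	uint64_t addr;
	uint64_t size;
};

struct handover_blob {
	struct handover_node nodes[HANDOVER_MAX_NODES];
	int nr_nodes;
};

/* Hypothetical per-driver hook, called when handover is activated. */
typedef int (*serialize_fn)(struct handover_blob *blob);

/* Append a node; -1 if the blob is full or the name does not fit. */
static int handover_add(struct handover_blob *blob, const char *name,
			uint64_t addr, uint64_t size)
{
	struct handover_node *n;

	if (blob->nr_nodes >= HANDOVER_MAX_NODES ||
	    strlen(name) >= sizeof(n->name))
		return -1;
	n = &blob->nodes[blob->nr_nodes++];
	strcpy(n->name, name);
	n->addr = addr;
	n->size = size;
	return 0;
}

static const struct handover_node *
handover_find(const struct handover_blob *blob, const char *name)
{
	for (int i = 0; i < blob->nr_nodes; i++)
		if (!strcmp(blob->nodes[i].name, name))
			return &blob->nodes[i];
	return NULL;
}

/* "Old kernel" side: run every callback, stop at the first failure. */
static int handover_activate(struct handover_blob *blob,
			     const serialize_fn *cbs, int nr_cbs)
{
	blob->nr_nodes = 0;
	for (int i = 0; i < nr_cbs; i++)
		if (cbs[i](blob))
			return -1;
	return 0;
}

/* Example driver state that should survive the toy "kexec". */
struct demo_state {
	uint64_t buf_addr;
	uint64_t buf_size;
};

static struct demo_state demo_old = { 0x80000000ULL, 0x200000ULL };

static int demo_serialize(struct handover_blob *blob)
{
	return handover_add(blob, "demo-buffer",
			    demo_old.buf_addr, demo_old.buf_size);
}

/* "New kernel" side: claim the node if present, -1 if it is missing. */
static int demo_restore(const struct handover_blob *blob,
			struct demo_state *st)
{
	const struct handover_node *n = handover_find(blob, "demo-buffer");

	if (!n)
		return -1;
	st->buf_addr = n->addr;
	st->buf_size = n->size;
	return 0;
}
```

A driver that finds no node for itself in the blob would simply initialize from scratch, which corresponds to the "discard or claim" choice mentioned in the cover letter.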
On Sat, Feb 8, 2025 at 4:14 PM Pasha Tatashin <pasha.tatashin@soleen.com> wrote:
>
> On Sat, Feb 8, 2025 at 6:39 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >
> > Hi Mike,
> >
> > On Thu, Feb 6, 2025 at 5:28 AM Mike Rapoport <rppt@kernel.org> wrote:
> > >
> > > From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
> > >
> > > Hi,
> > >
> > > This is the next version of Alex's "kexec: Allow preservation of ftrace buffers"
> > > series (https://lore.kernel.org/all/20240117144704.602-1-graf@amazon.com),
> > > just to make things simpler instead of ftrace we decided to preserve
> > > "reserve_mem" regions.
> > >
> > > The patches are also available in git:
> > > https://git.kernel.org/rppt/h/kho/v4
> > >
> > >
> > > Kexec today considers itself purely a boot loader: When we enter the new
> > > kernel, any state the previous kernel left behind is irrelevant and the
> > > new kernel reinitializes the system.
> > >
> > > However, there are use cases where this mode of operation is not what we
> > > actually want. In virtualization hosts for example, we want to use kexec
> > > to update the host kernel while virtual machine memory stays untouched.
> > > When we add device assignment to the mix, we also need to ensure that
> > > IOMMU and VFIO states are untouched. If we add PCIe peer to peer DMA, we
> > > need to do the same for the PCI subsystem. If we want to kexec while an
> > > SEV-SNP enabled virtual machine is running, we need to preserve the VM
> > > context pages and physical memory. See "pkernfs: Persisting guest memory
> > > and kernel/device state safely across kexec" Linux Plumbers
> > > Conference 2023 presentation for details:
> > >
> > > https://lpc.events/event/17/contributions/1485/
> > >
> > > To start us on the journey to support all the use cases above, this patch
> > > implements basic infrastructure to allow hand over of kernel state across
> > > kexec (Kexec HandOver, aka KHO). As a really simple example target, we use
> > > memblock's reserve_mem.
> > > With this patch set applied, memory that was reserved using "reserve_mem"
> > > command line options remains intact after kexec and it is guaranteed to
> > > reside at the same physical address.
> >
> > Nice work!
> >
> > One concern there is that using memblock to reserve memory as crashkernel=
> > is not flexible. I worked on kdump years ago and one of the biggest pains
> > of kdump is how much memory should be reserved with crashkernel=. And
> > it is still a pain today.
> >
> > If we reserve more, that would mean more waste for the 1st kernel. If we
> > reserve less, that would induce more OOM for the 2nd kernel.
> >
> > I'd suggest considering using CMA, where the "reserved" memory can be
> > still reusable for other purposes, just that pages can be migrated out of this
> > reserved region on demand, that is, when loading a kexec kernel. Of course,
> > we need to make sure they are not reused by what you want to preserve here,
> > e.g., IOMMU. So you might need additional work to make it work, but still I
> > believe this is the right direction.
>
> This is exactly what scratch memory is used for. Unlike crashkernel=,
> the entire scratch area is available to user applications as CMA, as
> we know that no kernel-reserved memory will come from that area. This
> doesn't work for crashkernel=, because in some cases, the user pages
> might also need to be preserved in the crash dump. However, if user
> pages are going to be discarded from the crash dump (as is done 99% of
> the time), then it is better to also make it use CMA or ZONE_MOVABLE
> and use only the memory occupied by the crash kernel and do not waste
> any memory at all. We have an internal patch at Google that does this,
> and I think it would be a good improvement for the upstream kernel to
> carry as well.

Good to know CMA is already used; I could not tell from the cover letter.

User-space pages need to be preserved in scenarios like RDMA, which pins
user-space pages for DMA transfers. Since the goal here is also to preserve
hardware state such as RDMA's, I guess the same concern remains.

Thanks!

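The property this exchange relies on, that scratch memory never contains memory KHO preserves, can be stated as a simple range check. The sketch below is a userspace illustration under stated assumptions, not code from the series: `struct mem_range`, `ranges_overlap` and `scratch_is_clean` are hypothetical names, ranges are half-open `[start, start + size)`, and address wrap-around is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical half-open physical range [start, start + size). */
struct mem_range {
	uint64_t start;
	uint64_t size;
};

/* True if the two ranges share at least one byte. */
static bool ranges_overlap(const struct mem_range *a,
			   const struct mem_range *b)
{
	if (!a->size || !b->size)
		return false;	/* an empty range overlaps nothing */
	return a->start < b->start + b->size &&
	       b->start < a->start + a->size;
}

/*
 * True if no preserved range touches any scratch range, i.e. the next
 * kernel could bootstrap from scratch without clobbering preserved data.
 */
static bool scratch_is_clean(const struct mem_range *scratch,
			     size_t nr_scratch,
			     const struct mem_range *preserved,
			     size_t nr_preserved)
{
	for (size_t i = 0; i < nr_scratch; i++)
		for (size_t j = 0; j < nr_preserved; j++)
			if (ranges_overlap(&scratch[i], &preserved[j]))
				return false;
	return true;
}
```

In these terms, the concern raised above is that a pinned user page that must be preserved could end up inside a scratch range, which is exactly the case `scratch_is_clean()` would report as false.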
On 02/08/25 at 07:23pm, Pasha Tatashin wrote:
> On Fri, Feb 7, 2025 at 8:38 PM Baoquan He <bhe@redhat.com> wrote:
> >
> > On 02/06/25 at 08:28pm, Pasha Tatashin wrote:
> > > On Thu, Feb 6, 2025 at 7:29 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > > >
> > > > On Thu, 6 Feb 2025 15:27:40 +0200 Mike Rapoport <rppt@kernel.org> wrote:
> > > >
> > > > > This is the next version of Alex's "kexec: Allow preservation of ftrace buffers"
> > > > > series (https://lore.kernel.org/all/20240117144704.602-1-graf@amazon.com),
> > > > > just to make things simpler instead of ftrace we decided to preserve
> > > > > "reserve_mem" regions.
> > > > >
> > > > > The patches are also available in git:
> > > > > https://git.kernel.org/rppt/h/kho/v4
> > > > >
> > > > >
> > > > > Kexec today considers itself purely a boot loader: When we enter the new
> > > > > kernel, any state the previous kernel left behind is irrelevant and the
> > > > > new kernel reinitializes the system.
> > > >
> > > > I tossed this into mm.git for some testing and exposure.
> > > >
> > > > What merge path are you anticipating?
> > > >
> > > > Review activity seems pretty thin thus far?
> > >
> > > KHO is going to be discussed at the upcoming lsfmm, we are also
> > > planning to send v5 of this patch series (discussed with Mike
> > > Rapoport) in a couple of weeks. It will include enhancements needed
> > > for the hypervisor live update scenario:
> >
> > So is this V4 still an RFC if v5 will be sent by plan? Should we hold the
> > reviewing until v5? Or this series is an infrastructure building, v5 will
> > add more details as you listed as below. I am a little confused.
>
> We will modify the existing patches and send as v5 because some
> interfaces are going to be changed*.
>
> Otherwise, v5 will make KHO a lot more flexible as it will allow to
> use the tree all the time while the system is running instead of only
> once during the activation phase.
>
> * Changing interfaces is optional, but decision whether to change
> will be discussed at Hypervisor Live Update on Feb 10th:
> https://lore.kernel.org/all/26a4b7ca-93a6-30e2-923b-f551ced03d62@google.com/

Ah, this is what I wanted to know about the difference between v4 and v5.
Thanks for the information. I am looking forward to the v5 update.

>
> >
> > >
> > > 1. Allow nodes to be added to the KHO tree at any time
> > > 2. Remove "activate" (I will also send a live update framework that
> > > provides the activate functionality).
> > > 3. Allow serialization during shutdown.
> > > 4. Decouple KHO from kexec_file_load(), as kexec_file_load() should
> > > not be used during live update blackout time.
> > > 5. Enable multithreaded serialization by using hash-table as an
> > > intermediate step before conversion to FDT.
> >
>

On 07/02/2025 01:29, Andrew Morton wrote:
> On Thu, 6 Feb 2025 15:27:40 +0200 Mike Rapoport <rppt@kernel.org> wrote:
>
>> This a next version of Alex's "kexec: Allow preservation of ftrace buffers"
>> series (https://lore.kernel.org/all/20240117144704.602-1-graf@amazon.com),
>> just to make things simpler instead of ftrace we decided to preserve
>> "reserve_mem" regions.
>>
>> The patches are also available in git:
>> https://git.kernel.org/rppt/h/kho/v4
>>
>>
>> Kexec today considers itself purely a boot loader: When we enter the new
>> kernel, any state the previous kernel left behind is irrelevant and the
>> new kernel reinitializes the system.
>
> I tossed this into mm.git for some testing and exposure.
>
> What merge path are you anticipating?
>
> Review activity seems pretty thin thus far?

At least for DT ABI because:
1. For some reason this escaped Patchwork. Maybe it was blocked by spam
   filters, maybe the Cc list is too big. No clue.
2. At the same time, fallback to Patchwork was avoided by: Cc-ing the wrong
   address and not using the expected (see git log) subject prefixes.

Best regards,
Krzysztof
On Thu, 6 Feb 2025 at 21:34, Mike Rapoport <rppt@kernel.org> wrote: > > From: "Mike Rapoport (Microsoft)" <rppt@kernel.org> > > Hi, > > This a next version of Alex's "kexec: Allow preservation of ftrace buffers" > series (https://lore.kernel.org/all/20240117144704.602-1-graf@amazon.com), > just to make things simpler instead of ftrace we decided to preserve > "reserve_mem" regions. > > The patches are also available in git: > https://git.kernel.org/rppt/h/kho/v4 > > > Kexec today considers itself purely a boot loader: When we enter the new > kernel, any state the previous kernel left behind is irrelevant and the > new kernel reinitializes the system. > > However, there are use cases where this mode of operation is not what we > actually want. In virtualization hosts for example, we want to use kexec > to update the host kernel while virtual machine memory stays untouched. > When we add device assignment to the mix, we also need to ensure that > IOMMU and VFIO states are untouched. If we add PCIe peer to peer DMA, we > need to do the same for the PCI subsystem. If we want to kexec while an > SEV-SNP enabled virtual machine is running, we need to preserve the VM > context pages and physical memory. See "pkernfs: Persisting guest memory > and kernel/device state safely across kexec" Linux Plumbers > Conference 2023 presentation for details: > > https://lpc.events/event/17/contributions/1485/ > > To start us on the journey to support all the use cases above, this patch > implements basic infrastructure to allow hand over of kernel state across > kexec (Kexec HandOver, aka KHO). As a really simple example target, we use > memblock's reserve_mem. > With this patch set applied, memory that was reserved using "reserve_mem" > command line options remains intact after kexec and it is guaranteed to > reside at the same physical address. 
> > == Alternatives == > > There are alternative approaches to (parts of) the problems above: > > * Memory Pools [1] - preallocated persistent memory region + allocator > * PRMEM [2] - resizable persistent memory regions with fixed metadata > pointer on the kernel command line + allocator > * Pkernfs [3] - preallocated file system for in-kernel data with fixed > address location on the kernel command line > * PKRAM [4] - handover of user space pages using a fixed metadata page > specified via command line > > All of the approaches above fundamentally have the same problem: They > require the administrator to explicitly carve out a physical memory > location because they have no mechanism outside of the kernel command > line to pass data (including memory reservations) between kexec'ing > kernels. > > KHO provides that base foundation. We will determine later whether we > still need any of the approaches above for fast bulk memory handover of for > example IOMMU page tables. But IMHO they would all be users of KHO, with > KHO providing the foundational primitive to pass metadata and bulk memory > reservations as well as provide easy versioning for data. > > == Overview == > > We introduce a metadata file that the kernels pass between each other. How > they pass it is architecture specific. The file's format is a Flattened > Device Tree (fdt) which has a generator and parser already included in > Linux. When the root user enables KHO through /sys/kernel/kho/active, the > kernel invokes callbacks to every driver that supports KHO to serialize > its state. When the actual kexec happens, the fdt is part of the image > set that we boot into. In addition, we keep a "scratch regions" available > for kexec: A physically contiguous memory regions that is guaranteed to > not have any memory that KHO would preserve. The new kernel bootstraps > itself using the scratch regions and sets all handed over memory as in use. 
> When drivers initialize that support KHO, they introspect the fdt and
> recover their state from it. This includes memory reservations, where the
> driver can either discard or claim reservations.
>
> == Limitations ==
>
> Currently KHO is only implemented for file based kexec. The kernel
> interfaces in the patch set are already in place to support user space
> kexec as well, but it is still not implemented it yet inside kexec tools.
>

On which architectures exactly does KHO work? Device Tree
should be ok on arm*, x86 and power*, but how about s390?

Thanks
Dae
On Mon, Feb 17, 2025 at 11:19:45AM +0800, RuiRui Yang wrote:
> On Thu, 6 Feb 2025 at 21:34, Mike Rapoport <rppt@kernel.org> wrote:
> > == Limitations ==
> >
> > Currently KHO is only implemented for file based kexec. The kernel
> > interfaces in the patch set are already in place to support user space
> > kexec as well, but it is still not implemented it yet inside kexec tools.
> >
>
> What architecture exactly does this KHO work fine? Device Tree
> should be ok on arm*, x86 and power*, but how about s390?

KHO does not use device tree as the boot protocol, it uses FDT as a data
structure and adds architecture specific bits to the boot structures to
point to that data, very similar to how IMA_KEXEC works.

Currently KHO is implemented on arm64 and x86, but there is no fundamental
reason why it wouldn't work on any architecture that supports kexec.

> Thanks
> Dae
>
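To illustrate the point that the FDT is only a payload and the boot structures merely carry a pointer to it, here is a minimal user-space sketch modelled on the shape of x86's struct setup_data list (next, type, len, data[]). The type value SETUP_KHO_EXAMPLE, struct kho_pointer and find_kho() are hypothetical names made up for this illustration; they are not taken from the patch set.

```c
#include <stddef.h>
#include <stdint.h>

/* Same shape as x86's struct setup_data: a singly linked list of blobs. */
struct setup_node {
	uint64_t next;	/* address of the next node, 0 terminates the list */
	uint32_t type;	/* what kind of payload 'data' holds */
	uint32_t len;	/* payload length in bytes */
	uint8_t  data[];
};

/* HYPOTHETICAL type value and payload layout, for illustration only. */
#define SETUP_KHO_EXAMPLE 0x1000u

struct kho_pointer {
	uint64_t fdt_addr;	/* where the handed-over FDT lives */
	uint64_t fdt_size;	/* its size in bytes */
};

/*
 * Walk the list starting at 'head' and return the first KHO payload,
 * or NULL if the previous kernel did not hand anything over.
 */
const struct kho_pointer *find_kho(uint64_t head)
{
	while (head) {
		const struct setup_node *n =
			(const struct setup_node *)(uintptr_t)head;

		if (n->type == SETUP_KHO_EXAMPLE &&
		    n->len >= sizeof(struct kho_pointer))
			return (const struct kho_pointer *)n->data;
		head = n->next;
	}
	return NULL;
}
```

The early boot code only needs to understand this small pointer record; parsing the FDT itself can happen later with the generic code that is already in the kernel.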
On Wed, 19 Feb 2025 at 15:32, Mike Rapoport <rppt@kernel.org> wrote: > > On Mon, Feb 17, 2025 at 11:19:45AM +0800, RuiRui Yang wrote: > > On Thu, 6 Feb 2025 at 21:34, Mike Rapoport <rppt@kernel.org> wrote: > > > == Limitations == > > > > > > Currently KHO is only implemented for file based kexec. The kernel > > > interfaces in the patch set are already in place to support user space > > > kexec as well, but it is still not implemented it yet inside kexec tools. > > > > > > > What architecture exactly does this KHO work fine? Device Tree > > should be ok on arm*, x86 and power*, but how about s390? > > KHO does not use device tree as the boot protocol, it uses FDT as a data > structure and adds architecture specific bits to the boot structures to > point to that data, very similar to how IMA_KEXEC works. > > Currently KHO is implemented on arm64 and x86, but there is no fundamental > reason why it wouldn't work on any architecture that supports kexec. Well, the problem is whether there is a way to add dtb in the early boot path, for X86 it is added via setup_data, if there is no such way I'm not sure if it is doable especially for passing some info for early boot use. Then the KHO will be only for limited use cases. > > > Thanks > > Dae > > > > -- > Sincerely yours, > Mike. >
On 19.02.25 13:49, Dave Young wrote:
> On Wed, 19 Feb 2025 at 15:32, Mike Rapoport <rppt@kernel.org> wrote:
>> On Mon, Feb 17, 2025 at 11:19:45AM +0800, RuiRui Yang wrote:
>>> On Thu, 6 Feb 2025 at 21:34, Mike Rapoport <rppt@kernel.org> wrote:
>>>> == Limitations ==
>>>>
>>>> Currently KHO is only implemented for file based kexec. The kernel
>>>> interfaces in the patch set are already in place to support user space
>>>> kexec as well, but it is still not implemented it yet inside kexec tools.
>>>>
>>> What architecture exactly does this KHO work fine? Device Tree
>>> should be ok on arm*, x86 and power*, but how about s390?
>> KHO does not use device tree as the boot protocol, it uses FDT as a data
>> structure and adds architecture specific bits to the boot structures to
>> point to that data, very similar to how IMA_KEXEC works.
>>
>> Currently KHO is implemented on arm64 and x86, but there is no fundamental
>> reason why it wouldn't work on any architecture that supports kexec.
> Well, the problem is whether there is a way to add dtb in the early
> boot path, for X86 it is added via setup_data, if there is no such
> way I'm not sure if it is doable especially for passing some info for
> early boot use. Then the KHO will be only for limited use cases.

Every architecture has a platform specific way of passing data into the
kernel so it can find its command line and initrd. S390x for example has
struct parmarea. To enable s390x, you would remove some of its padding
and replace it with a KHO base addr + size, so that the new kernel can
find the KHO state tree.

Alex
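The "replace some padding with a base addr + size" idea can be sketched as follows. This is not the real s390 struct parmarea: the struct names, field names and sizes below are hypothetical and only show the technique of carving two fields out of existing padding without moving anything else. It also assumes that kernels unaware of the new fields leave the padding zeroed, which would have to be checked for a real architecture.

```c
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL boot parameter block before the change. */
struct boot_params_before {
	uint64_t initrd_start;
	uint64_t initrd_size;
	uint8_t  pad[32];		/* unused padding */
	char     command_line[64];
};

/* Same block with 16 bytes of padding turned into KHO fields. */
struct boot_params_after {
	uint64_t initrd_start;
	uint64_t initrd_size;
	uint64_t kho_fdt_addr;		/* carved out of pad[] */
	uint64_t kho_fdt_size;		/* carved out of pad[] */
	uint8_t  pad[16];		/* remaining padding */
	char     command_line[64];
};

/* The layout seen by existing users must not change. */
_Static_assert(sizeof(struct boot_params_before) ==
	       sizeof(struct boot_params_after),
	       "block size must stay the same");
_Static_assert(offsetof(struct boot_params_before, command_line) ==
	       offsetof(struct boot_params_after, command_line),
	       "fields after the padding must not move");

/* Zeroed fields mean the previous kernel handed nothing over. */
int kho_present(const struct boot_params_after *p)
{
	return p->kho_fdt_addr != 0 && p->kho_fdt_size != 0;
}
```

The static assertions capture the compatibility argument: a boot loader that only knows the old layout still produces a block the new kernel can read, it just reports no KHO data.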
On Wed, 19 Feb 2025 at 21:55, Alexander Graf <graf@amazon.com> wrote: > > > On 19.02.25 13:49, Dave Young wrote: > > On Wed, 19 Feb 2025 at 15:32, Mike Rapoport <rppt@kernel.org> wrote: > >> On Mon, Feb 17, 2025 at 11:19:45AM +0800, RuiRui Yang wrote: > >>> On Thu, 6 Feb 2025 at 21:34, Mike Rapoport <rppt@kernel.org> wrote: > >>>> == Limitations == > >>>> > >>>> Currently KHO is only implemented for file based kexec. The kernel > >>>> interfaces in the patch set are already in place to support user space > >>>> kexec as well, but it is still not implemented it yet inside kexec tools. > >>>> > >>> What architecture exactly does this KHO work fine? Device Tree > >>> should be ok on arm*, x86 and power*, but how about s390? > >> KHO does not use device tree as the boot protocol, it uses FDT as a data > >> structure and adds architecture specific bits to the boot structures to > >> point to that data, very similar to how IMA_KEXEC works. > >> > >> Currently KHO is implemented on arm64 and x86, but there is no fundamental > >> reason why it wouldn't work on any architecture that supports kexec. > > Well, the problem is whether there is a way to add dtb in the early > > boot path, for X86 it is added via setup_data, if there is no such > > way I'm not sure if it is doable especially for passing some info for > > early boot use. Then the KHO will be only for limited use cases. > > > Every architecture has a platform specific way of passing data into the > kernel so it can find its command line and initrd. S390x for example has > struct parmarea. To enable s390x, you would remove some of its padding > and replace it with a KHO base addr + size, so that the new kernel can > find the KHO state tree. Ok, thanks for the info, I cced s390 people maybe they can provide inputs. 
Other than the arch concern, I'm not so excited about KHO, because kexec
reboot has a fundamental problem that prevents us (the Red Hat kexec/kdump
team) from fully supporting it in the RHEL distribution: stability.
Drivers usually either do not implement the device shutdown method or
have not tested it well. From time to time we see weird bugs, which could
be malfunctioning devices or memory corruption caused by ongoing DMA, etc.
There is also currently no way to make some graphics/DRM drivers work
properly after a kexec reboot; they might happen to work by luck, but not
reliably.

So I personally think that addressing the above concern is more important
than introducing more features that build on kexec reboot.

>
>
> Alex
>
>
On Thu, Feb 20, 2025 at 09:49:52AM +0800, Dave Young wrote:
> On Wed, 19 Feb 2025 at 21:55, Alexander Graf <graf@amazon.com> wrote:
> > >>> What architecture exactly does this KHO work fine? Device Tree
> > >>> should be ok on arm*, x86 and power*, but how about s390?
> > >> KHO does not use device tree as the boot protocol, it uses FDT as a data
> > >> structure and adds architecture specific bits to the boot structures to
> > >> point to that data, very similar to how IMA_KEXEC works.
> > >>
> > >> Currently KHO is implemented on arm64 and x86, but there is no fundamental
> > >> reason why it wouldn't work on any architecture that supports kexec.
> > > Well, the problem is whether there is a way to add dtb in the early
> > > boot path, for X86 it is added via setup_data, if there is no such
> > > way I'm not sure if it is doable especially for passing some info for
> > > early boot use. Then the KHO will be only for limited use cases.
> >
> >
> > Every architecture has a platform specific way of passing data into the
> > kernel so it can find its command line and initrd. S390x for example has
> > struct parmarea. To enable s390x, you would remove some of its padding
> > and replace it with a KHO base addr + size, so that the new kernel can
> > find the KHO state tree.
>
> Ok, thanks for the info, I cced s390 people maybe they can provide inputs.

If I understand correctly, the parmarea would be used for passing the
FDT address - which appears to be fine. However, s390 does not implement
early_memremap()/early_memunmap(), which KHO needs.

Thanks, Dave!
From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>

Hi,

This is the next version of Alex's "kexec: Allow preservation of ftrace buffers"
series (https://lore.kernel.org/all/20240117144704.602-1-graf@amazon.com).
To make things simpler, we decided to preserve "reserve_mem" regions
instead of ftrace buffers.

The patches are also available in git:
https://git.kernel.org/rppt/h/kho/v4

Kexec today considers itself purely a boot loader: When we enter the new
kernel, any state the previous kernel left behind is irrelevant and the
new kernel reinitializes the system.

However, there are use cases where this mode of operation is not what we
actually want. In virtualization hosts for example, we want to use kexec
to update the host kernel while virtual machine memory stays untouched.
When we add device assignment to the mix, we also need to ensure that
IOMMU and VFIO states are untouched. If we add PCIe peer to peer DMA, we
need to do the same for the PCI subsystem. If we want to kexec while an
SEV-SNP enabled virtual machine is running, we need to preserve the VM
context pages and physical memory. See the "pkernfs: Persisting guest
memory and kernel/device state safely across kexec" Linux Plumbers
Conference 2023 presentation for details:

  https://lpc.events/event/17/contributions/1485/

To start us on the journey to support all the use cases above, this patch
set implements basic infrastructure to allow hand over of kernel state
across kexec (Kexec HandOver, aka KHO). As a really simple example target,
we use memblock's reserve_mem.
With this patch set applied, memory that was reserved using "reserve_mem"
command line options remains intact after kexec and it is guaranteed to
reside at the same physical address.
== Alternatives ==

There are alternative approaches to (parts of) the problems above:

  * Memory Pools [1] - preallocated persistent memory region + allocator
  * PRMEM [2] - resizable persistent memory regions with fixed metadata
                pointer on the kernel command line + allocator
  * Pkernfs [3] - preallocated file system for in-kernel data with fixed
                  address location on the kernel command line
  * PKRAM [4] - handover of user space pages using a fixed metadata page
                specified via command line

All of the approaches above fundamentally have the same problem: They
require the administrator to explicitly carve out a physical memory
location because they have no mechanism outside of the kernel command
line to pass data (including memory reservations) between kexec'ing
kernels.

KHO provides that base foundation. We will determine later whether we
still need any of the approaches above for fast bulk memory handover of,
for example, IOMMU page tables. But IMHO they would all be users of KHO,
with KHO providing the foundational primitive to pass metadata and bulk
memory reservations as well as provide easy versioning for data.

== Overview ==

We introduce a metadata file that the kernels pass between each other. How
they pass it is architecture specific. The file's format is a Flattened
Device Tree (fdt) which has a generator and parser already included in
Linux. When the root user enables KHO through /sys/kernel/kho/active, the
kernel invokes callbacks to every driver that supports KHO to serialize
its state. When the actual kexec happens, the fdt is part of the image
set that we boot into. In addition, we keep "scratch regions" available
for kexec: physically contiguous memory regions that are guaranteed not
to contain any memory that KHO would preserve. The new kernel bootstraps
itself using the scratch regions and sets all handed over memory as in use.
When drivers that support KHO initialize, they introspect the fdt and
recover their state from it.
This includes memory reservations, where the
driver can either discard or claim reservations.

== Limitations ==

Currently KHO is only implemented for file based kexec. The kernel
interfaces in the patch set are already in place to support user space
kexec as well, but that is not yet implemented in kexec tools.

== How to Use ==

To use the code, please boot the kernel with the "kho=on" command line
parameter. KHO will automatically create scratch regions. If you want to set
the scratch size explicitly you can use the "kho_scratch=" command line
parameter. For instance, "kho_scratch=512M,256M" will create a global
scratch area of 512MiB and per-node scratch areas of 256MiB.

Make sure to have a reserved memory range requested with the reserve_mem
command line option. Then, before you invoke file based "kexec -l", activate
KHO:

  # echo 1 > /sys/kernel/kho/active
  # kexec -l Image --initrd=initrd -s
  # kexec -e

The new kernel will boot up and contain the previous kernel's reserve_mem
contents at the same physical address as the first kernel.

== Changelog ==

v3 -> v4:
  - Major rework of scratch management. Rather than force scratch memory
    allocations only very early in boot now we rely on scratch for all
    memblock allocations.
  - Use simple example usecase (reserve_mem instead of ftrace)
  - merge all KHO functionality into a single kernel/kexec_handover.c file
  - rename CONFIG_KEXEC_KHO to CONFIG_KEXEC_HANDOVER

v1 -> v2:
  - Removed: tracing: Introduce names for ring buffers
  - Removed: tracing: Introduce names for events
  - New: kexec: Add config option for KHO
  - New: kexec: Add documentation for KHO
  - New: tracing: Initialize fields before registering
  - New: devicetree: Add bindings for ftrace KHO
  - test bot warning fixes
  - Change kconfig option to ARCH_SUPPORTS_KEXEC_KHO
  - s/kho_reserve_mem/kho_reserve_previous_mem/g
  - s/kho_reserve/kho_reserve_scratch/g
  - Remove / reduce ifdefs
  - Select crc32
  - Leave anything that requires a name in trace.c to keep buffers
    unnamed entities
  - Put events as array into a property, use fingerprint instead of
    names to identify them
  - Reduce footprint without CONFIG_FTRACE_KHO
  - s/kho_reserve_mem/kho_reserve_previous_mem/g
  - make kho_get_fdt() const
  - Add stubs for return_mem and claim_mem
  - make kho_get_fdt() const
  - Get events as array from a property, use fingerprint instead of
    names to identify events
  - Change kconfig option to ARCH_SUPPORTS_KEXEC_KHO
  - s/kho_reserve_mem/kho_reserve_previous_mem/g
  - s/kho_reserve/kho_reserve_scratch/g
  - Leave the node generation code that needs to know the name in
    trace.c so that ring buffers can stay anonymous
  - s/kho_reserve/kho_reserve_scratch/g
  - Move kho enums out of ifdef
  - Move from names to fdt offsets. That way, trace.c can find the trace
    array offset and then the ring buffer code only needs to read out
    its per-CPU data. That way it can stay oblivious to its name.
  - Make kho_get_fdt() const

v2 -> v3:
  - Fix make dt_binding_check
  - Add descriptions for each object
  - s/trace_flags/trace-flags/
  - s/global_trace/global-trace/
  - Make all additionalProperties false
  - Change subject to reflect subsystem (dt-bindings)
  - Fix indentation
  - Remove superfluous examples
  - Convert to 64bit syntax
  - Move to kho directory
  - s/"global_trace"/"global-trace"/
  - s/"global_trace"/"global-trace"/
  - s/"trace_flags"/"trace-flags"/
  - Fix wording
  - Add Documentation to MAINTAINERS file
  - Remove kho reference on read error
  - Move handover_dt unmap up
  - s/reserve_scratch_mem/mark_phys_as_cma/
  - Remove ifdeffery
  - Remove superfluous comment

Alexander Graf (9):
  memblock: Add support for scratch memory
  kexec: Add Kexec HandOver (KHO) generation helpers
  kexec: Add KHO parsing support
  kexec: Add KHO support to kexec file loads
  kexec: Add config option for KHO
  kexec: Add documentation for KHO
  arm64: Add KHO support
  x86: Add KHO support
  memblock: Add KHO support for reserve_mem

Mike Rapoport (Microsoft) (5):
  mm/mm_init: rename init_reserved_page to init_deferred_page
  memblock: add MEMBLOCK_RSRV_KERN flag
  memblock: introduce memmap_init_kho_scratch()
  x86/setup: use memblock_reserve_kern for memory used by kernel
  Documentation: KHO: Add memblock bindings

 Documentation/ABI/testing/sysfs-firmware-kho |   9 +
 Documentation/ABI/testing/sysfs-kernel-kho   |  53 ++
 .../admin-guide/kernel-parameters.txt        |  24 +
 .../kho/bindings/memblock/reserve_mem.yaml   |  41 +
 .../bindings/memblock/reserve_mem_map.yaml   |  42 +
 Documentation/kho/concepts.rst               |  80 ++
 Documentation/kho/index.rst                  |  19 +
 Documentation/kho/usage.rst                  |  60 ++
 Documentation/subsystem-apis.rst             |   1 +
 MAINTAINERS                                  |   3 +
 arch/arm64/Kconfig                           |   3 +
 arch/x86/Kconfig                             |   3 +
 arch/x86/boot/compressed/kaslr.c             |  52 +-
 arch/x86/include/asm/setup.h                 |   4 +
 arch/x86/include/uapi/asm/setup_data.h       |  13 +-
 arch/x86/kernel/e820.c                       |  18 +
 arch/x86/kernel/kexec-bzimage64.c            |  36 +
 arch/x86/kernel/setup.c                      |  39 +-
 arch/x86/realmode/init.c                     |   2 +
 drivers/of/fdt.c                             |  36 +
 drivers/of/kexec.c                           |  42 +
 include/linux/cma.h                          |   2 +
 include/linux/kexec.h                        |  37 +
 include/linux/kexec_handover.h               |  10 +
 include/linux/memblock.h                     |  38 +-
 kernel/Kconfig.kexec                         |  13 +
 kernel/Makefile                              |   1 +
 kernel/kexec_file.c                          |  19 +
 kernel/kexec_handover.c                      | 808 ++++++++++++++++++
 kernel/kexec_internal.h                      |  16 +
 mm/Kconfig                                   |   4 +
 mm/internal.h                                |   5 +-
 mm/memblock.c                                | 247 +++++-
 mm/mm_init.c                                 |  19 +-
 34 files changed, 1775 insertions(+), 24 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-firmware-kho
 create mode 100644 Documentation/ABI/testing/sysfs-kernel-kho
 create mode 100644 Documentation/kho/bindings/memblock/reserve_mem.yaml
 create mode 100644 Documentation/kho/bindings/memblock/reserve_mem_map.yaml
 create mode 100644 Documentation/kho/concepts.rst
 create mode 100644 Documentation/kho/index.rst
 create mode 100644 Documentation/kho/usage.rst
 create mode 100644 include/linux/kexec_handover.h
 create mode 100644 kernel/kexec_handover.c

base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b
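For readers who want to see how a value such as "kho_scratch=512M,256M" from the "How to Use" section breaks down into two sizes, here is a small user-space sketch. It is not the parser from the patch set (kernel code would typically use memparse()); parse_size() and parse_kho_scratch() are hypothetical helpers, and only the two-value "global,per-node" form shown in the cover letter is handled.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse "<number>[K|M|G]" at 's'. On success store the size in bytes in
 * '*out', point '*end' past the consumed text and return 0.
 */
static int parse_size(const char *s, char **end, uint64_t *out)
{
	unsigned long long v = strtoull(s, end, 0);

	if (*end == s)
		return -1;	/* no digits at all */

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;	/* fall through */
	case 'M': case 'm':
		v <<= 10;	/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;	/* consume the suffix */
		break;
	default:
		break;
	}
	*out = v;
	return 0;
}

/*
 * Split the value of a "kho_scratch=<global>,<per-node>" option into
 * its two sizes. Returns 0 on success, -1 on malformed input.
 */
int parse_kho_scratch(const char *arg, uint64_t *global, uint64_t *pernode)
{
	char *end;

	if (parse_size(arg, &end, global))
		return -1;
	if (*end != ',')
		return -1;
	if (parse_size(end + 1, &end, pernode))
		return -1;
	return *end == '\0' ? 0 : -1;
}
```

With the example from the cover letter, parse_kho_scratch("512M,256M", &g, &n) yields g = 512 << 20 and n = 256 << 20 bytes.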